[postgis-devel] Re: [postgis-users] Possible index stats problem?

strk at refractions.net strk at refractions.net
Tue Sep 14 04:48:13 PDT 2004


Mark. Note that Alexander estimates are better with the default
statistics target. The problem is not the selectivity estimate,
but the cost estimate itself. I've tried on my own data:

Seq Scan on plmshp02_1
(cost=0.00..19.14 rows=26 width=81)
(actual time=0.54..61.68 rows=75 loops=1)
Total runtime: 61.83 msec
 
Index Scan using plmshp02_1_gist on plmshp02_1
(cost=0.00..81.53 rows=26 width=81)
(actual time=0.39..19.22 rows=75 loop=1)
Total runtime: 19.37 msecs

Compare the 'cost' estimate and actual runtime.
cost=0.00..19.14 => runtime 61.83 ms
cost=0.00..81.53 => runtime 19.37 ms

It runs faster when cost estimate is higher !!
Alexander's report also show this.

--strk;

On Tue, Sep 14, 2004 at 11:54:42AM +0100, Mark Cave-Ayland wrote:
> (Alexandre, I've CC'd this to you as I'm not sure if you're on the
> postgis-devel mailing list)
> 
> > -----Original Message-----
> > From: strk at refractions.net [mailto:strk at refractions.net] 
> > Sent: 14 September 2004 08:22
> > To: Mark Cave-Ayland; postgis-devel at postgis.refractions.net
> > Subject: Re: [postgis-devel] Re: [postgis-users] Possible 
> > index stats problem?
> > 
> 
> (lots cut)
> 
> > 
> > I get it back.
> > There is no change in cost estimation changing stawidth....
> > The cost estimation is performed by gistcostestimate() 
> > function, which is defined by standard postgresql 
> > distribution and in turn calls the default cost estimation 
> > routine. Now I need to know how to use a custom one (or patch 
> > the gist one). --strk;
> 
> 
> Hi strk,
> 
> I don't think that you need to be looking at changing gistcostestimate()
> since this is  concerned with producing functions that work at the
> higher index AM level; so any changes made here would affect all GiST
> index accesses and not just those concerned with geometries.
> 
> In fact, scanning through postgis_estimate.c and postgis.sql.in, I've
> noticed that in fact postgisgistcostestimate() and
> genericcostestimate2(), which are replacement index AM cost functions
> and so can't be used in PostGIS anyway, aren't even referenced! I can
> only assume that Dave used them as a debugging aid when writing his
> first cut of the index selectivity code.
> 
> It might be worth looking at the default settings in postgresql.conf to
> see the effect of changing some of the cost parameters; for example, I
> find that the default value of random_page_cost = 4 grossly
> overestimates the cost of grabbing an index from the tuple on modern
> hardware - I tend to reduce this to either 1 or 2 which has the effect
> of reducing the overall cost of the index scan to the planner.
> 
> I'm also reasonably sure that this is not caused by using a sample of
> geometries rather than the entire set; if this were the case, I would
> expect to see wildly differing query plans for the same query with
> repeated ANALYZE statements; in any case, Alexandre reports that there
> are only 5260 rows in the table, and from testing here, I've still seen
> good results with ~300k geometries using the default statistics target
> of 10.
> 
> >    ->  Index Scan using municipios_gidx on municipios  
> > (cost=0.00..475.56 rows=52 width=0) (actual time=75.169..6712.530 
> > rows=413 loops=1)  Total runtime: 6718.763 ms
> 
> Now this part of the estimate seems really wrong; 52 estimated versus
> 413 actual. The only thing I can think of is that some characteristic of
> the data is affecting the estimates, for example perhaps it contains
> polygons which have a large area of overlap in a small area, or maybe
> the data consists of large irregular polygons/linestrings?
> 
> To solve this, I think we need to ask Alexandre for a copy of his data
> off-list so we can look at exactly how the selectivity code is using
> this data in much greater detail.
> 
> 
> Kind regards,
> 
> Mark.
> 
> ---
> 
> Mark Cave-Ayland
> Webbased Ltd.
> Tamar Science Park
> Derriford
> Plymouth
> PL6 8BX
> England
> 
> Tel: +44 (0)1752 764445
> Fax: +44 (0)1752 764446
> 
> 
> This email and any attachments are confidential to the intended
> recipient and may also be privileged. If you are not the intended
> recipient please delete it from your system and notify the sender. You
> should not copy it or use it for any purpose nor disclose or distribute
> its contents to any other person.
> 



More information about the postgis-devel mailing list