[postgis-devel] Re: [postgis-users] Possible index stats problem?

Tue Sep 14 05:07:41 PDT 2004

Mark suggestion was correct, changing random_page_cost
modifies sequencial and index scan costs changing
their relation.
Thanks Mark.
I think this should be noted in the documentation...

--strk;

On Tue, Sep 14, 2004 at 01:48:13PM +0200, strk at refractions.net wrote:
> Mark. Note that Alexander estimates are better with the default
> statistics target. The problem is not the selectivity estimate,
> but the cost estimate itself. I've tried on my own data:
> 
> Seq Scan on plmshp02_1
> (cost=0.00..19.14 rows=26 width=81)
> (actual time=0.54..61.68 rows=75 loops=1)
> Total runtime: 61.83 msec
>  
> Index Scan using plmshp02_1_gist on plmshp02_1
> (cost=0.00..81.53 rows=26 width=81)
> (actual time=0.39..19.22 rows=75 loop=1)
> Total runtime: 19.37 msecs
> 
> Compare the 'cost' estimate and actual runtime.
> cost=0.00..19.14 => runtime 61.83 ms
> cost=0.00..81.53 => runtime 19.37 ms
> 
> It runs faster when cost estimate is higher !!
> Alexander's report also show this.
> 
> --strk;
> 
> On Tue, Sep 14, 2004 at 11:54:42AM +0100, Mark Cave-Ayland wrote:
> > (Alexandre, I've CC'd this to you as I'm not sure if you're on the
> > postgis-devel mailing list)
> > 
> > > -----Original Message-----
> > > From: strk at refractions.net [mailto:strk at refractions.net] 
> > > Sent: 14 September 2004 08:22
> > > To: Mark Cave-Ayland; postgis-devel at postgis.refractions.net
> > > Subject: Re: [postgis-devel] Re: [postgis-users] Possible 
> > > index stats problem?
> > > 
> > 
> > (lots cut)
> > 
> > > 
> > > I get it back.
> > > There is no change in cost estimation changing stawidth....
> > > The cost estimation is performed by gistcostestimate() 
> > > function, which is defined by standard postgresql 
> > > distribution and in turn calls the default cost estimation 
> > > routine. Now I need to know how to use a custom one (or patch 
> > > the gist one). --strk;
> > 
> > 
> > Hi strk,
> > 
> > I don't think that you need to be looking at changing gistcostestimate()
> > since this is  concerned with producing functions that work at the
> > higher index AM level; so any changes made here would affect all GiST
> > index accesses and not just those concerned with geometries.
> > 
> > In fact, scanning through postgis_estimate.c and postgis.sql.in, I've
> > noticed that in fact postgisgistcostestimate() and
> > genericcostestimate2(), which are replacement index AM cost functions
> > and so can't be used in PostGIS anyway, aren't even referenced! I can
> > only assume that Dave used them as a debugging aid when writing his
> > first cut of the index selectivity code.
> > 
> > It might be worth looking at the default settings in postgresql.conf to
> > see the effect of changing some of the cost parameters; for example, I
> > find that the default value of random_page_cost = 4 grossly
> > overestimates the cost of grabbing an index from the tuple on modern
> > hardware - I tend to reduce this to either 1 or 2 which has the effect
> > of reducing the overall cost of the index scan to the planner.
> > 
> > I'm also reasonably sure that this is not caused by using a sample of
> > geometries rather than the entire set; if this were the case, I would
> > expect to see wildly differing query plans for the same query with
> > repeated ANALYZE statements; in any case, Alexandre reports that there
> > are only 5260 rows in the table, and from testing here, I've still seen
> > good results with ~300k geometries using the default statistics target
> > of 10.
> > 
> > >    ->  Index Scan using municipios_gidx on municipios  
> > > (cost=0.00..475.56 rows=52 width=0) (actual time=75.169..6712.530 
> > > rows=413 loops=1)  Total runtime: 6718.763 ms
> > 
> > Now this part of the estimate seems really wrong; 52 estimated versus
> > 413 actual. The only thing I can think of is that some characteristic of
> > the data is affecting the estimates, for example perhaps it contains
> > polygons which have a large area of overlap in a small area, or maybe
> > the data consists of large irregular polygons/linestrings?
> > 
> > To solve this, I think we need to ask Alexandre for a copy of his data
> > off-list so we can look at exactly how the selectivity code is using
> > this data in much greater detail.
> > 
> > 
> > Kind regards,
> > 
> > Mark.
> > 
> > ---
> > 
> > Mark Cave-Ayland
> > Webbased Ltd.
> > Tamar Science Park
> > Derriford
> > Plymouth
> > PL6 8BX
> > England
> > 
> > Tel: +44 (0)1752 764445
> > Fax: +44 (0)1752 764446
> > 
> > 
> > This email and any attachments are confidential to the intended
> > recipient and may also be privileged. If you are not the intended
> > recipient please delete it from your system and notify the sender. You
> > should not copy it or use it for any purpose nor disclose or distribute
> > its contents to any other person.
> >