[postgis-users] Possible index stats problem?

strk at refractions.net strk at refractions.net
Mon Sep 13 23:25:34 PDT 2004


On Mon, Sep 13, 2004 at 06:28:24PM -0300, Alexandre Florio wrote:
> Em Mon, 13 Sep 2004 20:54:02 +0200
> strk at refractions.net escreveu:
> 
> > > 4. create lang plpgsql and insert postgis.sql and spatial_ref_sys.sql
> > Didn't you get lots of errors in this step ?
> No.  Why should I get them?

Many objects defined in postgis.sql should already be in your dump.
Functions are create-or-replaced, but other objects aren't/can't.
For example the cost estimator needs a modification of the Geometry
type definition, and that type is (old) in the dump and (new) in
the postgis.sql file.
I can't belive you didn't get any error (or at least notice) message.

> > Estimate is 103. Actual is 413. How many rows in the table ?
> 5260 rows.  Each one's geometry attribute has about 514 points.

So (413-103)/5260 = ~6% error.
Not so bad.

> > Try also to increment statistics target on the geometry column
> > of your table to see if you can get a better estimate.
> This is what I've done:
> ALTER TABLE municipios ALTER the_geom SET STATISTICS 1000;
> > Note that while with PG<8.0 full dataset was inspected with new
> > VACUUM ANALYZE integration statistics are gathered inspecting only
> > a subset of it. Incrementing target statistics (check postgres manual)
> > does increment both the returned subset and the frequency of cells.
> > (search_box overlaps 49.000000 cells ...).
> Does it explains why analyzing in 8.0 is much faster than in <8.0?

Yes. It does.

> > Let me know.
> Unfortunately, no changes:

[...]
>    ->  Seq Scan on municipios  (cost=0.00..321.90 rows=52 width=0) (actual time=161.512..48120.603 rows=413 loops=1)
>  Total runtime: 48125.845 ms

[...]
>    ->  Index Scan using municipios_gidx on municipios  (cost=0.00..475.56 rows=52 width=0) (actual time=75.169..6712.530 rows=413 loops=1)
>  Total runtime: 6718.763 ms

You got better results with lower statistics target.
Anyway, in both cases the estimated number or rows returned was
lower then the actual, so that is not the reason why the planner
choosed Sequencial scan.
The wrong estimation is due to a wrong estimate on the cost of
using the index as contrary to execute a sequencial scan.

As you can see the Sequencial scan has a cost 0.00..321.90
while the Index scan has an higher cost of 0.00..475.56.
This does not reflect reality as Sequencial scan took 48125+ms
while Index scan took 6718+ms, so cost estimation was not correct.

I'll make some research on the topic and let you know.

--strk;



More information about the postgis-users mailing list