[postgis-users] Trying to figure out how statistics for the planner work

Thu Mar 9 22:49:18 PST 2006

I'm having an interesting issue with a table on my system. It has about 1.6
million rows with a geometry field of lat and lon points spread over most of
the northern hemisphere.

I have one query where the Postgres planner is coming up with a totally
wrong estimate(which causes some other problems)

explain  analyze select * from  meta_data meta where
SetSRID(MakeBox2D(MakePoint(-91.560758196875,-91.560758196875),MakePoint(1.5
6075819687499,1.56075819687501)),4326) && meta.wkb_geometry;

QUERY PLAN         
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
--------
 Seq Scan on meta_data meta  (cost=0.00..106689.40 rows=1 width=165) (actual
time=0.230..2214.168 rows=18186 loops=1)
   Filter: 
('0103000020E6100000010000000500000000000080E3E356C000000080E3E356C000000080
E3E356C0000000A0DDF8F83F000000A0DDF8F83F000000A0DDF8F83F000000A0DDF8F83F0000
0080E3E356C000000080E3E356C000000080E3E356C0'::geometry && wkb_geometry)
 Total runtime: 2238.866 ms
(3 rows)

Note that it guesses 1 row and is really getting 18168 back.

Obviously most of the data is outside this box(about 1.5 million rows worth)
but the return of only a single row is causing the planner to do other
queries wrong and do things like a sequential scan of another table 18,000
times in a join.

I've tried setting the statistics on the column to 10, 100 and 1000 and
re-analyzing each time and I always end up with a plan with it expecting
only one row.

Any suggestions, ideas, other ways to make this better?

Thanks,
-Steve Wormley