[postgis-users] Point as Index

Wed Mar 30 07:15:03 PDT 2011

Ok, I continue my spamming :-)

Your compare isn't fair. "Your" index is just doing a bounding box
comparasion not a recheck to see what points is actually inside your
geometry. 

It looks like your polygon is a box, but PostGIS don't know that so it
will do a recheck on all rows fetched by the index and do a "real"
calculation.

To only do the bounding box test you can use && as operator.

And using count(*) instead of retrieving all the rows is good to avoid
the I/O bottleneck

Do a vacuum analyze on the table and try:

select 
count(*)
 from
     feed_all.common_pos_messages
 where
ST_GeomFromText('POLYGON((0.00433541700872238
 51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687
 51.2018083846302,0.00433541700872238
 51.2018083846302,0.00433541700872238 51.6875086022247))', 4326
&&
 pos_point;

Little more thought through I hope :-)

/Nicklas

On Wed, 2011-03-30 at 16:01 +0200, Nicklas Avén wrote:
> Sorry, I was too fast on the keys.
> 
> before I saw the comparasion with your own index
> 
> /Nicklas
> 
> On Wed, 2011-03-30 at 14:40 +0100, Ioannis Anagnostopoulos wrote:
> > The precise numbers are as follows:
> > 
> > Total Rows in the table: 45922415 
> > 
> > if I use:
> > select 
> >     pos_lat,
> >     pos_lon
> > from
> >     feed_all.common_pos_messages
> > where
> > st_contains(ST_GeomFromText('POLYGON((0.00433541700872238
> > 51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687
> > 51.2018083846302,0.00433541700872238
> > 51.2018083846302,0.00433541700872238 51.6875086022247))', 4326),
> > pos_point)
> > 
> > I get back: 4493678 in 4.77 mins
> > 
> > Doing explain analyze I get:
> > "Bitmap Heap Scan on common_pos_messages  (cost=82262.99..522647.01
> > rows=771600 width=8) (actual time=127735.424..198141.843 rows=4493678
> > loops=1)"
> > "  Recheck Cond:
> > ('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry && pos_point)"
> > "  Filter:
> > _st_contains('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry, pos_point)"
> > "  ->  Bitmap Index Scan on idx_pos  (cost=0.00..82070.09 rows=2314801
> > width=0) (actual time=127732.000..127732.000 rows=4493679 loops=1)"
> > "        Index Cond:
> > ('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry && pos_point)"
> > "Total runtime: 199206.428 ms"
> > 
> > Obviously the times and the rows do not seem to much with the actual
> > results. However after creating my own index based on lat/lon values
> > (integers) on the same table, executing this:
> > select 
> >     pos_lat,
> >     pos_lon
> > from
> >     feed_all.common_pos_messages
> > where 
> >     pos_lat between 30721085 and 31012505 and
> >     pos_lon between 2601 and 867037
> > 
> > I get back 4493680 in 2.8 mins
> > 
> > Doing explain analyze I get:
> > "Bitmap Heap Scan on common_pos_messages  (cost=161748.26..601144.64
> > rows=1686719 width=8) (actual time=10064.427..60738.808 rows=4493680
> > loops=1)"
> > "  Recheck Cond: ((pos_lat >= 30721085) AND (pos_lat <= 31012505) AND
> > (pos_lon >= 2601) AND (pos_lon <= 867037))"
> > "  ->  Bitmap Index Scan on idx_lat_lon  (cost=0.00..161326.58
> > rows=1686719 width=0) (actual time=10061.108..10061.108 rows=4493680
> > loops=1)"
> > "        Index Cond: ((pos_lat >= 30721085) AND (pos_lat <= 31012505)
> > AND (pos_lon >= 2601) AND (pos_lon <= 867037))"
> > "Total runtime: 61850.720 ms"
> > 
> > The predictions are still "out" from the actual but the btree index
> > seems to behave better.
> > 
> > Any suggestions? Probably I need to bring up to date my statistics. 
> > 
> > Kind Regards
> > Yiannis
> > 
> > 
> > On 30/03/2011 13:30, Sandro Santilli wrote: 
> > > On Wed, Mar 30, 2011 at 10:58:57AM +0100, Ioannis Anagnostopoulos wrote:
> > > 
> > > > I am involved in a heavy database design initiative where the only kind 
> > > > of geometries I am dealing with are points. I have recently hit a 
> > > > 50million rows long table with those points and my default gist index on 
> > > > the points does not seem to be working very fast (if not at all to be 
> > > > honest). I have started now thinking that probably for "points" an index 
> > > > may not be the best option since in a 50million rows long table most of 
> > > > the points are unique so the index may just duplicate the actual table, 
> > > > of course I may be wrong and I may just missing a very important part of 
> > > > the concept. 
> > > How many points from the 50M set does your tipical query hits ?
> > > Does PostgreSQL selectivity estimator make a good guess about that ?
> > > Use EXPLAIN ANALYZE <your query here> to see.
> > > 
> > > --strk; 
> > > 
> > >   ()   Free GIS & Flash consultant/developer
> > >   /\   http://strk.keybit.net/services.html
> > 
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>