[postgis-users] Point as Index

Ioannis Anagnostopoulos ioannis at anatec.com
Wed Mar 30 12:20:59 PDT 2011


Hello, I think I have some results that I deem inconclusive although I 
am more and more convinced that a spatial index on points is not that 
great. I have attached an open office spreadsheet with the details of 
the different queries. I also try to summarise in the following table:

Query
	Execution time
	Predicted Rows
	Rows Returned
select
     obj_id,
     msg_date_rec,
     pos_point
from
     feed_all.common_pos_messages inner join
     feed_all.messages on (common_pos_messages.msg_id = messages.msg_id)
where
     pos_lat between 30721085 and 31012505 and
     pos_lon between 2601 and 867037 and
     msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'
ORDER BY
     obj_id, msg_date_rec
	133146.196 ms
	1751339
	663769
select
     obj_id,
     msg_date_rec,
     pos_point
from
     feed_all.common_pos_messages inner join
     feed_all.messages on (common_pos_messages.msg_id = messages.msg_id)
where
     st_contains(ST_GeomFromText('POLYGON((0.00433541700872238 
51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687 
51.2018083846302,0.00433541700872238 
51.2018083846302,0.00433541700872238 51.6875086022247))', 4326), 
pos_point) and
     msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'
ORDER BY
     obj_id, msg_date_rec
	 271619.013 ms
	911564
	663769
select
     obj_id,
     msg_date_rec,
     pos_point
from
     feed_all.common_pos_messages inner join
     feed_all.messages on (common_pos_messages.msg_id = messages.msg_id)
where
     ST_GeomFromText('POLYGON((0.00433541700872238 
51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687 
51.2018083846302,0.00433541700872238 
51.2018083846302,0.00433541700872238 51.6875086022247))', 4326) && 
pos_point AND
     msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'
ORDER BY
     obj_id, msg_date_rec
	249301.858 ms
	2734692
	663769


All in all, the second query takes longer but has the closest prediction 
when explain - analyze is performed while the quickest one is the first 
with the custom btree index. What really concerns me is that regardless 
of the Vacuum Analyze I perform times and predictions remain unchanged. 
Thus I will try to persevere  with the first query for this development 
cycle aiming to implement in the future a gist index not on points but 
on clusters of points.

Nicklas, I had a reading for your suggestion but if I got it right, 
st_contains and st_within both run a && operator on the bounding box 
before, so I guess that the improved execution time of the 3rd query may 
be attributed to the missing cycle of running the st_contains.

As always any suggestion is more than welcomed.
Thanks and kind regards
Yiannis

On 30/03/2011 15:15, Nicklas Avén wrote:
> Ok, I continue my spamming :-)
>
> Your compare isn't fair. "Your" index is just doing a bounding box
> comparasion not a recheck to see what points is actually inside your
> geometry.
>
> It looks like your polygon is a box, but PostGIS don't know that so it
> will do a recheck on all rows fetched by the index and do a "real"
> calculation.
>
> To only do the bounding box test you can use&&  as operator.
>
> And using count(*) instead of retrieving all the rows is good to avoid
> the I/O bottleneck
>
> Do a vacuum analyze on the table and try:
>
> select
> count(*)
>   from
>       feed_all.common_pos_messages
>   where
> ST_GeomFromText('POLYGON((0.00433541700872238
>   51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687
>   51.2018083846302,0.00433541700872238
>   51.2018083846302,0.00433541700872238 51.6875086022247))', 4326
> &&
>   pos_point;
>
>
> Little more thought through I hope :-)
>
> /Nicklas
>
>
>
>
>
>
> On Wed, 2011-03-30 at 16:01 +0200, Nicklas Avén wrote:
>> Sorry, I was too fast on the keys.
>>
>> before I saw the comparasion with your own index
>>
>> /Nicklas
>>
>> On Wed, 2011-03-30 at 14:40 +0100, Ioannis Anagnostopoulos wrote:
>>> The precise numbers are as follows:
>>>
>>> Total Rows in the table: 45922415
>>>
>>> if I use:
>>> select
>>>      pos_lat,
>>>      pos_lon
>>> from
>>>      feed_all.common_pos_messages
>>> where
>>> st_contains(ST_GeomFromText('POLYGON((0.00433541700872238
>>> 51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687
>>> 51.2018083846302,0.00433541700872238
>>> 51.2018083846302,0.00433541700872238 51.6875086022247))', 4326),
>>> pos_point)
>>>
>>> I get back: 4493678 in 4.77 mins
>>>
>>> Doing explain analyze I get:
>>> "Bitmap Heap Scan on common_pos_messages  (cost=82262.99..522647.01
>>> rows=771600 width=8) (actual time=127735.424..198141.843 rows=4493678
>>> loops=1)"
>>> "  Recheck Cond:
>>> ('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry&&  pos_point)"
>>> "  Filter:
>>> _st_contains('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry, pos_point)"
>>> "  ->   Bitmap Index Scan on idx_pos  (cost=0.00..82070.09 rows=2314801
>>> width=0) (actual time=127732.000..127732.000 rows=4493679 loops=1)"
>>> "        Index Cond:
>>> ('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry&&  pos_point)"
>>> "Total runtime: 199206.428 ms"
>>>
>>> Obviously the times and the rows do not seem to much with the actual
>>> results. However after creating my own index based on lat/lon values
>>> (integers) on the same table, executing this:
>>> select
>>>      pos_lat,
>>>      pos_lon
>>> from
>>>      feed_all.common_pos_messages
>>> where
>>>      pos_lat between 30721085 and 31012505 and
>>>      pos_lon between 2601 and 867037
>>>
>>> I get back 4493680 in 2.8 mins
>>>
>>> Doing explain analyze I get:
>>> "Bitmap Heap Scan on common_pos_messages  (cost=161748.26..601144.64
>>> rows=1686719 width=8) (actual time=10064.427..60738.808 rows=4493680
>>> loops=1)"
>>> "  Recheck Cond: ((pos_lat>= 30721085) AND (pos_lat<= 31012505) AND
>>> (pos_lon>= 2601) AND (pos_lon<= 867037))"
>>> "  ->   Bitmap Index Scan on idx_lat_lon  (cost=0.00..161326.58
>>> rows=1686719 width=0) (actual time=10061.108..10061.108 rows=4493680
>>> loops=1)"
>>> "        Index Cond: ((pos_lat>= 30721085) AND (pos_lat<= 31012505)
>>> AND (pos_lon>= 2601) AND (pos_lon<= 867037))"
>>> "Total runtime: 61850.720 ms"
>>>
>>> The predictions are still "out" from the actual but the btree index
>>> seems to behave better.
>>>
>>> Any suggestions? Probably I need to bring up to date my statistics.
>>>
>>> Kind Regards
>>> Yiannis
>>>
>>>
>>> On 30/03/2011 13:30, Sandro Santilli wrote:
>>>> On Wed, Mar 30, 2011 at 10:58:57AM +0100, Ioannis Anagnostopoulos wrote:
>>>>
>>>>> I am involved in a heavy database design initiative where the only kind
>>>>> of geometries I am dealing with are points. I have recently hit a
>>>>> 50million rows long table with those points and my default gist index on
>>>>> the points does not seem to be working very fast (if not at all to be
>>>>> honest). I have started now thinking that probably for "points" an index
>>>>> may not be the best option since in a 50million rows long table most of
>>>>> the points are unique so the index may just duplicate the actual table,
>>>>> of course I may be wrong and I may just missing a very important part of
>>>>> the concept.
>>>> How many points from the 50M set does your tipical query hits ?
>>>> Does PostgreSQL selectivity estimator make a good guess about that ?
>>>> Use EXPLAIN ANALYZE<your query here>  to see.
>>>>
>>>> --strk;
>>>>
>>>>    ()   Free GIS&  Flash consultant/developer
>>>>    /\   http://strk.keybit.net/services.html
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20110330/ab841f74/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pla_queries.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 15377 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20110330/ab841f74/attachment.ods>


More information about the postgis-users mailing list