[postgis-users] Point as Index

Ioannis Anagnostopoulos ioannis at anatec.com
Wed Mar 30 06:40:57 PDT 2011


The precise numbers are as follows:

Total Rows in the table: 45922415

if I use:
select
     pos_lat,
     pos_lon
from
     feed_all.common_pos_messages
where
st_contains(ST_GeomFromText('POLYGON((0.00433541700872238 
51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687 
51.2018083846302,0.00433541700872238 
51.2018083846302,0.00433541700872238 51.6875086022247))', 4326), pos_point)

I get back: 4493678 in 4.77 mins

Doing explain analyze I get:
"Bitmap Heap Scan on common_pos_messages  (cost=82262.99..522647.01 
rows=771600 width=8) (actual time=127735.424..198141.843 rows=4493678 
loops=1)"
"  Recheck Cond: 
('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry 
&& pos_point)"
"  Filter: 
_st_contains('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry, 
pos_point)"
"  ->  Bitmap Index Scan on idx_pos  (cost=0.00..82070.09 rows=2314801 
width=0) (actual time=127732.000..127732.000 rows=4493679 loops=1)"
"        Index Cond: 
('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry 
&& pos_point)"
"Total runtime: 199206.428 ms"

Obviously the times and the rows do not seem to much with the actual 
results. However after creating my own index based on lat/lon values 
(integers) on the same table, executing this:
select
     pos_lat,
     pos_lon
from
     feed_all.common_pos_messages
where
     pos_lat between 30721085 and 31012505 and
     pos_lon between 2601 and 867037

I get back 4493680 in 2.8 mins

Doing explain analyze I get:
"Bitmap Heap Scan on common_pos_messages  (cost=161748.26..601144.64 
rows=1686719 width=8) (actual time=10064.427..60738.808 rows=4493680 
loops=1)"
"  Recheck Cond: ((pos_lat >= 30721085) AND (pos_lat <= 31012505) AND 
(pos_lon >= 2601) AND (pos_lon <= 867037))"
"  ->  Bitmap Index Scan on idx_lat_lon  (cost=0.00..161326.58 
rows=1686719 width=0) (actual time=10061.108..10061.108 rows=4493680 
loops=1)"
"        Index Cond: ((pos_lat >= 30721085) AND (pos_lat <= 31012505) 
AND (pos_lon >= 2601) AND (pos_lon <= 867037))"
"Total runtime: 61850.720 ms"

The predictions are still "out" from the actual but the btree index 
seems to behave better.

Any suggestions? Probably I need to bring up to date my statistics.

Kind Regards
Yiannis


On 30/03/2011 13:30, Sandro Santilli wrote:
> On Wed, Mar 30, 2011 at 10:58:57AM +0100, Ioannis Anagnostopoulos wrote:
>
>> I am involved in a heavy database design initiative where the only kind
>> of geometries I am dealing with are points. I have recently hit a
>> 50million rows long table with those points and my default gist index on
>> the points does not seem to be working very fast (if not at all to be
>> honest). I have started now thinking that probably for "points" an index
>> may not be the best option since in a 50million rows long table most of
>> the points are unique so the index may just duplicate the actual table,
>> of course I may be wrong and I may just missing a very important part of
>> the concept.
> How many points from the 50M set does your tipical query hits ?
> Does PostgreSQL selectivity estimator make a good guess about that ?
> Use EXPLAIN ANALYZE<your query here>  to see.
>
> --strk;
>
>    ()   Free GIS&  Flash consultant/developer
>    /\   http://strk.keybit.net/services.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20110330/644865ac/attachment.html>


More information about the postgis-users mailing list