<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    Hello, I think I have some results that I deem inconclusive although

    I am more and more convinced that a spatial index on points is not

    that great. I have attached an open office spreadsheet with the

    details of the different queries. I also try to summarise in the

    following table:<br>

    <br>

    <table border="1" cellpadding="2" cellspacing="2" height="62"

      width="1288">

      <tbody>

        <tr>

          <td valign="top" width="40%">Query<br>

          </td>

          <td valign="top">Execution time<br>

          </td>

          <td valign="top">Predicted Rows<br>

          </td>

          <td valign="top">Rows Returned<br>

          </td>

        </tr>

        <tr>

          <td valign="top"><small><tt>select <br>

                    obj_id,<br>

                    msg_date_rec,<br>

                    pos_point<br>

                from<br>

                    feed_all.common_pos_messages inner join<br>

                    feed_all.messages on (common_pos_messages.msg_id =

                messages.msg_id)<br>

                where <br>

                    pos_lat between 30721085 and 31012505 and<br>

                    pos_lon between 2601 and 867037 and<br>

                    msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'<br>

                ORDER BY<br>

                    obj_id, msg_date_rec</tt></small><br>

          </td>

          <td valign="top">133146.196 ms<br>

          </td>

          <td valign="top">1751339<br>

          </td>

          <td valign="top">663769<br>

          </td>

        </tr>

        <tr>

          <td valign="top"><tt><small>select <br>

                    obj_id,<br>

                    msg_date_rec,<br>

                    pos_point<br>

                from<br>

                    feed_all.common_pos_messages inner join<br>

                    feed_all.messages on (common_pos_messages.msg_id =

                messages.msg_id)<br>

                where<br>

                st_contains(ST_GeomFromText('POLYGON((0.00433541700872238

                51.6875086022247,1.4450615615687

                51.6875086022247,1.4450615615687

                51.2018083846302,0.00433541700872238

                51.2018083846302,0.00433541700872238

                51.6875086022247))', 4326), pos_point) and<br>

                    msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'<br>

                ORDER BY<br>

                    obj_id, msg_date_rec</small></tt><br>

          </td>

          <td valign="top"> 271619.013 ms<br>

          </td>

          <td valign="top">911564<br>

          </td>

          <td valign="top">663769<br>

          </td>

        </tr>

        <tr>

          <td valign="top"><tt><small>select <br>

                    obj_id,<br>

                    msg_date_rec,<br>

                    pos_point<br>

                from<br>

                    feed_all.common_pos_messages inner join<br>

                    feed_all.messages on (common_pos_messages.msg_id =

                messages.msg_id)<br>

                where<br>

                    ST_GeomFromText('POLYGON((0.00433541700872238

                51.6875086022247,1.4450615615687

                51.6875086022247,1.4450615615687

                51.2018083846302,0.00433541700872238

                51.2018083846302,0.00433541700872238

                51.6875086022247))', 4326) && pos_point AND<br>

                    msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'<br>

                ORDER BY<br>

                    obj_id, msg_date_rec</small></tt><br>

          </td>

          <td valign="top">249301.858 ms<br>

          </td>

          <td valign="top">2734692<br>

          </td>

          <td valign="top">663769<br>

          </td>

        </tr>

      </tbody>

    </table>

    <br>

    All in all, the second query takes longer but has the closest

    prediction when explain - analyze is performed while the quickest

    one is the first with the custom btree index. What really concerns

    me is that regardless of the Vacuum Analyze I perform times and

    predictions remain unchanged. Thus I will try to persevere  with the

    first query for this development cycle aiming to implement in the

    future a gist index not on points but on clusters of points.<br>

    <br>

    Nicklas, I had a reading for your suggestion but if I got it right,

    st_contains and st_within both run a && operator on the

    bounding box before, so I guess that the improved execution time of

    the 3rd query may be attributed to the missing cycle of running the

    st_contains. <br>

    <br>

    As always any suggestion is more than welcomed.<br>

    Thanks and kind regards<br>

    Yiannis <br>

    <br>

    On 30/03/2011 15:15, Nicklas Avén wrote:

    <blockquote cite="mid:1301494503.2175.251.camel@ubuntunav"

      type="cite">

      <pre wrap="">Ok, I continue my spamming :-)

Your compare isn't fair. "Your" index is just doing a bounding box

comparasion not a recheck to see what points is actually inside your

geometry. 

It looks like your polygon is a box, but PostGIS don't know that so it

will do a recheck on all rows fetched by the index and do a "real"

calculation.

To only do the bounding box test you can use && as operator.

And using count(*) instead of retrieving all the rows is good to avoid

the I/O bottleneck

Do a vacuum analyze on the table and try:

select 

count(*)

 from

     feed_all.common_pos_messages

 where

ST_GeomFromText('POLYGON((0.00433541700872238

 51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687

 51.2018083846302,0.00433541700872238

 51.2018083846302,0.00433541700872238 51.6875086022247))', 4326

&&

 pos_point;

Little more thought through I hope :-)

/Nicklas

On Wed, 2011-03-30 at 16:01 +0200, Nicklas Avén wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">Sorry, I was too fast on the keys.

before I saw the comparasion with your own index

/Nicklas

On Wed, 2011-03-30 at 14:40 +0100, Ioannis Anagnostopoulos wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">The precise numbers are as follows:

Total Rows in the table: 45922415 

if I use:

select 

    pos_lat,

    pos_lon

from

    feed_all.common_pos_messages

where

st_contains(ST_GeomFromText('POLYGON((0.00433541700872238

51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687

51.2018083846302,0.00433541700872238

51.2018083846302,0.00433541700872238 51.6875086022247))', 4326),

pos_point)

I get back: 4493678 in 4.77 mins

Doing explain analyze I get:

"Bitmap Heap Scan on common_pos_messages  (cost=82262.99..522647.01

rows=771600 width=8) (actual time=127735.424..198141.843 rows=4493678

loops=1)"

"  Recheck Cond:

('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry && pos_point)"

"  Filter:

_st_contains('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry, pos_point)"

"  ->  Bitmap Index Scan on idx_pos  (cost=0.00..82070.09 rows=2314801

width=0) (actual time=127732.000..127732.000 rows=4493679 loops=1)"

"        Index Cond:

('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry && pos_point)"

"Total runtime: 199206.428 ms"

Obviously the times and the rows do not seem to much with the actual

results. However after creating my own index based on lat/lon values

(integers) on the same table, executing this:

select 

    pos_lat,

    pos_lon

from

    feed_all.common_pos_messages

where 

    pos_lat between 30721085 and 31012505 and

    pos_lon between 2601 and 867037

I get back 4493680 in 2.8 mins

Doing explain analyze I get:

"Bitmap Heap Scan on common_pos_messages  (cost=161748.26..601144.64

rows=1686719 width=8) (actual time=10064.427..60738.808 rows=4493680

loops=1)"

"  Recheck Cond: ((pos_lat >= 30721085) AND (pos_lat <= 31012505) AND

(pos_lon >= 2601) AND (pos_lon <= 867037))"

"  ->  Bitmap Index Scan on idx_lat_lon  (cost=0.00..161326.58

rows=1686719 width=0) (actual time=10061.108..10061.108 rows=4493680

loops=1)"

"        Index Cond: ((pos_lat >= 30721085) AND (pos_lat <= 31012505)

AND (pos_lon >= 2601) AND (pos_lon <= 867037))"

"Total runtime: 61850.720 ms"

The predictions are still "out" from the actual but the btree index

seems to behave better.

Any suggestions? Probably I need to bring up to date my statistics. 

Kind Regards

Yiannis

On 30/03/2011 13:30, Sandro Santilli wrote: 

</pre>

          <blockquote type="cite">

            <pre wrap="">On Wed, Mar 30, 2011 at 10:58:57AM +0100, Ioannis Anagnostopoulos wrote:

</pre>

            <blockquote type="cite">

              <pre wrap="">I am involved in a heavy database design initiative where the only kind 

of geometries I am dealing with are points. I have recently hit a 

50million rows long table with those points and my default gist index on 

the points does not seem to be working very fast (if not at all to be 

honest). I have started now thinking that probably for "points" an index 

may not be the best option since in a 50million rows long table most of 

the points are unique so the index may just duplicate the actual table, 

of course I may be wrong and I may just missing a very important part of 

the concept. 

</pre>

            </blockquote>

            <pre wrap="">How many points from the 50M set does your tipical query hits ?

Does PostgreSQL selectivity estimator make a good guess about that ?

Use EXPLAIN ANALYZE <your query here> to see.

--strk; 

  ()   Free GIS & Flash consultant/developer

  /\   <a class="moz-txt-link-freetext" href="http://strk.keybit.net/services.html">http://strk.keybit.net/services.html</a>

</pre>

          </blockquote>

          <pre wrap="">

_______________________________________________

postgis-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:postgis-users@postgis.refractions.net">postgis-users@postgis.refractions.net</a>

<a class="moz-txt-link-freetext" href="http://postgis.refractions.net/mailman/listinfo/postgis-users">http://postgis.refractions.net/mailman/listinfo/postgis-users</a>

</pre>

        </blockquote>

        <pre wrap="">

_______________________________________________

postgis-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:postgis-users@postgis.refractions.net">postgis-users@postgis.refractions.net</a>

<a class="moz-txt-link-freetext" href="http://postgis.refractions.net/mailman/listinfo/postgis-users">http://postgis.refractions.net/mailman/listinfo/postgis-users</a>

</pre>

      </blockquote>

      <pre wrap="">

_______________________________________________

postgis-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:postgis-users@postgis.refractions.net">postgis-users@postgis.refractions.net</a>

<a class="moz-txt-link-freetext" href="http://postgis.refractions.net/mailman/listinfo/postgis-users">http://postgis.refractions.net/mailman/listinfo/postgis-users</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>