[postgis-users] Execution plan with spatial index

Fri Jul 16 22:33:33 PDT 2010

Chris,

Did you confirm that your btree index works fine without the spatial query
part?

Run the query without the spatial part at all.  If you don't have enough
data distinct categories, its often faster for PostgreSQL not to use the
btree index or at least it thinks so.  Its also sometimes some problem with
differences in data types used that it can't use an index.  I don't think
that is your issue here though.

One way to try to force it to use all indexes is to do a 

set enable_seqscan = off;

Before running the query and if that works, increasing the seq_page_cost

Hope that helps
Leo and Regina,
h <http://www.postgis.us> ttp://www.postgis.us

  _____  

From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Christian Beaumont
Sent: Friday, July 16, 2010 12:31 PM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Execution plan with spatial index

Greetings, 

I have a table with around a million rows with each row containing a
geography POINT (I'm covering the entire world).  With these rows I need to
perform many nearest neighbor searches to locate other entities within a
range of 0 to 100km.  The location field has a GiST index

With 5,000 records in the table my average query took around 0.5ms; however,
now I have a million records the query time has gone up to around 4ms.  I
did a reindex/vacuum etc.  My goal is to try and make this query as fast as
possible since it underpins everything I'm doing and the target hardware
won't be as powerful as my development box.

As well as the location column, I also have a "entity_type" column that is a
simple integer - 1, 2, ... (n)... 

I figured I could improve performance by adding an index on the entity_type
column, then filter the rowset prior to nearest neighbor search.  My logic
was that it must be quicker to isolate a 10% subset of the records using a
simple integer index before feeding in to the expensive GiST index.

Unfortunately when I did this, PostgresSQL didn't use my entity_type_idx at
all.  Instead, it did the nearest neighbor search using the GiST index, then
did a simple filter on the collected records based on the entity_type.  I
tried a few tricks to make it use the index but no-luck.

Any ideas for speeding this up would be very much appreciated!  Right now my
best idea would be to have separate tables for each entity type, but that
wouldn't be fun as I don't know the entity types in advance.

cheers,
-chris

Here is pseudo-code of the query and execution plan/analysis.  CB_GetPlace()
is one of my helper functions that returns a geography from an entity id
(marked stable).

SELECT
entity_id, category_id,
ST_Distance(location, CB_GetPlace(someEntityID)) as arcLength
FROM entities WHERE
category_id = 1 AND
ST_DWithin(location, CB_GetPlace(someEntityID), someRadius) ORDER BY
arcLength;

"Sort  (cost=26.64..26.64 rows=1 width=140) (actual time=4.207..4.209
rows=16 loops=1)"
"  Sort Key: (_st_distance(location, cb_getplace(someEntityID::bigint),
0::double precision, true))"
"  Sort Method:  quicksort  Memory: 18kB"
"  ->  Index Scan using place_idx on "entities"  (cost=0.03..26.63 rows=1
width=140) (actual time=1.691..4.187 rows=16 loops=1)"
"        Index Cond: (location &&
_st_expand(cb_getplace(someEntityID::bigint), someRadius::double
precision))"
"        Filter: ((urt_id = 1) AND (cb_getplace(someEntityID::bigint) &&
_st_expand(location, someRadius::double precision)) AND
_st_dwithin(location, cb_getplace(someEntityID::bigint), someRadius::double
precision, true))"
"Total runtime: 4.242 ms"

If it matters, my test platform is PostGIS 1.5.1 with PostgreSQL 8.4.4-1
(Windows 32bit build) though my target platform is Ubuntu x64.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20100717/347ee4fc/attachment.html>