Greetings,<div><br></div><div>I have a table of around a million rows, each containing a <b>geography</b> POINT (I'm covering the entire world). Against these rows I need to run many nearest-neighbor searches to locate other entities within a range of 0 to 100 km. The location column has a GiST index.</div>
<div><br></div><div>With 5,000 records in the table my average query took around 0.5 ms; now that I have a million records, the query time has gone up to around 4 ms. I have already tried REINDEX, VACUUM, etc. My goal is to make this query as fast as possible, since it underpins everything I'm doing and the target hardware won't be as powerful as my development box.</div>
<div><br></div><div>As well as the location column, I also have an "entity_type" column that is a simple integer: 1, 2, ... (n). </div><div><br></div><div>I figured I could improve performance by adding an index on the entity_type column and then filtering the rowset prior to the nearest-neighbor search. My logic was that it must be quicker to isolate a 10% subset of the records using a simple integer index before feeding them into the expensive GiST index.</div>
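<div><br></div><div>For concreteness, the integer index is a plain b-tree along these lines. This is a sketch: entity_type_idx is the index name, and category_id is assumed to be the entity-type column, as it is named in the pseudo-code query further down.</div>
<div><br></div><div><div>-- plain b-tree index on the integer type column (column name assumed)</div><div><b>CREATE INDEX</b> entity_type_idx <b>ON</b> entities (category_id);</div><div>-- refresh planner statistics so the new index is costed with current data</div><div><b>ANALYZE</b> entities;</div></div>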
<div><br></div><div>Unfortunately, when I did this, PostgreSQL didn't use my entity_type_idx at all. Instead, it did the nearest-neighbor search using the GiST index, then applied a simple filter on entity_type to the collected records. I tried a few tricks to make it use the index, but no luck.</div>
<div><br></div><div>Any ideas for speeding this up would be very much appreciated! Right now my best idea is to have a separate table for each entity type, but that wouldn't be fun, as I don't know the entity types in advance.</div>
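<div><br></div><div>Two untested sketches of alternatives that would keep a single table. Both borrow the table and column names from the pseudo-code query below; the index names place_type1_idx and place_type_idx are made up for illustration. The second sketch assumes the btree_gist contrib module is installed, so that an integer column can take part in a GiST index.</div>
<div><br></div><div><div>-- Option 1: partial GiST index covering a single entity type.</div><div>-- One index is needed per type, so the types must still be known when the indexes are created.</div><div><b>CREATE INDEX</b> place_type1_idx <b>ON</b> entities <b>USING</b> gist (location) <b>WHERE</b> category_id = 1;</div><div><br></div><div>-- Option 2: two-column GiST index over type and location (assumes btree_gist is installed).</div><div><b>CREATE INDEX</b> place_type_idx <b>ON</b> entities <b>USING</b> gist (category_id, location);</div></div>
<div><br></div><div>The planner can only consider the partial index when the query's WHERE clause implies the index predicate, for example category_id = 1 written as a constant. Whether either option actually beats the current plan would have to be checked with EXPLAIN ANALYZE.</div>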
<div><br></div><div>cheers,</div><div>-chris</div><div><br></div><div>Here is pseudo-code of the query and execution plan/analysis. CB_GetPlace() is one of my helper functions that returns a geography from an entity id (marked <b>stable</b>).</div>
<div><br></div><div><div><b>SELECT</b></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>entity_id, category_id,</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ST_Distance(location, CB_GetPlace(someEntityID)) as arcLength</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span><b>FROM</b> entities <b>WHERE</b></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>category_id = 1 <b>AND</b></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ST_DWithin(location, CB_GetPlace(someEntityID), someRadius) <b>ORDER BY</b> arcLength;</div>
</div><div><br></div><div><div>"Sort (cost=26.64..26.64 rows=1 width=140) (actual time=4.207..4.209 rows=16 loops=1)"</div><div>" Sort Key: (_st_distance(location, cb_getplace(someEntityID::bigint), 0::double precision, true))"</div>
<div>" Sort Method: quicksort Memory: 18kB"</div><div>" -> Index Scan using place_idx on "entities" (cost=0.03..26.63 rows=1 width=140) (actual time=1.691..4.187 rows=16 loops=1)"</div>
<div>" Index Cond: (location && _st_expand(cb_getplace(someEntityID::bigint), someRadius::double precision))"</div><div>" Filter: ((urt_id = 1) AND (cb_getplace(someEntityID::bigint) && _st_expand(location, someRadius::double precision)) AND _st_dwithin(location, cb_getplace(someEntityID::bigint), someRadius::double precision, true))"</div>
<div>"Total runtime: 4.242 ms"</div></div><div><br></div><div>If it matters, my test platform is PostGIS 1.5.1 with PostgreSQL 8.4.4-1 (Windows 32-bit build), though my target platform is Ubuntu x64.</div><div><br>
</div>