[postgis-users] How to Lookup Nearest Locations from a Table

Sun Dec 19 14:20:41 PST 2010

On 12/16/2010 06:05 PM, Shlomi Fish wrote:

> Hi all.
>
> We have a table with a geography column. We'd like, for any given row, to
> return the N nearest *other* rows.  We saw ST_Distance, but is that the only
> approach? It feels like it will be slow and will not scale well.
>
> Is there anything more magical to use? We're pretty new to PostGIS.
>
Hi Shlomi:
You question piqued my curiosity so I did some testing.
First off, If you don't know anything about the data (i.e. density of 
features) and you have to rely just on ST_Distance, like any problem 
with complexity O(n2) it will be slow.

*If* you know a maximum distance within which you can always find N 
other features, then, as a previous poster mentioned, the ST_DWithin 
function speeds things up considerably. (It uses a quick bounding box 
check in advance of the actual intersection).

But, Suppose you want to take advantage of DWithin, without knowing the 
maximum distance? How about passing a starting distance, and checking if 
N features are found. If not then double the search distance, and loop 
along like this till you find the maximum distance needed. Once found, 
run the query using DWithin with this new max distance to find those 
closest features.

Attached is a set of functions I put together to test this idea. I made 
a scratch point layer with 500000 points, then did "explain analyze" on 
the three functions implementing each of the three above options. Here 
are the results:

1) First, just find 10 closed points to point # 500 using ST_Distance:
geodata=> explain analyze select * from distances_naive(10,500);
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
  Function Scan on distances_naive  (cost=0.00..260.00 rows=1000 
width=16) (actual time=766.631..766.642 rows=10 loops=1)
  Total runtime: 766.682 ms
(2 rows)

2) Now assuming I know a maximum distance (1000 meters in this case) 
within which I can always find 10 points:
geodata=> explain analyze select * from distances_within(10,500,1000);
                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
  Function Scan on distances_within  (cost=0.00..260.00 rows=1000 
width=16) (actual time=115.386..115.397 rows=10 loops=1)
  Total runtime: 115.437 ms
(2 rows)
Much faster!

3) Next the function which doubles a starting search distance until the 
N features are found. I tested with a starting distance of 100 m:
geodata=> explain analyze select * from distances_increment(10,500,100);
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
  Function Scan on distances_increment  (cost=0.00..260.00 rows=1000 
width=16) (actual time=257.399..257.410 rows=10 loops=1)
  Total runtime: 257.451 ms
(2 rows)
Not as fast as the straight-forward DWIthin, but still 1/3 the time of 
the "naive" ST_Distance only function!

Best Regards,
Micha

> While brainstorming we've considered filtering according to nearby latitudes
> and/or longtitudes, because if they are too far, then, by necessity, the
> distance will also be too far, but we are not sure if this is the right
> approach.
>
> We'd be grateful for any suggestions on how to move forward.
>
> Regards,
>
> 	Shlomi Fish
>

-- 
Micha Silver
Arava Development Co. +972-52-3665918
http://surfaces.co.il

-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgis_functions.sql
Type: text/x-sql
Size: 2963 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20101220/f1ae2f4c/attachment.bin>