[postgis-users] Recursive CTE nearest neighbour function

Mark Wynter mark at dimensionaledge.com
Sat Feb 7 20:29:41 PST 2015


Hi List

I’ve been playing with different approaches to KNN problem, given the spatial density of features often varies dramatically across urban and rural settings.  I’ve written  a recursive CTE function, that uses ST_DWithin recursively,  expanding the distance threshold each time (but only if necessary).  The function can be found here at github:

https://raw.githubusercontent.com/dimensionaledge/cf_public/master/points/DE_KNN.sql

I’ve checked out the Boston GIS pgis_fn_nn example and write-up, but am increasingly erring toward the use of recursive CTE functions as my knowledge grows around how they can be applied to dynamic problems.

For the KNN recursive function, I’ve compared performance for 3 different use cases, each returning 10500 records.

Use Case:
1) 3500 origin points, 1.8 million Linestring features, K=3…. Query time = 20 seconds.
2) 3500 origin points, 3500 Point features, K=3…. Query time = 212 seconds or 3.5 minutes.
3) 3500 origin points, 3500 Polygon features (the same points as in (2) except buffered by 1 meter), K=3…. Query time = 7 seconds.

What surprised me is the results of (3) versus (2).  From 212 seconds down to 7 seconds.

There were geometry indexes on all use cases, but the only difference with (3)  is that I buffered the feature set in (2)  by 1 meter, in effect creating a Polygon representation of the points.

I sense this has something to do with the bounding box comparison used by ST_DWithin, as mentioned in the PostGIS manual.  I’ve also studied Paul’s tip here:
http://postgis.net/2013/08/26/tip_ST_DWithin

I’m keen to get the insights of others, and learn ways of further improving my code patterns … ?  How have others dealt with KNN problems in PostGIS when working with BIG point feature sets?

Kind regards

Mark

1) Query Form
WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry, 'prep.osm_new_roads_poa11_cleaned_full', 'wkb_geometry', 'lid', 0.0001, 10,3) as knn FROM jtw.nsw_tz_centroids)
SELECT ogc_fid, (knn).id as lid, (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC) FROM results;

2) Query Form
WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry, 'jtw.nsw_tz_centroids', 'wkb_geometry', 'ogc_fid', 0.0001,20,8) as knn FROM jtw.nsw_tz_centroids)
SELECT ogc_fid, (knn).id as lid, (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC) FROM results;

3) Query Form
WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry, 'jtw.nsw_tz_centroids_buffers', 'wkb_geometry', 'ogc_fid', 0.0001,20,3) as knn FROM jtw.nsw_tz_centroids)
SELECT ogc_fid, (knn).id as lid, (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC) FROM results;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150208/2b43a79f/attachment.html>


More information about the postgis-users mailing list