[postgis-users] Recursive CTE nearest neighbour function

Vincent Picavet - ML vincent.ml at oslandia.com
Sun Feb 8 07:12:27 PST 2015


Hello Mark,

Any reason for not using the standard KNN search mechanism ?
With the LATERAL keyword you can solve the reference issue of subqueries.

Like this :

-- find 2 closest bar from each bus stop
select
    bus.gid, bus.name, lat.gid, lat.name, lat.dist
from
    points as bus
    , lateral (
        select
            bar.gid
            , st_distance(bar.geom, bus.geom) as dist
            , bar.name
        from
            points as bar
        where
            bar.type = 'bar'
        order by
            bar.geom <-> bus.geom -- forbidden without lateral
        limit 2
    ) as lat
where
    bus.type = 'bus_stop'
order by
    bus.gid, lat.dist desc;


One good reason for not using this could be the <-> (and the <#> one) working 
only on bounding boxes and not real geometries, therefore sending only 
approximate results with lines and polygons. Rechecking by distance could be a 
good idea.

Vincent
Le dimanche 8 février 2015, 14:59:41 Mark Wynter a écrit :
> Hi List
> 
> I’ve been playing with different approaches to KNN problem, given the
> spatial density of features often varies dramatically across urban and
> rural settings.  I’ve written  a recursive CTE function, that uses
> ST_DWithin recursively,  expanding the distance threshold each time (but
> only if necessary).  The function can be found here at github:
> 
> https://raw.githubusercontent.com/dimensionaledge/cf_public/master/points/DE
> _KNN.sql
> 
> I’ve checked out the Boston GIS pgis_fn_nn example and write-up, but am
> increasingly erring toward the use of recursive CTE functions as my
> knowledge grows around how they can be applied to dynamic problems.
> 
> For the KNN recursive function, I’ve compared performance for 3 different
> use cases, each returning 10500 records.
> 
> Use Case:
> 1) 3500 origin points, 1.8 million Linestring features, K=3…. Query time =
> 20 seconds. 2) 3500 origin points, 3500 Point features, K=3…. Query time =
> 212 seconds or 3.5 minutes. 3) 3500 origin points, 3500 Polygon features
> (the same points as in (2) except buffered by 1 meter), K=3…. Query time =
> 7 seconds.
> 
> What surprised me is the results of (3) versus (2).  From 212 seconds down
> to 7 seconds.
> 
> There were geometry indexes on all use cases, but the only difference with
> (3)  is that I buffered the feature set in (2)  by 1 meter, in effect
> creating a Polygon representation of the points.
> 
> I sense this has something to do with the bounding box comparison used by
> ST_DWithin, as mentioned in the PostGIS manual.  I’ve also studied Paul’s
> tip here: http://postgis.net/2013/08/26/tip_ST_DWithin
> 
> I’m keen to get the insights of others, and learn ways of further improving
> my code patterns … ?  How have others dealt with KNN problems in PostGIS
> when working with BIG point feature sets?
> 
> Kind regards
> 
> Mark
> 
> 1) Query Form
> WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry,
> 'prep.osm_new_roads_poa11_cleaned_full', 'wkb_geometry', 'lid', 0.0001,
> 10,3) as knn FROM jtw.nsw_tz_centroids) SELECT ogc_fid, (knn).id as lid,
> (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC)
> FROM results;
> 
> 2) Query Form
> WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry,
> 'jtw.nsw_tz_centroids', 'wkb_geometry', 'ogc_fid', 0.0001,20,8) as knn FROM
> jtw.nsw_tz_centroids) SELECT ogc_fid, (knn).id as lid, (knn).distance,
> row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC) FROM results;
> 
> 3) Query Form
> WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry,
> 'jtw.nsw_tz_centroids_buffers', 'wkb_geometry', 'ogc_fid', 0.0001,20,3) as
> knn FROM jtw.nsw_tz_centroids) SELECT ogc_fid, (knn).id as lid,
> (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC)
> FROM results;

-- 
Vincent Picavet - Gérant Oslandia
www.oslandia.com


More information about the postgis-users mailing list