[postgis-users] Recursive CTE nearest neighbour function
Vincent Picavet - ML
vincent.ml at oslandia.com
Sun Feb 8 07:12:27 PST 2015
Hello Mark,
Any reason for not using the standard KNN search mechanism ?
With the LATERAL keyword you can solve the reference issue of subqueries.
Like this :
-- find 2 closest bar from each bus stop
select
bus.gid, bus.name, lat.gid, lat.name, lat.dist
from
points as bus
, lateral (
select
bar.gid
, st_distance(bar.geom, bus.geom) as dist
, bar.name
from
points as bar
where
bar.type = 'bar'
order by
bar.geom <-> bus.geom -- forbidden without lateral
limit 2
) as lat
where
bus.type = 'bus_stop'
order by
bus.gid, lat.dist desc;
One good reason for not using this could be the <-> (and the <#> one) working
only on bounding boxes and not real geometries, therefore sending only
approximate results with lines and polygons. Rechecking by distance could be a
good idea.
Vincent
Le dimanche 8 février 2015, 14:59:41 Mark Wynter a écrit :
> Hi List
>
> I’ve been playing with different approaches to KNN problem, given the
> spatial density of features often varies dramatically across urban and
> rural settings. I’ve written a recursive CTE function, that uses
> ST_DWithin recursively, expanding the distance threshold each time (but
> only if necessary). The function can be found here at github:
>
> https://raw.githubusercontent.com/dimensionaledge/cf_public/master/points/DE
> _KNN.sql
>
> I’ve checked out the Boston GIS pgis_fn_nn example and write-up, but am
> increasingly erring toward the use of recursive CTE functions as my
> knowledge grows around how they can be applied to dynamic problems.
>
> For the KNN recursive function, I’ve compared performance for 3 different
> use cases, each returning 10500 records.
>
> Use Case:
> 1) 3500 origin points, 1.8 million Linestring features, K=3…. Query time =
> 20 seconds. 2) 3500 origin points, 3500 Point features, K=3…. Query time =
> 212 seconds or 3.5 minutes. 3) 3500 origin points, 3500 Polygon features
> (the same points as in (2) except buffered by 1 meter), K=3…. Query time =
> 7 seconds.
>
> What surprised me is the results of (3) versus (2). From 212 seconds down
> to 7 seconds.
>
> There were geometry indexes on all use cases, but the only difference with
> (3) is that I buffered the feature set in (2) by 1 meter, in effect
> creating a Polygon representation of the points.
>
> I sense this has something to do with the bounding box comparison used by
> ST_DWithin, as mentioned in the PostGIS manual. I’ve also studied Paul’s
> tip here: http://postgis.net/2013/08/26/tip_ST_DWithin
>
> I’m keen to get the insights of others, and learn ways of further improving
> my code patterns … ? How have others dealt with KNN problems in PostGIS
> when working with BIG point feature sets?
>
> Kind regards
>
> Mark
>
> 1) Query Form
> WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry,
> 'prep.osm_new_roads_poa11_cleaned_full', 'wkb_geometry', 'lid', 0.0001,
> 10,3) as knn FROM jtw.nsw_tz_centroids) SELECT ogc_fid, (knn).id as lid,
> (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC)
> FROM results;
>
> 2) Query Form
> WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry,
> 'jtw.nsw_tz_centroids', 'wkb_geometry', 'ogc_fid', 0.0001,20,8) as knn FROM
> jtw.nsw_tz_centroids) SELECT ogc_fid, (knn).id as lid, (knn).distance,
> row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC) FROM results;
>
> 3) Query Form
> WITH results AS (SELECT ogc_fid, DE_knn(wkb_geometry,
> 'jtw.nsw_tz_centroids_buffers', 'wkb_geometry', 'ogc_fid', 0.0001,20,3) as
> knn FROM jtw.nsw_tz_centroids) SELECT ogc_fid, (knn).id as lid,
> (knn).distance, row_number() over (PARTITION BY ogc_fid ORDER BY 3 ASC)
> FROM results;
--
Vincent Picavet - Gérant Oslandia
www.oslandia.com
More information about the postgis-users
mailing list