[postgis-users] Adaptive k-nearest neigbourh in PostGIS

Vilem Ded Ded.V at seznam.cz
Wed Dec 20 03:17:32 PST 2017


I have big table of spatial points in 2D (or 3D) (PostGIS) and distance 
defined between them (let say just real spatial distance). When I create 
GIST index on these points, I can efficiently get k-nearest neighbors for 
each point using `<->` operator in ORDER BY statement with constant k (e.g. 
20) specified in `LIMIT` statement([Postgis-official][1]). 

    SELECT id, st_distance(geom,'SRID=32633;POINT(404419.4 5599450.3)'::
geometry ) as dist, geom 
    FROM positions 
    ORDER BY geom <-> 'SRID=32633;POINT(404419.4 5599450.3)'::geometry limit

Explain analyze for that: 

     Limit  (cost=0.28..2.64 rows=10 width=36) (actual time=0.243..0.313 
rows=10 loops=1) 
       ->  Index Scan using positons2_gix on positions2 (cost=0.28..12059.82
rows=51146 width=36) (actual time=0.243..0.310 rows=10 loops=1) 
             Order By: (up_geom <-> '0101000020797F00009A9999990DAF
     Planning time: 0.109 ms 
     Execution time: 0.368 ms 

The thing is, that I would like to get different (dynamic) number k of 
neighbors for each point such that cumulative sum ([How to compute 
cumulative sum][2]) of distances to these points is less than parameter a 
(let say 500m) of course maximizing k. So called adaptive k-nearest neighbor
algorithm. I could compute distance to all positions and filter it by WHERE 

    select * from 
    (select  *, sum(dist) over(order by dist) distcum from 
         (select id, st_distance(geom,'SRID=32633;POINT(404419.4 
5599450.3)'::geometry ) as dist,  geom from 
        positions  ORDER BY geom <-> 'SRID=32633;POINT(404419.4 
5599450.3)'::geometry) a 
    ) b where distcum < 500; 

explain analyze for that: 

     Subquery Scan on b  (cost=14073.70..15608.08 rows=17049 width=52) 
(actual time=155.427..186.656 rows=26 loops=1) 
       Filter: (b.distcum < '500'::double precision) 
       Rows Removed by Filter: 51120 
       ->  WindowAgg  (cost=14073.70..14968.76 rows=51146 width=44) (actual 
time=155.421..181.934 rows=51146 loops=1) 
             ->  Sort  (cost=14073.70..14201.57 rows=51146 width=44) (actual
time=155.410..157.985 rows=51146 loops=1) 
                   Sort Key: a.dist 
                   Sort Method: quicksort  Memory: 5532kB 
                   ->  Subquery Scan on a (cost=9434.16..10073.49 rows=51146
width=44) (actual time=130.931..145.989 rows=51146 loops=1) 
                         ->  Sort  (cost=9434.16..9562.03 rows=51146 width=
36) (actual time=130.930..136.711 rows=51146 loops=1) 
                               Sort Key: ((positions.geom <-> '0101000020797
                               Sort Method: quicksort  Memory: 8729kB 
                               ->  Seq Scan on positions (cost=0.00..5433.95
rows=51146 width=36) (actual time=0.023..95.801 rows=51146 loops=1) 
     Planning time: 0.146 ms 
     Execution time: 186.705 ms 

but if I am not mistaken, this is sorting all points ( Sort Key: (
(positions.geom <->....) using GIST index, runs window function on all 
points to get cumulative sum and then uses filter. 

I would like to apply this adaptive knn for all points in my table with 
about 2 000 000 rows. My proposed query is too slow for that. 

Is there any way how to implement adaptive knn in PotgreSQL/PostGIS so the 
computation is faster than my solution? (probably by avoiding sorting of all
points for each point) 

Thank you 

  [1]: https://postgis.net/docs/geometry_distance_knn.html
  [2]: https://stackoverflow.com/questions/22841206/calculating-cumulative-
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20171220/493f82b2/attachment.html>

More information about the postgis-users mailing list