[postgis-users] Adaptive k-nearest neigbourh in PostGIS

Vilem Ded Ded.V at seznam.cz
Mon Jan 8 07:26:15 PST 2018

very nice indeed! I will definitely give that a try. 
But this is really very dependent on data distribution. My dataset is 
clustered so it will be better, but probably not good enough:/
As I understood KNN in Postgis is implemented with priority queue?
If so, there must be condition of type while i < k {next} , where k is our K
in KNN. I was hoping for some way how to alter this part of code - "just" 
change those few lines..
Thank you for any other hints

---------- Původní e-mail ----------
Od: Darafei "Komяpa" Praliaskouski <me at komzpa.net>
Komu: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
Datum: 20. 12. 2017 12:44:01
Předmět: Re: [postgis-users] Adaptive k-nearest neigbourh in PostGIS 

it seems to me that cumulative distance is guaranteed to always be more or 
equal to straight line distance, thus transforming your KNN query to 
essentially KNN-in-box query.

You can add `where ST_DWithin(geom, 'SRID=32633;POINT(404419.4 5599450.3)'::
geometry, distcum)` to your select clause, and you'll get a box-based index 
scan for your non-index-using query. Depending on your data distribution it 
might give enough performance already.

ср, 20 дек. 2017 г. в 14:30, Vilem Ded <Ded.V at seznam.cz
(mailto:Ded.V at seznam.cz)>:


I have big table of spatial points in 2D (or 3D) (PostGIS) and distance 
defined between them (let say just real spatial distance). When I create 
GIST index on these points, I can efficiently get k-nearest neighbors for 
each point using `<->` operator in ORDER BY statement with constant k (e.g. 
20) specified in `LIMIT` statement([Postgis-official][1]). 

    SELECT id, st_distance(geom,'SRID=32633;POINT(404419.4 5599450.3)'::
geometry ) as dist, geom 
    FROM positions 
    ORDER BY geom <-> 'SRID=32633;POINT(404419.4 5599450.3)'::geometry limit

Explain analyze for that: 

     Limit  (cost=0.28..2.64 rows=10 width=36) (actual time=0.243..0.313 
rows=10 loops=1) 
       ->  Index Scan using positons2_gix on positions2 (cost=0.28..12059.82
rows=51146 width=36) (actual time=0.243..0.310 rows=10 loops=1) 
             Order By: (up_geom <-> '0101000020797F00009A9999990DAF
     Planning time: 0.109 ms 
     Execution time: 0.368 ms 

The thing is, that I would like to get different (dynamic) number k of 
neighbors for each point such that cumulative sum ([How to compute 
cumulative sum][2]) of distances to these points is less than parameter a 
(let say 500m) of course maximizing k. So called adaptive k-nearest neighbor
algorithm. I could compute distance to all positions and filter it by WHERE 

    select * from 
    (select  *, sum(dist) over(order by dist) distcum from 
         (select id, st_distance(geom,'SRID=32633;POINT(404419.4 
5599450.3)'::geometry ) as dist,  geom from 
        positions  ORDER BY geom <-> 'SRID=32633;POINT(404419.4 
5599450.3)'::geometry) a 
    ) b where distcum < 500; 

explain analyze for that: 

     Subquery Scan on b  (cost=14073.70..15608.08 rows=17049 width=52) 
(actual time=155.427..186.656 rows=26 loops=1) 
       Filter: (b.distcum < '500'::double precision) 
       Rows Removed by Filter: 51120 
       ->  WindowAgg  (cost=14073.70..14968.76 rows=51146 width=44) (actual 
time=155.421..181.934 rows=51146 loops=1) 
             ->  Sort  (cost=14073.70..14201.57 rows=51146 width=44) (actual
time=155.410..157.985 rows=51146 loops=1) 
                   Sort Key: a.dist 
                   Sort Method: quicksort  Memory: 5532kB 
                   ->  Subquery Scan on a (cost=9434.16..10073.49 rows=51146
width=44) (actual time=130.931..145.989 rows=51146 loops=1) 
                         ->  Sort  (cost=9434.16..9562.03 rows=51146 width=
36) (actual time=130.930..136.711 rows=51146 loops=1) 
                               Sort Key: ((positions.geom <-> '0101000020797
                               Sort Method: quicksort  Memory: 8729kB 
                               ->  Seq Scan on positions (cost=0.00..5433.95
rows=51146 width=36) (actual time=0.023..95.801 rows=51146 loops=1) 
     Planning time: 0.146 ms 
     Execution time: 186.705 ms 

but if I am not mistaken, this is sorting all points ( Sort Key: (
(positions.geom <->....) using GIST index, runs window function on all 
points to get cumulative sum and then uses filter. 

I would like to apply this adaptive knn for all points in my table with 
about 2 000 000 rows. My proposed query is too slow for that. 

Is there any way how to implement adaptive knn in PotgreSQL/PostGIS so the 
computation is faster than my solution? (probably by avoiding sorting of all
points for each point) 

Thank you 

  [1]: https://postgis.net/docs/geometry_distance_knn.html
  [2]: https://stackoverflow.com/questions/22841206/calculating-cumulative-
postgis-users mailing list
postgis-users at lists.osgeo.org(mailto:postgis-users at lists.osgeo.org)
postgis-users mailing list
postgis-users at lists.osgeo.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20180108/8fc7d0c7/attachment.html>

More information about the postgis-users mailing list