<html><body>Thanks, <br>very nice indeed! I will definitely give that a try. <br>But this is really very dependent on data distribution. My dataset is clustered so it will be better, but probably not good enough:/<br>As I understood KNN in Postgis is implemented with priority queue?<br>If so, there must be condition of type while i < k {next} , where k is our K in KNN. I was hoping for some way how to alter this part of code - "just" change those few lines..<br>Thank you for any other hints<br>Vil<br><br><aside>

---------- Původní e-mail ----------<br>

Od: Darafei "Komяpa" Praliaskouski <me@komzpa.net><br>

Komu: PostGIS Users Discussion <postgis-users@lists.osgeo.org><br>

Datum: 20. 12. 2017 12:44:01<br>

Předmět: Re: [postgis-users] Adaptive k-nearest neigbourh in PostGIS

</aside><br><blockquote data-email="me@komzpa.net"><div dir="ltr">Hi,<div><br></div><div>it seems to me that cumulative distance is guaranteed to always be more or equal to straight line distance, thus transforming your KNN query to essentially KNN-in-box query.</div><div><br></div><div>You can add `where ST_DWithin(geom, 'SRID=32633;POINT(404419.4 5599450.3)'::geometry, distcum)` to your select clause, and you'll get a box-based index scan for your non-index-using query. Depending on your data distribution it might give enough performance already.</div></div><br><div><div dir="ltr">ср, 20 дек. 2017 г. в 14:30, Vilem Ded <<a href="mailto:Ded.V@seznam.cz">Ded.V@seznam.cz</a>>:<br></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px#ccc solid;padding-left:1ex"><div>Hi,<br>

<br>I have big table of spatial points in 2D (or 3D) (PostGIS) and distance 

defined between them (let say just real spatial distance). When I create 

GIST index on these points, I can efficiently get k-nearest neighbors 

for each point using `<->` operator in ORDER BY statement with constant 

k (e.g. 20) specified in `LIMIT` statement([Postgis-official][1]).

<br>

<br><i>    SELECT id, st_distance(geom,'SRID=32633;POINT(404419.4 

5599450.3)'::geometry ) as dist, geom

<br>    FROM positions

<br>    ORDER BY geom <-> 'SRID=32633;POINT(404419.4 5599450.3)'::geometry 

limit 20;

</i><br>

<br>Explain analyze for that:

<br>

<br><i>QUERY PLAN

<br>------------------------------------------------------------------------------------------------------------------------------------------

<br>     Limit  (cost=0.28..2.64 rows=10 width=36) (actual 

time=0.243..0.313 rows=10 loops=1)

<br>       ->  Index Scan using positons2_gix on positions2 

(cost=0.28..12059.82 rows=51146 width=36) (actual time=0.243..0.310 

rows=10 loops=1)

<br>             Order By: (up_geom <-> 

'0101000020797F00009A9999990DAF184133333393365C5541'::geometry)

<br>     Planning time: 0.109 ms

<br>     Execution time: 0.368 ms

</i><br>

<br>

<br>The thing is, that I would like to get different (dynamic) number k of neighbors 

for each point such that cumulative sum ([How to compute cumulative 

sum][2]) of distances to these points is less than parameter a (let say 

500m) of course maximizing k. So called adaptive k-nearest neighbor 

algorithm. I could compute distance to all positions and filter it by 

WHERE clause:

<br>

<br><i>    select * from

<br>    (select  *, sum(dist) over(order by dist) distcum from

<br>         (select id, st_distance(geom,'SRID=32633;POINT(404419.4 

5599450.3)'::geometry ) as dist,  geom from

<br>        positions  ORDER BY geom <-> 'SRID=32633;POINT(404419.4 

5599450.3)'::geometry) a

<br>    ) b where distcum < 500;

</i><br>

<br>explain analyze for that:

<br>

<br><i>    QUERY PLAN

<br>-----------------------------------------------------------------------------------------------------------------------------------------------

<br>     Subquery Scan on b  (cost=14073.70..15608.08 rows=17049 width=52) 

(actual time=155.427..186.656 rows=26 loops=1)

<br>       Filter: (b.distcum < '500'::double precision)

<br>       Rows Removed by Filter: 51120

<br>       ->  WindowAgg  (cost=14073.70..14968.76 rows=51146 width=44) 

(actual time=155.421..181.934 rows=51146 loops=1)

<br>             ->  Sort  (cost=14073.70..14201.57 rows=51146 width=44) 

(actual time=155.410..157.985 rows=51146 loops=1)

<br>                   Sort Key: a.dist

<br>                   Sort Method: quicksort  Memory: 5532kB

<br>                   ->  Subquery Scan on a (cost=9434.16..10073.49 

rows=51146 width=44) (actual time=130.931..145.989 rows=51146 loops=1)

<br>                         ->  Sort  (cost=9434.16..9562.03 rows=51146 

width=36) (actual time=130.930..136.711 rows=51146 loops=1)

<br>                               Sort Key: ((positions.geom <-> 

'0101000020797F00009A9999990DAF184133333393365C5541'::geometry))

<br>                               Sort Method: quicksort  Memory: 8729kB

<br>                               ->  Seq Scan on positions 

(cost=0.00..5433.95 rows=51146 width=36) (actual time=0.023..95.801 

rows=51146 loops=1)

<br>     Planning time: 0.146 ms

<br>     Execution time: 186.705 ms

</i><br>

<br>

<br>but if I am not mistaken, this is sorting all points ( <i>Sort Key: 

((positions.geom <->...</i>.) using GIST index, runs window function on all 

points to get cumulative sum and then uses filter.

<br>

<br>I would like to apply this adaptive knn for all points in my table with 

about 2 000 000 rows. My proposed query is too slow for that.

<br>

<br>Is there any way how to implement adaptive knn in PotgreSQL/PostGIS so 

the computation is faster than my solution? (probably by avoiding 

sorting of all points for each point)

<br><br>

Thank you <br>Vil

<br>

<br>  [1]: <a href="https://postgis.net/docs/geometry_distance_knn.html" rel="noopener">https://postgis.net/docs/geometry_distance_knn.html</a>

<br>  [2]: 

<a href="https://stackoverflow.com/questions/22841206/calculating-cumulative-sum-in-postgresql" rel="noopener">https://stackoverflow.com/questions/22841206/calculating-cumulative-sum-in-postgresql</a></div>_______________________________________________<br>

postgis-users mailing list<br>

<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br>

<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>

_______________________________________________<br>postgis-users mailing list<br>postgis-users@lists.osgeo.org<br>https://lists.osgeo.org/mailman/listinfo/postgis-users</blockquote></body></html>