[postgis-users] Adaptive k-nearest neigbourh in PostGIS

Darafei "Komяpa" Praliaskouski me at komzpa.net
Wed Dec 20 03:43:30 PST 2017


Hi,

it seems to me that cumulative distance is guaranteed to always be more or
equal to straight line distance, thus transforming your KNN query to
essentially KNN-in-box query.

You can add `where ST_DWithin(geom, 'SRID=32633;POINT(404419.4
5599450.3)'::geometry, distcum)` to your select clause, and you'll get a
box-based index scan for your non-index-using query. Depending on your data
distribution it might give enough performance already.

ср, 20 дек. 2017 г. в 14:30, Vilem Ded <Ded.V at seznam.cz>:

> Hi,
>
> I have big table of spatial points in 2D (or 3D) (PostGIS) and distance
> defined between them (let say just real spatial distance). When I create
> GIST index on these points, I can efficiently get k-nearest neighbors for
> each point using `<->` operator in ORDER BY statement with constant k (e.g.
> 20) specified in `LIMIT` statement([Postgis-official][1]).
>
>
>
> *    SELECT id, st_distance(geom,'SRID=32633;POINT(404419.4
> 5599450.3)'::geometry ) as dist, geom     FROM positions     ORDER BY geom
> <-> 'SRID=32633;POINT(404419.4 5599450.3)'::geometry limit 20; *
>
> Explain analyze for that:
>
>
>
>
>
>
>
> *QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>      Limit  (cost=0.28..2.64 rows=10 width=36) (actual time=0.243..0.313
> rows=10 loops=1)        ->  Index Scan using positons2_gix on positions2
> (cost=0.28..12059.82 rows=51146 width=36) (actual time=0.243..0.310 rows=10
> loops=1)              Order By: (up_geom <->
> '0101000020797F00009A9999990DAF184133333393365C5541'::geometry)
> Planning time: 0.109 ms      Execution time: 0.368 ms *
>
>
> The thing is, that I would like to get different (dynamic) number k of
> neighbors for each point such that cumulative sum ([How to compute
> cumulative sum][2]) of distances to these points is less than parameter a
> (let say 500m) of course maximizing k. So called adaptive k-nearest
> neighbor algorithm. I could compute distance to all positions and filter it
> by WHERE clause:
>
>
>
>
>
> *    select * from     (select  *, sum(dist) over(order by dist) distcum
> from          (select id, st_distance(geom,'SRID=32633;POINT(404419.4
> 5599450.3)'::geometry ) as dist,  geom from         positions  ORDER BY
> geom <-> 'SRID=32633;POINT(404419.4 5599450.3)'::geometry) a     ) b where
> distcum < 500; *
>
> explain analyze for that:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *    QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
>      Subquery Scan on b  (cost=14073.70..15608.08 rows=17049 width=52)
> (actual time=155.427..186.656 rows=26 loops=1)        Filter: (b.distcum <
> '500'::double precision)        Rows Removed by Filter: 51120        ->
> WindowAgg  (cost=14073.70..14968.76 rows=51146 width=44) (actual
> time=155.421..181.934 rows=51146 loops=1)              ->  Sort
> (cost=14073.70..14201.57 rows=51146 width=44) (actual time=155.410..157.985
> rows=51146 loops=1)                    Sort Key: a.dist
> Sort Method: quicksort  Memory: 5532kB                    ->  Subquery Scan
> on a (cost=9434.16..10073.49 rows=51146 width=44) (actual
> time=130.931..145.989 rows=51146 loops=1)                          ->
> Sort  (cost=9434.16..9562.03 rows=51146 width=36) (actual
> time=130.930..136.711 rows=51146 loops=1)
> Sort Key: ((positions.geom <->
> '0101000020797F00009A9999990DAF184133333393365C5541'::geometry))
>                                Sort Method: quicksort  Memory: 8729kB
>                                ->  Seq Scan on positions
> (cost=0.00..5433.95 rows=51146 width=36) (actual time=0.023..95.801
> rows=51146 loops=1)      Planning time: 0.146 ms      Execution time:
> 186.705 ms *
>
>
> but if I am not mistaken, this is sorting all points ( *Sort Key:
> ((positions.geom <->...*.) using GIST index, runs window function on all
> points to get cumulative sum and then uses filter.
>
> I would like to apply this adaptive knn for all points in my table with
> about 2 000 000 rows. My proposed query is too slow for that.
>
> Is there any way how to implement adaptive knn in PotgreSQL/PostGIS so the
> computation is faster than my solution? (probably by avoiding sorting of
> all points for each point)
>
> Thank you
> Vil
>
>   [1]: https://postgis.net/docs/geometry_distance_knn.html
>   [2]:
> https://stackoverflow.com/questions/22841206/calculating-cumulative-sum-in-postgresql
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20171220/40d79f6a/attachment.html>


More information about the postgis-users mailing list