[postgis-users] Adaptive k-nearest neigbourh in PostGIS
Vilem Ded
Ded.V at seznam.cz
Mon Jan 8 07:26:15 PST 2018
Thanks,
very nice indeed! I will definitely give that a try.
But this is really very dependent on data distribution. My dataset is
clustered so it will be better, but probably not good enough:/
As I understood KNN in Postgis is implemented with priority queue?
If so, there must be condition of type while i < k {next} , where k is our K
in KNN. I was hoping for some way how to alter this part of code - "just"
change those few lines..
Thank you for any other hints
Vil
---------- Původní e-mail ----------
Od: Darafei "Komяpa" Praliaskouski <me at komzpa.net>
Komu: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
Datum: 20. 12. 2017 12:44:01
Předmět: Re: [postgis-users] Adaptive k-nearest neigbourh in PostGIS
"
Hi,
it seems to me that cumulative distance is guaranteed to always be more or
equal to straight line distance, thus transforming your KNN query to
essentially KNN-in-box query.
You can add `where ST_DWithin(geom, 'SRID=32633;POINT(404419.4 5599450.3)'::
geometry, distcum)` to your select clause, and you'll get a box-based index
scan for your non-index-using query. Depending on your data distribution it
might give enough performance already.
ср, 20 дек. 2017 г. в 14:30, Vilem Ded <Ded.V at seznam.cz
(mailto:Ded.V at seznam.cz)>:
"
Hi,
I have big table of spatial points in 2D (or 3D) (PostGIS) and distance
defined between them (let say just real spatial distance). When I create
GIST index on these points, I can efficiently get k-nearest neighbors for
each point using `<->` operator in ORDER BY statement with constant k (e.g.
20) specified in `LIMIT` statement([Postgis-official][1]).
SELECT id, st_distance(geom,'SRID=32633;POINT(404419.4 5599450.3)'::
geometry ) as dist, geom
FROM positions
ORDER BY geom <-> 'SRID=32633;POINT(404419.4 5599450.3)'::geometry limit
20;
Explain analyze for that:
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------------------------------------
Limit (cost=0.28..2.64 rows=10 width=36) (actual time=0.243..0.313
rows=10 loops=1)
-> Index Scan using positons2_gix on positions2 (cost=0.28..12059.82
rows=51146 width=36) (actual time=0.243..0.310 rows=10 loops=1)
Order By: (up_geom <-> '0101000020797F00009A9999990DAF
184133333393365C5541'::geometry)
Planning time: 0.109 ms
Execution time: 0.368 ms
The thing is, that I would like to get different (dynamic) number k of
neighbors for each point such that cumulative sum ([How to compute
cumulative sum][2]) of distances to these points is less than parameter a
(let say 500m) of course maximizing k. So called adaptive k-nearest neighbor
algorithm. I could compute distance to all positions and filter it by WHERE
clause:
select * from
(select *, sum(dist) over(order by dist) distcum from
(select id, st_distance(geom,'SRID=32633;POINT(404419.4
5599450.3)'::geometry ) as dist, geom from
positions ORDER BY geom <-> 'SRID=32633;POINT(404419.4
5599450.3)'::geometry) a
) b where distcum < 500;
explain analyze for that:
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------------
Subquery Scan on b (cost=14073.70..15608.08 rows=17049 width=52)
(actual time=155.427..186.656 rows=26 loops=1)
Filter: (b.distcum < '500'::double precision)
Rows Removed by Filter: 51120
-> WindowAgg (cost=14073.70..14968.76 rows=51146 width=44) (actual
time=155.421..181.934 rows=51146 loops=1)
-> Sort (cost=14073.70..14201.57 rows=51146 width=44) (actual
time=155.410..157.985 rows=51146 loops=1)
Sort Key: a.dist
Sort Method: quicksort Memory: 5532kB
-> Subquery Scan on a (cost=9434.16..10073.49 rows=51146
width=44) (actual time=130.931..145.989 rows=51146 loops=1)
-> Sort (cost=9434.16..9562.03 rows=51146 width=
36) (actual time=130.930..136.711 rows=51146 loops=1)
Sort Key: ((positions.geom <-> '0101000020797
F00009A9999990DAF184133333393365C5541'::geometry))
Sort Method: quicksort Memory: 8729kB
-> Seq Scan on positions (cost=0.00..5433.95
rows=51146 width=36) (actual time=0.023..95.801 rows=51146 loops=1)
Planning time: 0.146 ms
Execution time: 186.705 ms
but if I am not mistaken, this is sorting all points ( Sort Key: (
(positions.geom <->....) using GIST index, runs window function on all
points to get cumulative sum and then uses filter.
I would like to apply this adaptive knn for all points in my table with
about 2 000 000 rows. My proposed query is too slow for that.
Is there any way how to implement adaptive knn in PotgreSQL/PostGIS so the
computation is faster than my solution? (probably by avoiding sorting of all
points for each point)
Thank you
Vil
[1]: https://postgis.net/docs/geometry_distance_knn.html
(https://postgis.net/docs/geometry_distance_knn.html)
[2]: https://stackoverflow.com/questions/22841206/calculating-cumulative-
sum-in-postgresql
(https://stackoverflow.com/questions/22841206/calculating-cumulative-sum-in-postgresql)
_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org(mailto:postgis-users at lists.osgeo.org)
https://lists.osgeo.org/mailman/listinfo/postgis-users
(https://lists.osgeo.org/mailman/listinfo/postgis-users)"
_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20180108/8fc7d0c7/attachment.html>
More information about the postgis-users
mailing list