[postgis-users] Increase KNN efficiency

Regina Obe lr at pcorp.us
Sat Jun 2 18:59:50 PDT 2018


Side note if you are using PostGIS >= 2.2 + PostgreSQL 9.5+, then no need to compute distance twice because <-> would give you true distance.  So you can shorten a bit (and should improve speed slightly) with

 

SELECT

  p1.id <http://p1.id> ,

  t.id <http://t.id>  AS id2,

  t.dist

FROM parc p1

  CROSS JOIN LATERAL

             (SELECT

                p2.id <http://p2.id> ,

               p1.geom <-> p2.geom AS dist

              FROM parc p2

              WHERE p1.id <http://p1.id>  <> p2.id <http://p2.id> 

              ORDER BY dist

              LIMIT 1) AS t

ORDER BY p1.id <http://p1.id> , t.dist;

 

 

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Vispo Leblanc
Sent: Thursday, May 31, 2018 12:05 PM
To: postgis-users at lists.osgeo.org
Subject: Re: [postgis-users] Increase KNN efficiency

 

Thanks Nicolas, this is way better, I guess my previous query was calculating all distance between all elements. 

 

Thanks again! 

 

On Thu, May 31, 2018 at 2:47 AM Nicolas Ribot <nicolas.ribot at gmail.com <mailto:nicolas.ribot at gmail.com> > wrote:

Hi,

 

Put the KNN operator in a LATERAL subquery, using LIMIT x to limit NN to x elements:

 

SELECT

  p1.id <http://p1.id> ,

  t.id <http://t.id>  AS id2,

  t.dist

FROM parc p1

  CROSS JOIN LATERAL

             (SELECT

                p2.id <http://p2.id> ,

                st_distance(p1.geom, p2.geom) AS dist

              FROM parc p2

              WHERE p1.id <http://p1.id>  <> p2.id <http://p2.id> 

              ORDER BY p1.geom <-> p2.geom

              LIMIT 1) AS t

ORDER BY p1.id <http://p1.id> , t.dist;

 

Nicolas

 

On 30 May 2018 at 23:07, Vispo Leblanc <vispoleblanc at gmail.com <mailto:vispoleblanc at gmail.com> > wrote:

Hello PostGIS users, 

 

I am looking for a way to increase the speed of my query with KNN. 

 

I have two tables with millions of points in each table. I would like to find the nearest neighbor (only one) of my points in table A with my points in table B.

 

I wrote the following query: 

 

WITH index_query as (
  SELECT
    t1.id <http://t1.id> ,
    t1.geom AS geom,
    t2.geom AS geom2, 
    t1.geom <-> t2.geom AS distance
  FROM research_scrapper.table1 t1, 
      research_scrapper.table2 t2
  ORDER BY t1.geom <-> t2.geom
)
SELECT tbl.cas_id, tbl.geom 
FROM index_query tbl
INNER JOIN
    (
        SELECT geom2, 
               MIN(distance) as distance
        FROM index_query 
        GROUP BY geom2
    )tbl1
    ON tbl1.geom_generated=tbl.geom_generated
    AND tbl1.distance = tbl.distance

 

This is taking forever, would you have any suggestions on how to increase the efficiency?

 

Thanks for your help,

 


_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> 
https://lists.osgeo.org/mailman/listinfo/postgis-users

 

_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> 
https://lists.osgeo.org/mailman/listinfo/postgis-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20180602/165811f7/attachment.html>


More information about the postgis-users mailing list