[postgis-devel] Wrong answer from new knn with recheck

Paragon Corporation lr at pcorp.us
Sun May 24 22:19:04 PDT 2015


Nicklas,

I think you are right.  I put in a knn_recheck that exercises the issue. 
http://trac.osgeo.org/postgis/ticket/2703

I think I got the postgresql geometry types to fail in a similar fashion, so might be a PostgreSQL issue -- to exercise the PostgreSQL types, I'll report upstream

I did this:
-- create table
DROP TABLE IF EXISTS knn_recheck_point;
CREATE TABLE knn_recheck_point(gid serial , geom point);
INSERT INTO knn_recheck_point(gid,geom)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, point(x*0.777,y*0.777) As geom
FROM generate_series(-100,100, 1) AS x CROSS JOIN generate_series(-300,10000,10) As y;

DROP TABLE IF EXISTS knn_recheck_poly;
CREATE TABLE knn_recheck_poly(gid serial , geom polygon);

INSERT INTO knn_recheck_poly(gid,geom)
SELECT 600000 + ROW_NUMBER() OVER(ORDER BY gid) AS gid, circle(geom,1000)::polygon As geom
FROM knn_recheck_point
WHERE gid IN(1000, 10000, 2000, 40000);


SELECT a.gid, b.gid As match, a.geom <-> b.geom As dist
FROM knn_recheck_point As a 
	LEFT JOIN 
		LATERAL ( SELECT  gid, geom		
            FROM knn_recheck_poly As g  ORDER BY g.geom <-> a.geom LIMIT 5) As b ON true
WHERE a.gid IN(50001,70000)
ORDER BY a.gid ,dist;


Returns:
  gid  | match  |       dist
-------+--------+------------------
 50001 | 600003 | 616.904706911043
 50001 | 600004 | 1400.95154727064
 50001 | 600002 | 2543.38219144528
 50001 | 600001 | 2784.23980858618
 70000 | 600001 |                0
 70000 | 600002 |                0
 70000 | 600004 |                0
 70000 | 600003 |  571.32018689698
(8 rows)

-- now theirs really doesn't want to use an index so I got to do this --
DROP INDEX IF EXISTS idx_knn_recheck_point  ;
CREATE INDEX idx_knn_recheck_point ON knn_recheck_point USING gist(geom);


DROP INDEX IF EXISTS idx_knn_recheck_poly  ;
CREATE INDEX idx_knn_recheck_poly ON knn_recheck_poly USING gist(geom);

SET enable_seqscan = false;
SELECT a.gid, b.gid As match, a.geom <-> b.geom As dist
FROM knn_recheck_point As a 
	LEFT JOIN 
		LATERAL ( SELECT  gid, geom		
            FROM knn_recheck_poly As g  ORDER BY g.geom <-> a.geom LIMIT 5) As b ON true
WHERE a.gid IN(50001,70000)
ORDER BY a.gid ,dist;


  gid  | match  |       dist
-------+--------+------------------
 50001 | 600003 | 616.904706911043
 50001 | 600004 | 1400.95154727064
 50001 | 600002 | 2543.38219144528
 50001 | 600001 | 2784.23980858618
 70000 |        |
(5 rows)

Thanks,
Regina


-----Original Message-----
From: postgis-devel-bounces at lists.osgeo.org [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Nicklas Avén
Sent: Sunday, May 24, 2015 4:30 AM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] Wrong answer from new knn with recheck

The same thing happens for more houses in the table. So it is not that particular house it is about.

This might be a PostgreSQL problem. I think that I can prove that what happens is this (I will try to explain it):

Sometimes, not every time, the first unused item that is rechecked appears as the closest item on the next rechecked comparison.

Look at the two examples below. At the first run I return only the closest road to each house. The problem is both at row 2 (t_gid=2) and row 6 (t_gid = 6). note the r_gid values on the problematic rows

Then at the second run I ask for the 2 closest roads to each house.
Then the r_gid values from the problematic rows in run1 shows as the second closest road in run 2 for the previous house. 

RUN 1: 

SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) FROM (select * from turisthytte) t CROSS JOIN LATERAL (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom <->r.geom LIMIT 1) r limit 10;

               navn               | t_gid | r_gid  |   st_distance    
----------------------------------+-------+--------+------------------
 Gramstad                         |     1 | 988075 | 90.2225986190226
 Sidhusgohpi                      |     2 | 986169 |  1379886.0666003
 G rden Li                        |     3 | 942867 | 968.288199312332
 Haukenest rnet                   |     4 |  76588 | 404.810783875252
 Hannevoldhytta                   |     5 | 759879 | 246.152262737727
 Deatnomuotki                     |     6 | 759880 | 1221715.24537982
 Fogdehytta                       |     7 |  76347 | 136082.340278048
 Mule Varde                       |     8 |  76348 | 135425.235637218
 Fjellvassbu                      |     9 | 941387 | 305599.118262984
 Krokbua                          |    10 | 423774 | 198.915851325086


RUN 2:

SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) FROM (select * from turisthytte) t CROSS JOIN LATERAL (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom <->r.geom LIMIT 2) r limit 12;


               navn               | t_gid |  r_gid  |   st_distance    
----------------------------------+-------+---------+------------------
 Gramstad                         |     1 |  988075 | 90.2225986190226
 Gramstad                         |     1 |  986169 | 128.356279805393
 Sidhusgohpi                      |     2 |  982379 | 1379790.96016902
 Sidhusgohpi                      |     2 | 1724598 |  27839.424840718
 G rden Li                        |     3 |  942867 | 968.288199312332
 G rden Li                        |     3 |  941387 | 1083.96363188574
 Haukenest rnet                   |     4 |   76588 | 404.810783875252
 Haukenest rnet                   |     4 |   76347 | 541.305515942029
 Hannevoldhytta                   |     5 |  759879 | 246.152262737727
 Hannevoldhytta                   |     5 |  759880 | 292.257309134795
 Deatnomuotki                     |     6 |  759871 | 1221757.34981649
 Deatnomuotki                     |     6 |  760212 | 1222588.13809164


Is PostgreSQL to blame?

/Nicklas




On Sun, 2015-05-24 at 07:02 +0200, Nicklas Av n wrote:
> Then i gives the right answer.
> 
> /Nicklas
> 
> 
> Sent from my Cat  phone.
> 
> Den 24 maj 2015 01:57 skrev Paul Ramsey <pramsey at cleverelephant.ca>:
> >
> > What happens when you run the query using the suspect house as a literal instead of doing the fun lateral query? 
> >
> > P.  
> >
> >
> > --  
> > http://postgis.net  
> > http://cleverelephant.ca 
> >
> >
> > On May 23, 2015 at 1:36:55 PM, Nicklas Av n (nicklas.aven at jordogskog.no) wrote: 
> > > Since I cannot reach trac I post it here. 
> > >  
> > > I have been playing some with latest trunk and latest postgresql. 
> > >  
> > > Since I cannot reach trac I cannot check that it is absolute latest 
> > > trunk but I have r13544 
> > >  
> > > PostgreSQL I pulled from git yesterday. 
> > >  
> > > I have 2 data sets. one with roads and one with houses. 
> > >  
> > > Attached is data that reproduces the problem. 
> > >  
> > > table r is the roads and table t is the houses. 
> > >  
> > >  
> > > If I then run this query: 
> > >  
> > > SELECT t.navn, r.gid, st_distance(r.geom, t.geom) 
> > > FROM t CROSS JOIN LATERAL 
> > > (SELECT r.gid, r.geom FROM r ORDER BY t.geom <->r.geom LIMIT 2) r; 
> > >  
> > > I get this result: 
> > > navn | gid | st_distance 
> > > ----------------------------------+---------+------------------ 
> > > Gramstad | 988075 | 90.2225986190226 
> > > Gramstad | 986169 | 128.356279805393 
> > > Sidhusgohpi | 982379 | 1379790.96016902 
> > > Sidhusgohpi | 1724598 | 27839.424840718 
> > >  
> > >  
> > > What is totally wrong here is the third row. That road is 1379 km away. 
> > > The last row gives the right answer. The closest road to that house is 
> > > 27 km away. 
> > >  
> > > With the full data sets I can play with the limit and see how roads from 
> > > the first house in some way is following to the second house. road nr 
> > > 982379 is just 215 meters away from house nr 1. So something is not 
> > > letting go. 
> > >  
> > >  
> > > Thanks 
> > >  
> > > Nicklas Av n 
> > >  
> > >  
> > >  
> > > _______________________________________________ 
> > > postgis-devel mailing list 
> > > postgis-devel at lists.osgeo.org 
> > > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel 
> >
> >
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
> 


_______________________________________________
postgis-devel mailing list
postgis-devel at lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel





More information about the postgis-devel mailing list