[postgis-devel] Wrong answer from new knn with recheck

Paul Ramsey pramsey at cleverelephant.ca
Mon May 25 05:14:14 PDT 2015


Phew. Thinking about it, I couldn’t figure any way I could get a wrong answer in my implementation that wouldn’t also show up if you used a literal value; glad you were able to test and find it in the native types.

P 


--  
http://postgis.net  
http://cleverelephant.ca


On May 24, 2015 at 10:19:16 PM, Paragon Corporation (lr at pcorp.us) wrote:
> Nicklas,
>  
> I think you are right. I put in a knn_recheck that exercises the issue.
> http://trac.osgeo.org/postgis/ticket/2703
>  
> I think I got the postgresql geometry types to fail in a similar fashion, so might be a PostgreSQL  
> issue -- to exercise the PostgreSQL types, I'll report upstream
>  
> I did this:
> -- create table
> DROP TABLE IF EXISTS knn_recheck_point;
> CREATE TABLE knn_recheck_point(gid serial , geom point);
> INSERT INTO knn_recheck_point(gid,geom)
> SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, point(x*0.777,y*0.777) As geom
> FROM generate_series(-100,100, 1) AS x CROSS JOIN generate_series(-300,10000,10)  
> As y;
>  
> DROP TABLE IF EXISTS knn_recheck_poly;
> CREATE TABLE knn_recheck_poly(gid serial , geom polygon);
>  
> INSERT INTO knn_recheck_poly(gid,geom)
> SELECT 600000 + ROW_NUMBER() OVER(ORDER BY gid) AS gid, circle(geom,1000)::polygon  
> As geom
> FROM knn_recheck_point
> WHERE gid IN(1000, 10000, 2000, 40000);
>  
>  
> SELECT a.gid, b.gid As match, a.geom <-> b.geom As dist
> FROM knn_recheck_point As a
> LEFT JOIN
> LATERAL ( SELECT gid, geom
> FROM knn_recheck_poly As g ORDER BY g.geom <-> a.geom LIMIT 5) As b ON true
> WHERE a.gid IN(50001,70000)
> ORDER BY a.gid ,dist;
>  
>  
> Returns:
> gid | match | dist
> -------+--------+------------------
> 50001 | 600003 | 616.904706911043
> 50001 | 600004 | 1400.95154727064
> 50001 | 600002 | 2543.38219144528
> 50001 | 600001 | 2784.23980858618
> 70000 | 600001 | 0
> 70000 | 600002 | 0
> 70000 | 600004 | 0
> 70000 | 600003 | 571.32018689698
> (8 rows)
>  
> -- now theirs really doesn't want to use an index so I got to do this --
> DROP INDEX IF EXISTS idx_knn_recheck_point ;
> CREATE INDEX idx_knn_recheck_point ON knn_recheck_point USING gist(geom);
>  
>  
> DROP INDEX IF EXISTS idx_knn_recheck_poly ;
> CREATE INDEX idx_knn_recheck_poly ON knn_recheck_poly USING gist(geom);
>  
> SET enable_seqscan = false;
> SELECT a.gid, b.gid As match, a.geom <-> b.geom As dist
> FROM knn_recheck_point As a
> LEFT JOIN
> LATERAL ( SELECT gid, geom
> FROM knn_recheck_poly As g ORDER BY g.geom <-> a.geom LIMIT 5) As b ON true
> WHERE a.gid IN(50001,70000)
> ORDER BY a.gid ,dist;
>  
>  
> gid | match | dist
> -------+--------+------------------
> 50001 | 600003 | 616.904706911043
> 50001 | 600004 | 1400.95154727064
> 50001 | 600002 | 2543.38219144528
> 50001 | 600001 | 2784.23980858618
> 70000 | |
> (5 rows)
>  
> Thanks,
> Regina
>  
>  
> -----Original Message-----
> From: postgis-devel-bounces at lists.osgeo.org [mailto:postgis-devel-bounces at lists.osgeo.org]  
> On Behalf Of Nicklas Avén
> Sent: Sunday, May 24, 2015 4:30 AM
> To: PostGIS Development Discussion
> Subject: Re: [postgis-devel] Wrong answer from new knn with recheck
>  
> The same thing happens for more houses in the table. So it is not that particular house  
> it is about.
>  
> This might be a PostgreSQL problem. I think that I can prove that what happens is this (I  
> will try to explain it):
>  
> Sometimes, not every time, the first unused item that is rechecked appears as the closest  
> item on the next rechecked comparison.
>  
> Look at the two examples below. At the first run I return only the closest road to each house.  
> The problem is both at row 2 (t_gid=2) and row 6 (t_gid = 6). note the r_gid values on the  
> problematic rows
>  
> Then at the second run I ask for the 2 closest roads to each house.
> Then the r_gid values from the problematic rows in run1 shows as the second closest road  
> in run 2 for the previous house.
>  
> RUN 1:
>  
> SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) FROM (select  
> * from turisthytte) t CROSS JOIN LATERAL (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom  
> <->r.geom LIMIT 1) r limit 10;
>  
> navn | t_gid | r_gid | st_distance
> ----------------------------------+-------+--------+------------------
> Gramstad | 1 | 988075 | 90.2225986190226
> Sidhusgohpi | 2 | 986169 | 1379886.0666003
> G rden Li | 3 | 942867 | 968.288199312332
> Haukenest rnet | 4 | 76588 | 404.810783875252
> Hannevoldhytta | 5 | 759879 | 246.152262737727
> Deatnomuotki | 6 | 759880 | 1221715.24537982
> Fogdehytta | 7 | 76347 | 136082.340278048
> Mule Varde | 8 | 76348 | 135425.235637218
> Fjellvassbu | 9 | 941387 | 305599.118262984
> Krokbua | 10 | 423774 | 198.915851325086
>  
>  
> RUN 2:
>  
> SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) FROM (select  
> * from turisthytte) t CROSS JOIN LATERAL (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom  
> <->r.geom LIMIT 2) r limit 12;
>  
>  
> navn | t_gid | r_gid | st_distance
> ----------------------------------+-------+---------+------------------  
> Gramstad | 1 | 988075 | 90.2225986190226
> Gramstad | 1 | 986169 | 128.356279805393
> Sidhusgohpi | 2 | 982379 | 1379790.96016902
> Sidhusgohpi | 2 | 1724598 | 27839.424840718
> G rden Li | 3 | 942867 | 968.288199312332
> G rden Li | 3 | 941387 | 1083.96363188574
> Haukenest rnet | 4 | 76588 | 404.810783875252
> Haukenest rnet | 4 | 76347 | 541.305515942029
> Hannevoldhytta | 5 | 759879 | 246.152262737727
> Hannevoldhytta | 5 | 759880 | 292.257309134795
> Deatnomuotki | 6 | 759871 | 1221757.34981649
> Deatnomuotki | 6 | 760212 | 1222588.13809164
>  
>  
> Is PostgreSQL to blame?
>  
> /Nicklas
>  
>  
>  
>  
> On Sun, 2015-05-24 at 07:02 +0200, Nicklas Av n wrote:
> > Then i gives the right answer.
> >
> > /Nicklas
> >
> >
> > Sent from my Cat phone.
> >
> > Den 24 maj 2015 01:57 skrev Paul Ramsey :
> > >
> > > What happens when you run the query using the suspect house as a literal instead of doing  
> the fun lateral query?
> > >
> > > P.
> > >
> > >
> > > --
> > > http://postgis.net
> > > http://cleverelephant.ca
> > >
> > >
> > > On May 23, 2015 at 1:36:55 PM, Nicklas Av n (nicklas.aven at jordogskog.no) wrote:
> > > > Since I cannot reach trac I post it here.
> > > >
> > > > I have been playing some with latest trunk and latest postgresql.
> > > >
> > > > Since I cannot reach trac I cannot check that it is absolute latest
> > > > trunk but I have r13544
> > > >
> > > > PostgreSQL I pulled from git yesterday.
> > > >
> > > > I have 2 data sets. one with roads and one with houses.
> > > >
> > > > Attached is data that reproduces the problem.
> > > >
> > > > table r is the roads and table t is the houses.
> > > >
> > > >
> > > > If I then run this query:
> > > >
> > > > SELECT t.navn, r.gid, st_distance(r.geom, t.geom)
> > > > FROM t CROSS JOIN LATERAL
> > > > (SELECT r.gid, r.geom FROM r ORDER BY t.geom <->r.geom LIMIT 2) r;
> > > >
> > > > I get this result:
> > > > navn | gid | st_distance
> > > > ----------------------------------+---------+------------------
> > > > Gramstad | 988075 | 90.2225986190226
> > > > Gramstad | 986169 | 128.356279805393
> > > > Sidhusgohpi | 982379 | 1379790.96016902
> > > > Sidhusgohpi | 1724598 | 27839.424840718
> > > >
> > > >
> > > > What is totally wrong here is the third row. That road is 1379 km away.
> > > > The last row gives the right answer. The closest road to that house is
> > > > 27 km away.
> > > >
> > > > With the full data sets I can play with the limit and see how roads from
> > > > the first house in some way is following to the second house. road nr
> > > > 982379 is just 215 meters away from house nr 1. So something is not
> > > > letting go.
> > > >
> > > >
> > > > Thanks
> > > >
> > > > Nicklas Av n
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > postgis-devel mailing list
> > > > postgis-devel at lists.osgeo.org
> > > > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
> > >
> > >
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
> >
>  
>  
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>  
>  
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel




More information about the postgis-devel mailing list