[postgis-devel] Wrong answer from new knn with recheck
Paragon Corporation
lr at pcorp.us
Sun May 24 22:19:04 PDT 2015
Nicklas,
I think you are right. I put in a knn_recheck that exercises the issue.
http://trac.osgeo.org/postgis/ticket/2703
I think I got the postgresql geometry types to fail in a similar fashion, so might be a PostgreSQL issue -- to exercise the PostgreSQL types, I'll report upstream
I did this:
-- create table
DROP TABLE IF EXISTS knn_recheck_point;
CREATE TABLE knn_recheck_point(gid serial , geom point);
INSERT INTO knn_recheck_point(gid,geom)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, point(x*0.777,y*0.777) As geom
FROM generate_series(-100,100, 1) AS x CROSS JOIN generate_series(-300,10000,10) As y;
DROP TABLE IF EXISTS knn_recheck_poly;
CREATE TABLE knn_recheck_poly(gid serial , geom polygon);
INSERT INTO knn_recheck_poly(gid,geom)
SELECT 600000 + ROW_NUMBER() OVER(ORDER BY gid) AS gid, circle(geom,1000)::polygon As geom
FROM knn_recheck_point
WHERE gid IN(1000, 10000, 2000, 40000);
SELECT a.gid, b.gid As match, a.geom <-> b.geom As dist
FROM knn_recheck_point As a
LEFT JOIN
LATERAL ( SELECT gid, geom
FROM knn_recheck_poly As g ORDER BY g.geom <-> a.geom LIMIT 5) As b ON true
WHERE a.gid IN(50001,70000)
ORDER BY a.gid ,dist;
Returns:
gid | match | dist
-------+--------+------------------
50001 | 600003 | 616.904706911043
50001 | 600004 | 1400.95154727064
50001 | 600002 | 2543.38219144528
50001 | 600001 | 2784.23980858618
70000 | 600001 | 0
70000 | 600002 | 0
70000 | 600004 | 0
70000 | 600003 | 571.32018689698
(8 rows)
-- now theirs really doesn't want to use an index so I got to do this --
DROP INDEX IF EXISTS idx_knn_recheck_point ;
CREATE INDEX idx_knn_recheck_point ON knn_recheck_point USING gist(geom);
DROP INDEX IF EXISTS idx_knn_recheck_poly ;
CREATE INDEX idx_knn_recheck_poly ON knn_recheck_poly USING gist(geom);
SET enable_seqscan = false;
SELECT a.gid, b.gid As match, a.geom <-> b.geom As dist
FROM knn_recheck_point As a
LEFT JOIN
LATERAL ( SELECT gid, geom
FROM knn_recheck_poly As g ORDER BY g.geom <-> a.geom LIMIT 5) As b ON true
WHERE a.gid IN(50001,70000)
ORDER BY a.gid ,dist;
gid | match | dist
-------+--------+------------------
50001 | 600003 | 616.904706911043
50001 | 600004 | 1400.95154727064
50001 | 600002 | 2543.38219144528
50001 | 600001 | 2784.23980858618
70000 | |
(5 rows)
Thanks,
Regina
-----Original Message-----
From: postgis-devel-bounces at lists.osgeo.org [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Nicklas Avén
Sent: Sunday, May 24, 2015 4:30 AM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] Wrong answer from new knn with recheck
The same thing happens for more houses in the table. So it is not that particular house it is about.
This might be a PostgreSQL problem. I think that I can prove that what happens is this (I will try to explain it):
Sometimes, not every time, the first unused item that is rechecked appears as the closest item on the next rechecked comparison.
Look at the two examples below. At the first run I return only the closest road to each house. The problem is both at row 2 (t_gid=2) and row 6 (t_gid = 6). note the r_gid values on the problematic rows
Then at the second run I ask for the 2 closest roads to each house.
Then the r_gid values from the problematic rows in run1 shows as the second closest road in run 2 for the previous house.
RUN 1:
SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) FROM (select * from turisthytte) t CROSS JOIN LATERAL (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom <->r.geom LIMIT 1) r limit 10;
navn | t_gid | r_gid | st_distance
----------------------------------+-------+--------+------------------
Gramstad | 1 | 988075 | 90.2225986190226
Sidhusgohpi | 2 | 986169 | 1379886.0666003
G rden Li | 3 | 942867 | 968.288199312332
Haukenest rnet | 4 | 76588 | 404.810783875252
Hannevoldhytta | 5 | 759879 | 246.152262737727
Deatnomuotki | 6 | 759880 | 1221715.24537982
Fogdehytta | 7 | 76347 | 136082.340278048
Mule Varde | 8 | 76348 | 135425.235637218
Fjellvassbu | 9 | 941387 | 305599.118262984
Krokbua | 10 | 423774 | 198.915851325086
RUN 2:
SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) FROM (select * from turisthytte) t CROSS JOIN LATERAL (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom <->r.geom LIMIT 2) r limit 12;
navn | t_gid | r_gid | st_distance
----------------------------------+-------+---------+------------------
Gramstad | 1 | 988075 | 90.2225986190226
Gramstad | 1 | 986169 | 128.356279805393
Sidhusgohpi | 2 | 982379 | 1379790.96016902
Sidhusgohpi | 2 | 1724598 | 27839.424840718
G rden Li | 3 | 942867 | 968.288199312332
G rden Li | 3 | 941387 | 1083.96363188574
Haukenest rnet | 4 | 76588 | 404.810783875252
Haukenest rnet | 4 | 76347 | 541.305515942029
Hannevoldhytta | 5 | 759879 | 246.152262737727
Hannevoldhytta | 5 | 759880 | 292.257309134795
Deatnomuotki | 6 | 759871 | 1221757.34981649
Deatnomuotki | 6 | 760212 | 1222588.13809164
Is PostgreSQL to blame?
/Nicklas
On Sun, 2015-05-24 at 07:02 +0200, Nicklas Av n wrote:
> Then i gives the right answer.
>
> /Nicklas
>
>
> Sent from my Cat phone.
>
> Den 24 maj 2015 01:57 skrev Paul Ramsey <pramsey at cleverelephant.ca>:
> >
> > What happens when you run the query using the suspect house as a literal instead of doing the fun lateral query?
> >
> > P.
> >
> >
> > --
> > http://postgis.net
> > http://cleverelephant.ca
> >
> >
> > On May 23, 2015 at 1:36:55 PM, Nicklas Av n (nicklas.aven at jordogskog.no) wrote:
> > > Since I cannot reach trac I post it here.
> > >
> > > I have been playing some with latest trunk and latest postgresql.
> > >
> > > Since I cannot reach trac I cannot check that it is absolute latest
> > > trunk but I have r13544
> > >
> > > PostgreSQL I pulled from git yesterday.
> > >
> > > I have 2 data sets. one with roads and one with houses.
> > >
> > > Attached is data that reproduces the problem.
> > >
> > > table r is the roads and table t is the houses.
> > >
> > >
> > > If I then run this query:
> > >
> > > SELECT t.navn, r.gid, st_distance(r.geom, t.geom)
> > > FROM t CROSS JOIN LATERAL
> > > (SELECT r.gid, r.geom FROM r ORDER BY t.geom <->r.geom LIMIT 2) r;
> > >
> > > I get this result:
> > > navn | gid | st_distance
> > > ----------------------------------+---------+------------------
> > > Gramstad | 988075 | 90.2225986190226
> > > Gramstad | 986169 | 128.356279805393
> > > Sidhusgohpi | 982379 | 1379790.96016902
> > > Sidhusgohpi | 1724598 | 27839.424840718
> > >
> > >
> > > What is totally wrong here is the third row. That road is 1379 km away.
> > > The last row gives the right answer. The closest road to that house is
> > > 27 km away.
> > >
> > > With the full data sets I can play with the limit and see how roads from
> > > the first house in some way is following to the second house. road nr
> > > 982379 is just 215 meters away from house nr 1. So something is not
> > > letting go.
> > >
> > >
> > > Thanks
> > >
> > > Nicklas Av n
> > >
> > >
> > >
> > > _______________________________________________
> > > postgis-devel mailing list
> > > postgis-devel at lists.osgeo.org
> > > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
> >
> >
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>
_______________________________________________
postgis-devel mailing list
postgis-devel at lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
More information about the postgis-devel
mailing list