[postgis-devel] Wrong answer from new knn with recheck

Mark Cave-Ayland mark.cave-ayland at ilande.co.uk
Sun May 24 02:47:56 PDT 2015


On 24/05/15 09:29, Nicklas Avén wrote:

> The same thing happens for more houses in the table. So it is not that
> particular house it is about.
> 
> This might be a PostgreSQL problem. I think that I can prove that what
> happens is this (I will try to explain it):
> 
> Sometimes, not every time, the first unused item that is rechecked
> appears as the closest item on the next rechecked comparison.
> 
> Look at the two examples below. At the first run I return only the
> closest road to each house. The problem is both at row 2 (t_gid=2) and
> row 6 (t_gid = 6). note the r_gid values on the problematic rows
> 
> Then at the second run I ask for the 2 closest roads to each house.
> Then the r_gid values from the problematic rows in run1 shows as the
> second closest road in run 2 for the previous house. 
> 
> RUN 1: 
> 
> SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) 
> FROM (select * from turisthytte) t CROSS JOIN LATERAL
> (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom <->r.geom LIMIT 1) r
> limit 10;
> 
>                navn               | t_gid | r_gid  |   st_distance    
> ----------------------------------+-------+--------+------------------
>  Gramstad                         |     1 | 988075 | 90.2225986190226
>  Sidhusgohpi                      |     2 | 986169 |  1379886.0666003
>  Gården Li                        |     3 | 942867 | 968.288199312332
>  Haukenestårnet                   |     4 |  76588 | 404.810783875252
>  Hannevoldhytta                   |     5 | 759879 | 246.152262737727
>  Deatnomuotki                     |     6 | 759880 | 1221715.24537982
>  Fogdehytta                       |     7 |  76347 | 136082.340278048
>  Mule Varde                       |     8 |  76348 | 135425.235637218
>  Fjellvassbu                      |     9 | 941387 | 305599.118262984
>  Krokbua                          |    10 | 423774 | 198.915851325086
> 
> 
> RUN 2:
> 
> SELECT t.navn,t.gid t_gid, r.gid r_gid, st_distance(r.geom, t.geom) 
> FROM (select * from turisthytte) t CROSS JOIN LATERAL
> (SELECT r.gid, r.geom FROM roads r ORDER BY t.geom <->r.geom LIMIT 2) r
> limit 12;
> 
> 
>                navn               | t_gid |  r_gid  |   st_distance    
> ----------------------------------+-------+---------+------------------
>  Gramstad                         |     1 |  988075 | 90.2225986190226
>  Gramstad                         |     1 |  986169 | 128.356279805393
>  Sidhusgohpi                      |     2 |  982379 | 1379790.96016902
>  Sidhusgohpi                      |     2 | 1724598 |  27839.424840718
>  Gården Li                        |     3 |  942867 | 968.288199312332
>  Gården Li                        |     3 |  941387 | 1083.96363188574
>  Haukenestårnet                   |     4 |   76588 | 404.810783875252
>  Haukenestårnet                   |     4 |   76347 | 541.305515942029
>  Hannevoldhytta                   |     5 |  759879 | 246.152262737727
>  Hannevoldhytta                   |     5 |  759880 | 292.257309134795
>  Deatnomuotki                     |     6 |  759871 | 1221757.34981649
>  Deatnomuotki                     |     6 |  760212 | 1222588.13809164
> 
> 
> Is PostgreSQL to blame?

Before posting on -hackers, it's always good to find the smallest test
case you can to reproduce the problem. As a starting point, try changing
the join type to confirm if it's just the LATERAL or something else that
causes the issue, and also test your case using the native PostgreSQL
KNN operators rather than the PostGIS ones (PG developers don't tend to
install PostGIS).

FWIW several times in the past I thought I'd found a bug in the database
and by going through this process realised that it was either my own
mistake or a bug in the PostGIS operator implementations instead...


HTH,

Mark.




More information about the postgis-devel mailing list