[postgis-tickets] [PostGIS] #3127: KNN geog distance doesn't match ST_Distance spheroid

PostGIS trac at osgeo.org
Sat May 30 18:08:59 PDT 2015


#3127: KNN geog distance doesn't match ST_Distance spheroid
----------------------+---------------------------
  Reporter:  robe     |      Owner:  robe
      Type:  defect   |     Status:  reopened
  Priority:  medium   |  Milestone:  PostGIS 2.2.0
 Component:  postgis  |    Version:  trunk
Resolution:           |   Keywords:
----------------------+---------------------------

Old description:

> I'm not sure if this is expected or not, but I was trying to write up
> regress tests for KNN geog and found inconsistencies between what <->
> returns and what ST_Distance geography returns which changes the ranking.
> It seems the <-> is returning the sphere distance instead of spheroid
> (e.g. I get same answer if I do
>

> {{{
>
> ST_Distance(geog1,geog2, false) )
>

> }}}
>

> If that is the case, and by design, we just need to make a note of it in
> the docs, so people don't falsely report its bugged.
>
> e.g.:
>

> {{{
> CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
> INSERT INTO knn_recheck_geog(gid,geog)
> SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid,
> ST_Point(x*1.11,y*0.95)::geography As geog
> FROM generate_series(-100,100, 1) AS x CROSS JOIN
> generate_series(-90,90,1) As y;
>
> SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY
> ST_Distance(a.geog, b.geog) ) As true_rn, b.rn  As knn_rn,
> ST_Distance(a.geog, b.geog) As dist_spheroid, ST_Distance(a.geog, b.geog,
> false) As dist_sphere,
> a.geog <-> b.geog As knn_dist
> FROM knn_recheck_geog As a
>         LEFT JOIN
>                 LATERAL ( SELECT  gid, geog, RANK() OVER(ORDER BY a.geog
> <-> g.geog) As rn
>                         FROM knn_recheck_geog As g WHERE a.gid <> g.gid
> ORDER BY 'POINT(-105.45 3.8)'::Geography <-> g.geog LIMIT 5) As b ON true
>         WHERE a.gid = 1000
> ORDER BY a.gid, b.rn;
>

> }}}
>
> yields:
>
> {{{
>  gid  | match | true_rn | knn_rn |  dist_spheroid   |   dist_sphere    |
> knn_dist
> ------+-------+---------+--------+------------------+------------------+------------------
>  1000 |  1001 |       2 |      1 | 105051.447591582 |   105635.3257479 |
> 105635.3257479
>  1000 |   999 |       1 |      2 | 105049.134514627 |   105635.3257479 |
> 105635.3257479
>  1000 |   819 |       3 |      3 | 123294.778176043 | 123155.172887459 |
> 123155.172887459
>  1000 |  1181 |       4 |      4 | 123294.778176044 |  123155.17288746 |
> 123155.17288746
>  1000 |   820 |       5 |      5 | 161924.039877904 | 162197.178690157 |
> 162197.178690157
> (5 rows)
>
> }}}

New description:

 I'm not sure if this is expected or not, but I was trying to write up
 regress tests for KNN geog and found inconsistencies between what <->
 returns and what ST_Distance geography returns which changes the ranking.
 It seems the <-> is returning the sphere distance instead of spheroid
 (e.g. I get same answer if I do


 {{{

 ST_Distance(geog1,geog2, false) )


 }}}


 If that is the case, and by design, we just need to make a note of it in
 the docs, so people don't falsely report its bugged.

 e.g.:


 {{{
 CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
 INSERT INTO knn_recheck_geog(gid,geog)
 SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid,
 ST_Point(x*1.11,y*0.95)::geography As geog
 FROM generate_series(-100,100, 1) AS x CROSS JOIN
 generate_series(-90,90,1) As y;

 SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY
 ST_Distance(a.geog, b.geog) ) As true_rn, b.rn  As knn_rn,
 ST_Distance(a.geog, b.geog) As dist_spheroid, ST_Distance(a.geog, b.geog,
 false) As dist_sphere,
 a.geog <-> b.geog As knn_dist
 FROM knn_recheck_geog As a
         LEFT JOIN
                 LATERAL ( SELECT  gid, geog, RANK() OVER(ORDER BY a.geog
 <-> g.geog) As rn
                         FROM knn_recheck_geog As g WHERE a.gid <> g.gid
 ORDER BY 'POINT(-105.45 3.8)'::Geography <-> g.geog LIMIT 5) As b ON true
         WHERE a.gid = 1000
 ORDER BY a.gid, b.rn;


 }}}

 yields:

 {{{
  gid  | match | true_rn | knn_rn |  dist_spheroid   |   dist_sphere    |
 knn_dist
 ------+-------+---------+--------+------------------+------------------+------------------
  1000 |  1001 |       2 |      1 | 105051.447591582 |   105635.3257479 |
 105635.3257479
  1000 |   999 |       1 |      2 | 105049.134514627 |   105635.3257479 |
 105635.3257479
  1000 |   819 |       3 |      3 | 123294.778176043 | 123155.172887459 |
 123155.172887459
  1000 |  1181 |       4 |      4 | 123294.778176044 |  123155.17288746 |
 123155.17288746
  1000 |   820 |       5 |      5 | 161924.039877904 | 162197.178690157 |
 162197.178690157
 (5 rows)

 }}}


 Hmm and I should add -- I can't get this to use an index (or a simple
 query with a constant anymore so guess I may need to reopen that other
 ticket.

--

Comment (by robe):

 Okay how far off are these supposed to be, I'm still getting cases where
 the distances don't line up with or without an index.  Case in point:


 {{{
 CREATE TABLE knn_recheck_geog_small_2 (
     gid integer,
     geog geography
 );



 INSERT INTO knn_recheck_geog_small_2 VALUES (818,
 '0101000020E6100000A4703D0AD7A35AC0CDCCCCCCCCCC0640');
 INSERT INTO knn_recheck_geog_small_2 VALUES (819,
 '0101000020E6100000A4703D0AD7A35AC06666666666660E40');
 INSERT INTO knn_recheck_geog_small_2 VALUES (820,
 '0101000020E6100000A4703D0AD7A35AC00000000000001340');
 INSERT INTO knn_recheck_geog_small_2 VALUES (998,
 '0101000020E6100000CDCCCCCCCC5C5AC0666666666666FE3F');
 INSERT INTO knn_recheck_geog_small_2 VALUES (999,
 '0101000020E6100000CDCCCCCCCC5C5AC0CDCCCCCCCCCC0640');
 INSERT INTO knn_recheck_geog_small_2 VALUES (1000,
 '0101000020E6100000CDCCCCCCCC5C5AC06666666666660E40');
 INSERT INTO knn_recheck_geog_small_2 VALUES (1001,
 '0101000020E6100000CDCCCCCCCC5C5AC00000000000001340');
 INSERT INTO knn_recheck_geog_small_2 VALUES (1180,
 '0101000020E6100000F6285C8FC2155AC0CDCCCCCCCCCC0640');
 INSERT INTO knn_recheck_geog_small_2 VALUES (1181,
 '0101000020E6100000F6285C8FC2155AC06666666666660E40');
 INSERT INTO knn_recheck_geog_small_2 VALUES (1182,
 '0101000020E6100000F6285C8FC2155AC00000000000001340');

 }}}

 Now without even creating an index, these distances don't match up:


 {{{
 SELECT a.gid,  ARRAY(SELECT  (gid,a.geog <-> g.geog)
                         FROM knn_recheck_geog_small_2 As g WHERE a.gid <>
 g.gid ORDER BY a.geog <-> g.geog, g.gid LIMIT 2) As knn ,
         ARRAY(SELECT  (gid, ST_Distance(a.geog, g.geog ) )
                         FROM knn_recheck_geog_small_2 As g WHERE a.gid <>
 g.gid ORDER BY ST_Distance(a.geog, g.geog), g.gid LIMIT 2) As dist
 FROM knn_recheck_geog_small_2 As a
 WHERE a.gid = 1000
 ORDER BY a.gid;
 }}}
 --gives

 {{{
  gid  |                       knn                        |   dist
 ------+--------------------------------------------------+-----------------------------
  1000 | {"(1001,105635.3257479)","(999,105635.3257479)"} |
 {"(999,105049.134514627)",(1001,105051.447591582)"}
 }}}

 That could be expected behavior I guess

--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/3127#comment:3>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list