[postgis-tickets] [PostGIS] #3127: KNN geog distance doesn't match ST_Distance spheroid
PostGIS
trac at osgeo.org
Sat May 30 18:08:59 PDT 2015
#3127: KNN geog distance doesn't match ST_Distance spheroid
----------------------+---------------------------
Reporter: robe | Owner: robe
Type: defect | Status: reopened
Priority: medium | Milestone: PostGIS 2.2.0
Component: postgis | Version: trunk
Resolution: | Keywords:
----------------------+---------------------------
Old description:
> I'm not sure if this is expected or not, but I was trying to write up
> regress tests for KNN geog and found inconsistencies between what <->
> returns and what ST_Distance geography returns which changes the ranking.
> It seems the <-> is returning the sphere distance instead of spheroid
> (e.g. I get same answer if I do
>
> {{{
>
> ST_Distance(geog1,geog2, false) )
>
> }}}
>
> If that is the case, and by design, we just need to make a note of it in
> the docs, so people don't falsely report its bugged.
>
> e.g.:
>
> {{{
> CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
> INSERT INTO knn_recheck_geog(gid,geog)
> SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid,
> ST_Point(x*1.11,y*0.95)::geography As geog
> FROM generate_series(-100,100, 1) AS x CROSS JOIN
> generate_series(-90,90,1) As y;
>
> SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY
> ST_Distance(a.geog, b.geog) ) As true_rn, b.rn As knn_rn,
> ST_Distance(a.geog, b.geog) As dist_spheroid, ST_Distance(a.geog, b.geog,
> false) As dist_sphere,
> a.geog <-> b.geog As knn_dist
> FROM knn_recheck_geog As a
> LEFT JOIN
> LATERAL ( SELECT gid, geog, RANK() OVER(ORDER BY a.geog
> <-> g.geog) As rn
> FROM knn_recheck_geog As g WHERE a.gid <> g.gid
> ORDER BY 'POINT(-105.45 3.8)'::Geography <-> g.geog LIMIT 5) As b ON true
> WHERE a.gid = 1000
> ORDER BY a.gid, b.rn;
>
> }}}
>
> yields:
>
> {{{
> gid | match | true_rn | knn_rn | dist_spheroid | dist_sphere |
> knn_dist
> ------+-------+---------+--------+------------------+------------------+------------------
> 1000 | 1001 | 2 | 1 | 105051.447591582 | 105635.3257479 |
> 105635.3257479
> 1000 | 999 | 1 | 2 | 105049.134514627 | 105635.3257479 |
> 105635.3257479
> 1000 | 819 | 3 | 3 | 123294.778176043 | 123155.172887459 |
> 123155.172887459
> 1000 | 1181 | 4 | 4 | 123294.778176044 | 123155.17288746 |
> 123155.17288746
> 1000 | 820 | 5 | 5 | 161924.039877904 | 162197.178690157 |
> 162197.178690157
> (5 rows)
>
> }}}
New description:
I'm not sure if this is expected or not, but I was trying to write up
regress tests for KNN geog and found inconsistencies between what <->
returns and what ST_Distance geography returns which changes the ranking.
It seems the <-> is returning the sphere distance instead of spheroid
(e.g. I get same answer if I do
{{{
ST_Distance(geog1,geog2, false) )
}}}
If that is the case, and by design, we just need to make a note of it in
the docs, so people don't falsely report its bugged.
e.g.:
{{{
CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
INSERT INTO knn_recheck_geog(gid,geog)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid,
ST_Point(x*1.11,y*0.95)::geography As geog
FROM generate_series(-100,100, 1) AS x CROSS JOIN
generate_series(-90,90,1) As y;
SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY
ST_Distance(a.geog, b.geog) ) As true_rn, b.rn As knn_rn,
ST_Distance(a.geog, b.geog) As dist_spheroid, ST_Distance(a.geog, b.geog,
false) As dist_sphere,
a.geog <-> b.geog As knn_dist
FROM knn_recheck_geog As a
LEFT JOIN
LATERAL ( SELECT gid, geog, RANK() OVER(ORDER BY a.geog
<-> g.geog) As rn
FROM knn_recheck_geog As g WHERE a.gid <> g.gid
ORDER BY 'POINT(-105.45 3.8)'::Geography <-> g.geog LIMIT 5) As b ON true
WHERE a.gid = 1000
ORDER BY a.gid, b.rn;
}}}
yields:
{{{
gid | match | true_rn | knn_rn | dist_spheroid | dist_sphere |
knn_dist
------+-------+---------+--------+------------------+------------------+------------------
1000 | 1001 | 2 | 1 | 105051.447591582 | 105635.3257479 |
105635.3257479
1000 | 999 | 1 | 2 | 105049.134514627 | 105635.3257479 |
105635.3257479
1000 | 819 | 3 | 3 | 123294.778176043 | 123155.172887459 |
123155.172887459
1000 | 1181 | 4 | 4 | 123294.778176044 | 123155.17288746 |
123155.17288746
1000 | 820 | 5 | 5 | 161924.039877904 | 162197.178690157 |
162197.178690157
(5 rows)
}}}
Hmm and I should add -- I can't get this to use an index (or a simple
query with a constant anymore so guess I may need to reopen that other
ticket.
--
Comment (by robe):
Okay how far off are these supposed to be, I'm still getting cases where
the distances don't line up with or without an index. Case in point:
{{{
CREATE TABLE knn_recheck_geog_small_2 (
gid integer,
geog geography
);
INSERT INTO knn_recheck_geog_small_2 VALUES (818,
'0101000020E6100000A4703D0AD7A35AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (819,
'0101000020E6100000A4703D0AD7A35AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (820,
'0101000020E6100000A4703D0AD7A35AC00000000000001340');
INSERT INTO knn_recheck_geog_small_2 VALUES (998,
'0101000020E6100000CDCCCCCCCC5C5AC0666666666666FE3F');
INSERT INTO knn_recheck_geog_small_2 VALUES (999,
'0101000020E6100000CDCCCCCCCC5C5AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (1000,
'0101000020E6100000CDCCCCCCCC5C5AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (1001,
'0101000020E6100000CDCCCCCCCC5C5AC00000000000001340');
INSERT INTO knn_recheck_geog_small_2 VALUES (1180,
'0101000020E6100000F6285C8FC2155AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (1181,
'0101000020E6100000F6285C8FC2155AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (1182,
'0101000020E6100000F6285C8FC2155AC00000000000001340');
}}}
Now without even creating an index, these distances don't match up:
{{{
SELECT a.gid, ARRAY(SELECT (gid,a.geog <-> g.geog)
FROM knn_recheck_geog_small_2 As g WHERE a.gid <>
g.gid ORDER BY a.geog <-> g.geog, g.gid LIMIT 2) As knn ,
ARRAY(SELECT (gid, ST_Distance(a.geog, g.geog ) )
FROM knn_recheck_geog_small_2 As g WHERE a.gid <>
g.gid ORDER BY ST_Distance(a.geog, g.geog), g.gid LIMIT 2) As dist
FROM knn_recheck_geog_small_2 As a
WHERE a.gid = 1000
ORDER BY a.gid;
}}}
--gives
{{{
gid | knn | dist
------+--------------------------------------------------+-----------------------------
1000 | {"(1001,105635.3257479)","(999,105635.3257479)"} |
{"(999,105049.134514627)",(1001,105051.447591582)"}
}}}
That could be expected behavior I guess
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/3127#comment:3>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list