[postgis-devel] Why doesn't knn work on table1.geom <#>table2.geom

Nicklas Avén nicklas.aven at jordogskog.no
Sun Oct 16 13:59:20 PDT 2011


Hallo

I have done some more testings and found som interesting things.

Headlines
1) Postgresql natively behaves exactly lie Pauls implementation of knn.
The index seems to need one of the geometries described directly in the
operator and not referenced from a table or subquery.

2) PostGIS <-> operator doesn't give the same answer as st_distance and
postgresql <-> operator does. The difference is probably from finding
the centroid of the bbox in PostGIS <-> case.

3) Postgresql <-> functin without indexes is more than twice as fast as
PostGIS st_distance and <-> function/oprerator. I have only compared
points.

What I have done is making a test_table like this of 1000000 points:

CREATE TABLE points AS
SELECT ST_Point(x, y) AS p1 , POINT(x,y) AS p2 
FROM 
(SELECT 50000*random() x, 50000*random() y FROM
generate_series(1,1000000) s) koord;

alter table points add column gid serial primary key;

I also added indexes on both p1 and p2 columns and ran vaccum analyze.


Details

1)
Both PostGIS and PostgreSQL seems to ignore the index if not one of the
arguments in the distance functions is defined directly

So:


SELECT STPoint(1,1)<->a.p1 FROM points a ORDER BY a.p1<->ST_Point(1,1)
limit 1;

usues index but not:

SELECT a.p1<->b.p1 FROM points a, (SELECT ST_Point(1,1) AS p1) b ORDER
BY a.p1<->b.p1 LIMIT 1;

and in PostgreSQL native example:


SELECT POINT(1,1)<->a.p2 FROM points a ORDER BY a.p2<->POINT(1,1) limit
1;

usues index but not:

SELECT a.p2<->b.p2 FROM points a, (SELECT POINT(1,1) AS p2) b ORDER BY
a.p2<->b.p2 LIMIT 1;



That behavior limits the use of the knn-index quite a lot in my opinion.
I have not been following any discussions on postgresql lists. Have
anyone heard that this is something that have to be or is it fixable?



2)
The biggest differences I found by running:

SELECT a.gid, b.gid, ABS((a.p1<->b.p1)-(a.p2<->b.p2)) FROM points a,
points b WHERE b.gid =1 ORDER BY ABS((a.p1<->b.p1)-(a.p2<->b.p2)) DESC
limit 1;

In my case the biggest difference between PostGIS and Postgresql <->
operators was about 0.008 units. When I did the same tst between
PostgreSQl <-> operator and PostGIS ST_Distance the biggest difference
was about 1e-11

I guess the reason is the centroid calculation in PostGIS <->
implemantattion.
Does the difference matter?

3)
When comparing the speed of the raw distance calculations PostgreSQL is
twice as fast as PostGIS no matter i f I use <-> or ST_Distance in
PostGIS. 

Using the table created above:

PostgreSQL type:
SELECT SUM(a.p2<->b.p2) FROM points a, points b WHERE b.gid = 1;
gives answer 22077251381.5178 in about 350 ms

PostGIS type <-> operator:
SELECT SUM(a.p1<->b.p1) FROM points a, points b whereWHERE b.gid = 1;
gives answer 22077252720.6913 in about 850 ms

PostGIS type ST_Distance function: 
SELECT SUM(ST_Distance(a.p1,b.p1)) FROM points a, points b WHERE b.gid =
1;
gives answer 22077251381.5178 (same as PostgreSQL) in about 830 ms


It suprises me a little since a thought the binary storage would make
the access faster. Could it be that we have some bottleneck in our
deserialization that could be tuned, or is it a natural behavior since
PostGIS type is much more complex and nor just two coordinates?



Thanks

Nicklas




On Sun, 2011-10-02 at 08:49 -0700, Paul Ramsey wrote:
> You'll have to trace but probably it plans out as impossible (ie, the
> knn code can't handle it, so it goes down a different path). Unless
> you see it working on internal native pgsql points, in which case my
> implementation is faulty.
> 
> P.
> 
> On Sun, Oct 2, 2011 at 3:58 AM, Nicklas Avén
<nicklas.aven at jordogskog.no> wrote:
> > I have been playing some with the new knn-implementation.
> >
> > If I write the query as Paul illustrates like
> >
> >
> >
> > SELECT id, table1.geom<->ST_SetSrid('POINT(1,2)'::geometry, 3021) as
> > distance
> > FROM table1
> > ORDER BY distance
> > LIMIT 10;
> >
> > Then the index kicks in and works as expected.
> >
> > But if I take the second geometry, the point directlu from a table
it
> > doesn't work like:
> >
> > SELECT id, table1.geom<->table2.geom as distance
> > FROM table1, (SELECT geom FROM table1 where gid=1;) AS table2
> > ORDER BY distance
> > LIMIT 10;
> >
> > /Nicklas
> >
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >
> 





More information about the postgis-devel mailing list