[postgis-devel] Why doesn't knn work on table1.geom <#>table2.geom

Sun Oct 16 15:25:56 PDT 2011

> 1)
> Both PostGIS and PostgreSQL seems to ignore the index if not 
> one of the arguments in the distance functions is defined directly
> 
> So:
> 
> 
> SELECT STPoint(1,1)<->a.p1 FROM points a ORDER BY 
> a.p1<->ST_Point(1,1) limit 1;
> 
> usues index but not:
> 
> SELECT a.p1<->b.p1 FROM points a, (SELECT ST_Point(1,1) AS 
> p1) b ORDER BY a.p1<->b.p1 LIMIT 1;
> 
> and in PostgreSQL native example:
> 
> 
> SELECT POINT(1,1)<->a.p2 FROM points a ORDER BY 
> a.p2<->POINT(1,1) limit 1;
> 
> usues index but not:
> 
> SELECT a.p2<->b.p2 FROM points a, (SELECT POINT(1,1) AS p2) b ORDER BY
> a.p2<->b.p2 LIMIT 1;
> 
> 
> 
> That behavior limits the use of the knn-index quite a lot in 
> my opinion.
> I have not been following any discussions on postgresql 
> lists. Have anyone heard that this is something that have to 
> be or is it fixable?
> 
I haven't seen anything about in discussions
except all the examples demonstrated always have a constant.

I just assumed its a known limitation.

Yes I agree it limits its utility quite a bit except for user input points,
which agreeably is still a useful use case.

I'm more looking forward to the new work that Paul mentioned and been trying
to get some of my clients to fund that.  Unfortunately all dot coms seem
pretty strapped for cash.
So no convincing them unless there is a demonstrated savings such as:

If they can shave off x% for cloud hosting per month because they can get by
with cheaper less powerful servers
and still get the same performance or better than what they have now.

Then that is more of a compelling reason.  Also there is also one of the
general opinions
that if someone does not fund such things someone else will.

> 
> 3)
> When comparing the speed of the raw distance calculations 
> PostgreSQL is twice as fast as PostGIS no matter i f I use 
> <-> or ST_Distance in PostGIS. 
> 
> Using the table created above:
> 
> PostgreSQL type:
> SELECT SUM(a.p2<->b.p2) FROM points a, points b WHERE b.gid = 
> 1; gives answer 22077251381.5178 in about 350 ms
> 
> PostGIS type <-> operator:
> SELECT SUM(a.p1<->b.p1) FROM points a, points b whereWHERE 
> b.gid = 1; gives answer 22077252720.6913 in about 850 ms
> 
> PostGIS type ST_Distance function: 
> SELECT SUM(ST_Distance(a.p1,b.p1)) FROM points a, points b 
> WHERE b.gid = 1; gives answer 22077251381.5178 (same as 
> PostgreSQL) in about 830 ms
> 
> 
> It suprises me a little since a thought the binary storage 
> would make the access faster. Could it be that we have some 
> bottleneck in our deserialization that could be tuned, or is 
> it a natural behavior since PostGIS type is much more complex 
> and nor just two coordinates?
> 

Nicklas -- have you tried your tests on the <#>.  That one doesn't do any
centroid calculations
so should give the same answer for point and point but different for others.
I'm not sure if we are always doing a centroid calc even for points,
but if we are then we might see a difference in performance speed between

<-> and <#>

Thanks,
Regina