[postgis-devel] Some questions about knn gist

Nicklas Avén nicklas.aven at jordogskog.no
Sun Jan 9 03:40:01 PST 2011


Hallo

This knn gist thing seems too fantastic to not ask some questions about.

As I have understand it knn gist can return points ordered by distance,
directly from an index scan. Is that right?

As long as we are talking about points the result is not lossy and
doesn't need recheck, is that right?

Can it handle the bounding boxes too, or will we have to work with
representative points like the middle of the bounding box or something
like that?

Independent from the two latter questions above, we will have to do a
recheck for all other geometries than eventually points, right?

How will the recheck work?
Is there or will there be some feature that makes it possible for a
function to stop the query. I mean today the index handles over say 1000
geometries that meets some requirement like geomA && geomB 
Then we process all those 1000 geometries and comes up with an answer.
But knn gist will instead give us a list where the geometries is ordered
and in the beginning more likely to meet our requirements (to be the
closest) and further down less likely to be the one we are searching. 
Then we will need a way to say "ok, now we know we havwe found our
candidate, stop searching". Is that possible. If not it will only be
usable in cases that don't need recheck.

If it is possible to stop the query, I have a whole bunch of new
questions about how that can happen, but maybe I could just get some
link that explains things.

But from the perspective of the distance functions I think some of the
ideas from 2009 can be reused about ordering the bounding boxes inside a
collection before doing the distance calculation, discussed here:
http://trac.osgeo.org/postgis/wiki/NewDistCalcSubGeom

What we will need in our distance functions is a mechanism that tells
when the exact distance calculated by the distance function have to be
shorter than the distance to the next geometry in the ordered list from
the index. 

I see that will be a little more tricky and costly if the knn index can
not give us the geometries ordered by the min distance between the
bounding boxes. But we can get a lot of gain from it anyway, if that is
the only problem.

Hmm, this became unordered.

The main question is.
What have to be done to gain from this fantastic new feature as much as
possible. 

/Nicklas




More information about the postgis-devel mailing list