[postgis-devel] KNN and Semantics

Nicklas Avén nicklas.aven at jordogskog.no
Mon Sep 26 23:53:46 PDT 2011


Hallo

I have been quiet for a long time. I hope to get more involved again but
time is not enough.

About KNN, I have spent quite a lot of time trying to think about it.

There is a lot of everything that I don't know enough about, but I have
to ask if it not really is possible to do a recheck to get the "real"
answer of for example the closest lake to a recreation area.

If I understand this right, what we get is a list, ordered by distance
between bounding boxes. Bounding boxes instead of points is great news.

What we then need to do is iterate the list, send the real geometries to
the ordinary distance function and stop at once the next incoming
bounding box is more far away than the closest real distance ( or n:ths
closest real distance if limit is more than 1)

When I wrote my last comment at #701 I thought that the biggest problem
was to save values between rows. But I have understood that we have
mechanisms for that like when we are reusing the prepared geometries.

So if we could save the n last real values (we need all n of them, not
just the last) then we can determine when we have the complete list of n
closest geometries.

Maybe we have problems getting them in the right order instead of the
bbox order, but we would still get the n closest geometries.


I have followed the list and I see a lot of exciting development is
going on. 

Thanks

Nicklas


On Mon, 2011-09-26 at 15:20 -0700, Chris Hodgson wrote:
> It seems to me that this sort of query is the only real use for the 
> imperfect non-point KNN semantics, in which case, it seems that it 
> hardly matters which approach (box or centroid) is used.
> 
> In the case of very large boxes, where it is likely that many of them 
> overlap with the query point/box (equal distances of 0), the centroid at 
> least gives you a good way to sort between them. In the case of very 
> small boxes, the centroid distance would not be much different from the 
> box distance. So perhaps I would lean towards the centroids.
> 
> Chris
> 
> Paul Ramsey wrote:
> > Bear in mind the syntax at play here:
> >
> > select
> >   id,
> >   geom <-> 'SRID=4326;POINT(-90 40)' as idist,
> >   st_distance(geom,'SRID=4326;POINT(-90 40)') as d,
> >   name
> > from geonames
> > order by geom <-> 'SRID=4326;POINT(-90 40)' asc
> > limit 10;
> >
> > The KNN gets thrown into play by the ORDER BY clause, and gets show
> > down by the LIMIT.
> >
> > So the syntax to try and get the nearest thing wrapping a subquery
> > would end up being
> >
> > with index_query as
> >   (select * from geonames
> >    order by geom <-> 'SRID=4326;POINT(-90 40)' asc
> >    limit 100)
> > select * from index_query
> > order by st_distance(geom, 'SRID=4326;POINT(-90 40)') asc
> > limit 10;
> >
> > But the first limit is magic, hopefully large enough to capture all
> > the things wich truly are in the 10 closest, even given the inaccurate
> > box or centroid semantic.
> >
> > Paul
> >
> >
> > On Mon, Sep 26, 2011 at 2:59 PM, Paragon Corporation <lr at pcorp.us> wrote:
> >   
> >> Good point +++ for boxes.  You still might miss some but you are more likely
> >> to miss with centroids
> >>
> >> ________________________________
> >> From: postgis-devel-bounces at postgis.refractions.net
> >> [mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of David
> >> William Bitner
> >> Sent: Monday, September 26, 2011 5:52 PM
> >> To: PostGIS Development Discussion
> >> Subject: Re: [postgis-devel] KNN and Semantics
> >>
> >> I don't think you actually have to deal with the magic number.
> >> Something like
> >> select * from foo where
> >> st_within(st_extent(st_knnbox(geoma,geomb,10),geoma)) order by
> >> st_distance(geoma,geomb) limit 10
> >> would still be able to use the KNN index to find the magic extent that will
> >> include all the features that are possible to be in the closest 10 as from
> >> the furthest extent of the 10 closest boxes will certainly include all of
> >> the "actual" 10 closest.
> >>
> >> On Mon, Sep 26, 2011 at 4:40 PM, Paul Ramsey <pramsey at opengeo.org> wrote:
> >>     
> >>> Trade-offs abound.
> >>> centroid-vs-centroid has the advantage of ease of understanding
> >>> box-vs-box has the advantage of being over-determined. you could at
> >>> least theoretically do an index-assisted knn pull wrapped as a
> >>> subquery inside a standard st_distance test. however, that goes back
> >>> to magic numbers again (how many items should one pull in the indexed
> >>> query) so...
> >>> There don't seem to be any great soln's here.
> >>> P.
> >>>
> >>> On Mon, Sep 26, 2011 at 2:35 PM, Paragon Corporation <lr at pcorp.us> wrote:
> >>>       
> >>>> Okay I get the issue now forgot it was for order by stuff. I guess from
> >>>> my
> >>>> perspective bounding box would be better though that's not ideal either.
> >>>> for long linestrings which is what I'm usually dealing with centroid is
> >>>> useless and for small polygons the bounding box is a better
> >>>> approximation
> >>>> than centroid
> >>>> for thumbnail check.
> >>>>
> >>>> It's ST_Distance BTW -- get with the program Paul.  Your beloved
> >>>> distance is
> >>>> gone. No more. history vamush. :)
> >>>>
> >>>> Thanks,
> >>>> Regina
> >>>>
> >>>>
> >>>>         
> >>>>> -----Original Message-----
> >>>>> From: postgis-devel-bounces at postgis.refractions.net
> >>>>> [mailto:postgis-devel-bounces at postgis.refractions.net] On
> >>>>> Behalf Of Paul Ramsey
> >>>>> Sent: Monday, September 26, 2011 4:25 PM
> >>>>> To: PostGIS Development Discussion
> >>>>> Subject: Re: [postgis-devel] KNN and Semantics
> >>>>>
> >>>>> KNNGist walks the index tree to provide ordered results.
> >>>>> Great.
> >>>>> But it's walking the tree, so the results have to be based on boxes.
> >>>>> In the case of points, the box == the point, so the ambiguity
> >>>>> collapses.
> >>>>> In the case of everything else, it does not.
> >>>>>
> >>>>> So, when someone does
> >>>>>
> >>>>> select * from mytable order by geom <-> 'polygon()'::geometry;
> >>>>>
> >>>>> what should they get back to provide the minimum surprise?
> >>>>> Because they *will* get back results that differ from
> >>>>>
> >>>>> select * from mytable order by distance(geom, 'polygon()'::geometry);
> >>>>>
> >>>>> sometimes substantially.
> >>>>>
> >>>>> P.
> >>>>>
> >>>>> On Mon, Sep 26, 2011 at 1:19 PM, Paragon Corporation
> >>>>> <lr at pcorp.us> wrote:
> >>>>>           
> >>>>>>> So, here's the deal, the KNN search works exclusively against the
> >>>>>>> index. So, it only has boxes available to make decisions (not
> >>>>>>> entirely true, it actually has the full geometry of the query key,
> >>>>>>> but not of the index keys). That means it can return an
> >>>>>>>               
> >>>>> exact answer
> >>>>>           
> >>>>>>> for point-on-point queries, but for everything else it'll be a box
> >>>>>>> approximation. So the n-nearest-boxes.
> >>>>>>>
> >>>>>>> There are lots of ways to attack the problem... we can do pure
> >>>>>>> nearest-boxes. We could also convert all the boxes to
> >>>>>>>               
> >>>>> points, and do
> >>>>>           
> >>>>>>> nearest-centroids. This might be easiest to explain,
> >>>>>>>               
> >>>>> potentially. The
> >>>>>           
> >>>>>>> trouble is, we're going to be returning an approximation for
> >>>>>>> everything except points, so the question is (in my mind) which
> >>>>>>> approximation is easiest to visualize and work with?
> >>>>>>>
> >>>>>>>               
> >>>>>> Not sure I understand your question Paul?   I don't see how nearest
> >>>>>> centroids helps much when you are talking about largish
> >>>>>>             
> >>>>> polygons.  I
> >>>>>           
> >>>>>> was thinking this would just make the ST_Expand like stuff
> >>>>>>             
> >>>>> faster? Oh
> >>>>>           
> >>>>>> perhaps I misunderstood that.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Regina
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> postgis-devel mailing list
> >>>>>> postgis-devel at postgis.refractions.net
> >>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>>>
> >>>>>>             
> >>>>> _______________________________________________
> >>>>> postgis-devel mailing list
> >>>>> postgis-devel at postgis.refractions.net
> >>>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>>
> >>>>>           
> >>>> _______________________________________________
> >>>> postgis-devel mailing list
> >>>> postgis-devel at postgis.refractions.net
> >>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>
> >>>>         
> >>> _______________________________________________
> >>> postgis-devel mailing list
> >>> postgis-devel at postgis.refractions.net
> >>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>       
> >>
> >> --
> >> ************************************
> >> David William Bitner
> >>
> >> _______________________________________________
> >> postgis-devel mailing list
> >> postgis-devel at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>
> >>
> >>     
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >   
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> 





More information about the postgis-devel mailing list