[postgis-devel] Distance Madness

Wed Dec 2 10:34:50 PST 2009

This sounds a bit like the "a-star with heuristics" algorithm in  
pgrouting.

/p-o

2 dec 2009 kl. 07.37 skrev "Nicklas Avén" <nicklas.aven at jordogskog.no>:

> Yes, I see what you mean. I thought about it too.
>
> But what about mixing all ideas in one.
> I think I maybe was struck by an idea about it this morning, worth  
> considering.
> What about giving every parent 4 children. And then in the optimal  
> node have one child with maxX, one with maxY, one minX and one minY.  
> Probably it will be the same child that in reality has two or more  
> positions, but then identify that it is the same and force the  
> fourth child to that place, so the place for , say maxX is hold by a  
> child with no internal max or min value (good enough)
>
> Then to get a overall direction between the geometries, compare the  
> center of the boundingboxes and decide if the direction falls under  
> nort, south, west or east.
> So for two geometries that is west and east of each other we start  
> deepdiving in the west respectively east corner of the nodes all the  
> way domn to the leaf and get a first fairly good mindistance very  
> fast. Then we can do the rest with a better min distance to sort  
> away the ones too far away for considering.
>
> Does it make any sense?
>
> I'm in a hurrry so maybe not to structured.
>
> /Nicklas
>
> 2009-12-01 Paul Ramsey wrote:
>
> I'm thinking that the worst-case behavior of the tree traversal could
> >also be a lot worse than the worse case behavior of the sort solution
> >(the recursion seems to provide a depth-first search, and if the  
> first
> >stab is fairly far from the closest distance, it could take a while  
> to
> >prune back the tree towards the optimum). Getting the most out of the
> >tree is almost certainly going to require caching.
> >
> >P.
> >
> >On Tue, Dec 1, 2009 at 11:12 AM, Nicklas Avén
> > wrote:
> >> Yes, I think you are right that my mistake was to use malloc.
> >> /Nicklas
> >>
> >> 2009-12-01 Paul Ramsey wrote:
> >>
> >> Nick,
> >>>
> >>>You asked if maybe you did something naive and the answer is  
> "possibly
> >>>yes". For example, if you did your allocations using malloc() you  
> got
> >>>a much more expensive heap allocation than if you used palloc() (or
> >>>lwalloc() in the context of a pgsql build).
> >>>
> >>>But anyhow, yes, knowing the number of edges before-hand, and  
> knowing
> >>>our tree building scheme, we could easily allocate the whole  
> segment
> >>>in one go at the start (~1.5 * npoints * sizeof(node)).
> >>>
> >>>For 2.0, I've been thinking about the way to make this stuff  
> "real",
> >>>and I'm pretty sure the answer is to stick an extra handle on
> >>>POINTARRAY for holding these optional structures. Then, an
> >>>lwgeom_index(LWGEOM *g) function would walk into an LWGEOM,  
> ensure all
> >>>the *bbox pointers are filled in, and build the trees on all the
> >>>POINTARRAYs. Then the whole thing could be neatly cached, if
> >>>necessary, etc.
> >>>
> >>>P.
> >>>
> >>>On Tue, Dec 1, 2009 at 10:16 AM, Nicklas Avén
> >>> wrote:
> >>>> aul
> >>>> If it is as I suspect that the number of memory allocations is  
> the most
> >>>> costly part of this, wouldn't it be possible to allocate all of  
> it once
> >>>> and
> >>>> for all. If you allocate one list of nodes for each level, the  
> whole
> >>>> structure of the tree is given. You know where the children and  
> parents
> >>>> is
> >>>> related to the start of the "level lists" the only thing  
> missing is the
> >>>> min
> >>>> and max x any y for each level.
> >>>>
> >>>> /Nicklas
> >>>>
> >>>> 2009-12-01 Nicklas Avén wrote:
> >>>>
> >>>> :-) >
> >>>>
> >>>>
> >>>>>
> >>>> I think I see what you are doing
> >>>>>
> >>>> Had to take another look...
> >>>>>
> >>>>
> >>>>>
> >>>> The comarasion part is the same as I used for subgeometries and  
> as you
> >>>> say
> >>>> (I saw the blog too) the natural ordering of the edges should  
> make the
> >>>> tree
> >>>> very effective.
> >>>>>
> >>>>
> >>>>>
> >>>> But I think you have to cache the tree to beat my calculation :-)
> >>>>>
> >>>> That belive comes from some trying I did when writing the code,  
> before I
> >>>> had
> >>>> found the qsort function. Then I build a tree as described in  
> that
> >>>> C-bible I
> >>>> can't remember the name, just to get the measure-values  
> ordered. I think
> >>>> it
> >>>> was the number of memory-allocations that showed to be very  
> expensive. I
> >>>> might have done som elementary mistake but when I tried that  
> tree with
> >>>> one
> >>>> node per vertex on the largest polygon of Alaska (about 70000  
> vertexes)
> >>>> it
> >>>> used almost 7 seconds just to build the tree and read it  
> ordered. The
> >>>> whole
> >>>> calculation for distance between the largest polygons in Alaska  
> and Texas
> >>>> now uses less than 100 ms. But it is quite likely that I had  
> done som
> >>>> mistake in my treebuilding.
> >>>>>
> >>>> (That calculation Alaska Texas is extremly fast because Texas  
> has that
> >>>> sharp
> >>>> edge aginst Alaska which is easy for my algoritm to catch.)
> >>>>>
> >>>>
> >>>>>
> >>>> /Nicklas
> >>>>>
> >>>>>
> >>>>
> >>>>>
> >>>>
> >>>>>
> >>>>> 2009-12-01 Nicklas Avén wrote:
> >>>>>
> >>>>> >
> >>>>>
> >>>> wow, I think postgis will be hard to beat in the long run :-)
> >>>>> >
> >>>>
> >>>>> >
> >>>> I'm not used to study those tree things, but it will be very  
> interesting
> >>>> to
> >>>> follow this and try to understand it.
> >>>>> >
> >>>> Will it work on overlapping bboxes too?
> >>>>> >
> >>>>
> >>>>> >
> >>>> One thing I think is nice about "my" way of doing it, is that  
> there is no
> >>>> indexes included. The gain is that it can be nested in subqueries
> >>>> and CTE's
> >>>> without loosing the speed. But I haven't really understood when  
> that
> >>>> problem
> >>>> occur of not using the indexes.
> >>>>> >
> >>>>
> >>>>> >
> >>>> I have been thinking some in 3D terms too, totally without  
> knowledge
> >>>> about
> >>>> reality. But it shouldn't be too impossible to do the same  
> tricks in 3D
> >>>> and
> >>>> calculating point to surface instead of point to segment. But  
> that would
> >>>> need a 3D type. I don't know how that use to be built, but I have
> >>>> imagined a
> >>>> list of 3D points and a related list with pointidentifiers in  
> groups of
> >>>> three, defining triangels. Then it would be quite fast to do  
> like my
> >>>> distance trick on the points and finding the surfaces to  
> calculate
> >>>> distances
> >>>> to with some index on the pointidentifiers list...
> >>>>> >
> >>>>
> >>>>> >
> >>>> /Nicklas
> >>>>> >
> >>>>
> >>>>> >
> >>>>
> >>>>> >
> >>>>> >
> >>>>> > 2009-12-01 Paul Ramsey wrote:
> >>>>> >
> >>>>> > Niklas,
> >>>>> > >
> >>>>> > >Your cunning sorting trick got into my head and I thought  
> and thought
> >>>>> > >on it and ended up borrowing the magic tree idea from  
> Mark's prepared
> >>>>> > >p-i-p code, and added another dimension to that, so the  
> trees are
> >>>>> > >actually in 2d space, and then starting thinking about how,  
> given two
> >>>>> > >of these trees you could quickly find things like  
> intersections or
> >>>>> > >even (*gasp*) distances...
> >>>>> > >
> >>>>> > >geog=# select distance(buffer('POINT(3 3)', 2),buffer('POINT 
> (0 0)',
> >>>>> > > 2));
> >>>>> > > distance
> >>>>> > >-------------------
> >>>>> > > 0.242640687119285
> >>>>> > >(1 row)
> >>>>> > >
> >>>>> > >geog=# select npoints(buffer('POINT(3 3)', 2));
> >>>>> > > npoints
> >>>>> > >---------
> >>>>> > > 33
> >>>>> > >(1 row)
> >>>>> > >
> >>>>> > >geog=# select 33*33;
> >>>>> > > ?column?
> >>>>> > >----------
> >>>>> > > 1089
> >>>>> > >(1 row)
> >>>>> > >
> >>>>> > >These two circles have 33 points each, so the brute force  
> method
> >>>>> > >requires 1089 distance calculations to come up with an  
> answer. The
> >>>>> > >magic trees took 29. It's also possible to *cache* the  
> trees and
> >>>>> > >re-use them, both for distance calculations and  
> intersections and
> >>>>> > >containments.
> >>>>> > >
> >>>>> > >Very exciting stuffs!
> >>>>> > >
> >>>>> > >However, we're almost to 1.5 and adding all the features  
> you already
> >>>>> > >have in your code (maximums, points and lines of shortness,  
> etc)
> >>>>> > > would
> >>>>> > >take way too long. And I'm not sure my code is actually  
> faster. Just
> >>>>> > >that it's much faster than the brute force ways.
> >>>>> > >
> >>>>> > >It's all in lwtree.c in liblwgeom if you want to look.
> >>>>> > >
> >>>>> > >P.
> >>>>> > >_______________________________________________
> >>>>> > >postgis-devel mailing list
> >>>>> > >postgis-devel at postgis.refractions.net
> >>>>> > >postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>> > >
> >>>>> > >
> >>>> _______________________________________________
> >>>> postgis-devel mailing list
> >>>> postgis-devel at postgis.refractions.net
> >>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>
> >>>>
> >>>_______________________________________________
> >>>postgis-devel mailing list
> >>>postgis-devel at postgis.refractions.net
> >>>http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>
> >>>
> >> _______________________________________________
> >> postgis-devel mailing list
> >> postgis-devel at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>
> >>
> >_______________________________________________
> >postgis-devel mailing list
> >postgis-devel at postgis.refractions.net
> >http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >
> >
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel