[postgis-devel] Distance Madness

Wed Dec 2 00:09:53 PST 2009

The car trip to work gave me some more rounds of thinking. The idea have to be developed. It shouldn't matter how many childen we find optimal. The thing is to find the best way of storing which one of the children that represents maxX, maxY and so on, so we can do the right diving.It shouldn't cost anything extra (almost), the information is there in rect_node_internal_new(), we just have to store it. /Nicklas    2009-12-02 Nicklas Avén wrote:

Yes, I see what you mean. I thought about it too. > >But what about mixing all ideas in one.>I think I maybe was struck by an idea about it this morning, worth considering.>What about giving every parent 4 children. And then in the optimal node have one child with maxX, one with maxY, one minX and one minY. Probably it will be the same child that in reality has two or more positions, but then identify that it is the same and force the fourth child to that place, so the place for , say maxX is hold by a child with no internal max or min value (good enough)> >Then to get a overall direction between the geometries, compare the center of the boundingboxes and decide if the direction falls under nort, south, west or east.>So for two geometries that is west and east of each other we start deepdiving in the west respectively east corner of the nodes all the way domn to the leaf and get a first fairly good mindistance very fast. Then we can do the rest with a better min distance to !
 sort away the ones too far away for considering.> >Does it make any sense?> >I'm in a hurrry so maybe not to structured.> >/Nicklas
>
> 2009-12-01 Paul Ramsey wrote:
>
> I'm thinking that the worst-case behavior of the tree traversal could
> >also be a lot worse than the worse case behavior of the sort solution
> >(the recursion seems to provide a depth-first search, and if the first
> >stab is fairly far from the closest distance, it could take a while to
> >prune back the tree towards the optimum). Getting the most out of the
> >tree is almost certainly going to require caching.
> >
> >P.
> >
> >On Tue, Dec 1, 2009 at 11:12 AM, Nicklas Avén
> > .aven at jordogskog.no> wrote:
> >> Yes, I think you are right that my mistake was to use malloc.
> >> /Nicklas
> >>
> >> 2009-12-01 Paul Ramsey wrote:
> >>
> >> Nick,
> >>>
> >>>You asked if maybe you did something naive and the answer is "possibly
> >>>yes". For example, if you did your allocations using malloc() you got
> >>>a much more expensive heap allocation than if you used palloc() (or
> >>>lwalloc() in the context of a pgsql build).
> >>>
> >>>But anyhow, yes, knowing the number of edges before-hand, and knowing
> >>>our tree building scheme, we could easily allocate the whole segment
> >>>in one go at the start (~1.5 * npoints * sizeof(node)).
> >>>
> >>>For 2.0, I've been thinking about the way to make this stuff "real",
> >>>and I'm pretty sure the answer is to stick an extra handle on
> >>>POINTARRAY for holding these optional structures. Then, an
> >>>lwgeom_index(LWGEOM *g) function would walk into an LWGEOM, ensure all
> >>>the *bbox pointers are filled in, and build the trees on all the
> >>>POINTARRAYs. Then the whole thing could be neatly cached, if
> >>>necessary, etc.
> >>>
> >>>P.
> >>>
> >>>On Tue, Dec 1, 2009 at 10:16 AM, Nicklas Avén
> >>> wrote:
> >>>> aul
> >>>> If it is as I suspect that the number of memory allocations is the most
> >>>> costly part of this, wouldn't it be possible to allocate all of it once
> >>>> and
> >>>> for all. If you allocate one list of nodes for each level, the whole
> >>>> structure of the tree is given. You know where the children and parents
> >>>> is
> >>>> related to the start of the "level lists" the only thing missing is the
> >>>> min
> >>>> and max x any y for each level.
> >>>>
> >>>> /Nicklas
> >>>>
> >>>> 2009-12-01 Nicklas Avén wrote:
> >>>>
> >>>> :-) >
> >>>>
> >>>>
> >>>>>
> >>>> I think I see what you are doing
> >>>>>
> >>>> Had to take another look...
> >>>>>
> >>>>
> >>>>>
> >>>> The comarasion part is the same as I used for subgeometries and as you
> >>>> say
> >>>> (I saw the blog too) the natural ordering of the edges should make the
> >>>> tree
> >>>> very effective.
> >>>>>
> >>>>
> >>>>>
> >>>> But I think you have to cache the tree to beat my calculation :-)
> >>>>>
> >>>> That belive comes from some trying I did when writing the code, before I
> >>>> had
> >>>> found the qsort function. Then I build a tree as described in that
> >>>> C-bible I
> >>>> can't remember the name, just to get the measure-values ordered. I think
> >>>> it
> >>>> was the number of memory-allocations that showed to be very expensive. I
> >>>> might have done som elementary mistake but when I tried that tree with
> >>>> one
> >>>> node per vertex on the largest polygon of Alaska (about 70000 vertexes)
> >>>> it
> >>>> used almost 7 seconds just to build the tree and read it ordered. The
> >>>> whole
> >>>> calculation for distance between the largest polygons in Alaska and Texas
> >>>> now uses less than 100 ms. But it is quite likely that I had done som
> >>>> mistake in my treebuilding.
> >>>>>
> >>>> (That calculation Alaska Texas is extremly fast because Texas has that
> >>>> sharp
> >>>> edge aginst Alaska which is easy for my algoritm to catch.)
> >>>>>
> >>>>
> >>>>>
> >>>> /Nicklas
> >>>>>
> >>>>>
> >>>>
> >>>>>
> >>>>
> >>>>>
> >>>>> 2009-12-01 Nicklas Avén wrote:
> >>>>>
> >>>>> >
> >>>>>
> >>>> wow, I think postgis will be hard to beat in the long run :-)
> >>>>> >
> >>>>
> >>>>> >
> >>>> I'm not used to study those tree things, but it will be very interesting
> >>>> to
> >>>> follow this and try to understand it.
> >>>>> >
> >>>> Will it work on overlapping bboxes too?
> >>>>> >
> >>>>
> >>>>> >
> >>>> One thing I think is nice about "my" way of doing it, is that there is no
> >>>> indexes included. The gain is that it can be nested in subqueries
> >>>> and CTE's
> >>>> without loosing the speed. But I haven't really understood when that
> >>>> problem
> >>>> occur of not using the indexes.
> >>>>> >
> >>>>
> >>>>> >
> >>>> I have been thinking some in 3D terms too, totally without knowledge
> >>>> about
> >>>> reality. But it shouldn't be too impossible to do the same tricks in 3D
> >>>> and
> >>>> calculating point to surface instead of point to segment. But that would
> >>>> need a 3D type. I don't know how that use to be built, but I have
> >>>> imagined a
> >>>> list of 3D points and a related list with pointidentifiers in groups of
> >>>> three, defining triangels. Then it would be quite fast to do like my
> >>>> distance trick on the points and finding the surfaces to calculate
> >>>> distances
> >>>> to with some index on the pointidentifiers list...
> >>>>> >
> >>>>
> >>>>> >
> >>>> /Nicklas
> >>>>> >
> >>>>
> >>>>> >
> >>>>
> >>>>> >
> >>>>> >
> >>>>> > 2009-12-01 Paul Ramsey wrote:
> >>>>> >
> >>>>> > Niklas,
> >>>>> > >
> >>>>> > >Your cunning sorting trick got into my head and I thought and thought
> >>>>> > >on it and ended up borrowing the magic tree idea from Mark's prepared
> >>>>> > >p-i-p code, and added another dimension to that, so the trees are
> >>>>> > >actually in 2d space, and then starting thinking about how, given two
> >>>>> > >of these trees you could quickly find things like intersections or
> >>>>> > >even (*gasp*) distances...
> >>>>> > >
> >>>>> > >geog=# select distance(buffer('POINT(3 3)', 2),buffer('POINT(0 0)',
> >>>>> > > 2));
> >>>>> > > distance
> >>>>> > >-------------------
> >>>>> > > 0.242640687119285
> >>>>> > >(1 row)
> >>>>> > >
> >>>>> > >geog=# select npoints(buffer('POINT(3 3)', 2));
> >>>>> > > npoints
> >>>>> > >---------
> >>>>> > > 33
> >>>>> > >(1 row)
> >>>>> > >
> >>>>> > >geog=# select 33*33;
> >>>>> > > ?column?
> >>>>> > >----------
> >>>>> > > 1089
> >>>>> > >(1 row)
> >>>>> > >
> >>>>> > >These two circles have 33 points each, so the brute force method
> >>>>> > >requires 1089 distance calculations to come up with an answer. The
> >>>>> > >magic trees took 29. It's also possible to *cache* the trees and
> >>>>> > >re-use them, both for distance calculations and intersections and
> >>>>> > >containments.
> >>>>> > >
> >>>>> > >Very exciting stuffs!
> >>>>> > >
> >>>>> > >However, we're almost to 1.5 and adding all the features you already
> >>>>> > >have in your code (maximums, points and lines of shortness, etc)
> >>>>> > > would
> >>>>> > >take way too long. And I'm not sure my code is actually faster. Just
> >>>>> > >that it's much faster than the brute force ways.
> >>>>> > >
> >>>>> > >It's all in lwtree.c in liblwgeom if you want to look.
> >>>>> > >
> >>>>> > >P.
> >>>>> > >_______________________________________________
> >>>>> > >postgis-devel mailing list
> >>>>> > >postgis-devel at postgis.refractions.net
> >>>>> > >postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>> > >
> >>>>> > >
> >>>> _______________________________________________
> >>>> postgis-devel mailing list
> >>>> postgis-devel at postgis.refractions.net
> >>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>>
> >>>>
> >>>_______________________________________________
> >>>postgis-devel mailing list
> >>>postgis-devel at postgis.refractions.net
> >>>postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>
> >>>
> >> _______________________________________________
> >> postgis-devel mailing list
> >> postgis-devel at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>
> >>
> >_______________________________________________
> >postgis-devel mailing list
> >postgis-devel at postgis.refractions.net
> >postgis.refractions.net/mailman/listinfo/postgis-devel
> >
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20091202/0e03acc9/attachment.html>