[postgis-devel] Distance Madness

Tue Dec 1 13:44:44 PST 2009

I'm thinking that the worst-case behavior of the tree traversal could
also be a lot worse than the worse case behavior of the sort solution
(the recursion seems to provide a depth-first search, and if the first
stab is fairly far from the closest distance, it could take a while to
prune back the tree towards the optimum). Getting the most out of the
tree is almost certainly going to require caching.

P.

On Tue, Dec 1, 2009 at 11:12 AM, Nicklas Avén
<nicklas.aven at jordogskog.no> wrote:
> Yes, I think you are right that my mistake was to use malloc.
> /Nicklas
>
> 2009-12-01 Paul Ramsey wrote:
>
> Nick,
>>
>>You asked if maybe you did something naive and the answer is "possibly
>>yes". For example, if you did your allocations using malloc() you got
>>a much more expensive heap allocation than if you used palloc() (or
>>lwalloc() in the context of a pgsql build).
>>
>>But anyhow, yes, knowing the number of edges before-hand, and knowing
>>our tree building scheme, we could easily allocate the whole segment
>>in one go at the start (~1.5 * npoints * sizeof(node)).
>>
>>For 2.0, I've been thinking about the way to make this stuff "real",
>>and I'm pretty sure the answer is to stick an extra handle on
>>POINTARRAY for holding these optional structures. Then, an
>>lwgeom_index(LWGEOM *g) function would walk into an LWGEOM, ensure all
>>the *bbox pointers are filled in, and build the trees on all the
>>POINTARRAYs. Then the whole thing could be neatly cached, if
>>necessary, etc.
>>
>>P.
>>
>>On Tue, Dec 1, 2009 at 10:16 AM, Nicklas Avén
>> wrote:
>>> aul
>>> If it is as I suspect that the number of memory allocations is the most
>>> costly part of this, wouldn't it be possible to allocate all of it once
>>> and
>>> for all. If you allocate one list of nodes for each level, the whole
>>> structure of the tree is given. You know where the children and parents
>>> is
>>> related to the start of the "level lists" the only thing missing is the
>>> min
>>> and max x any y for each level.
>>>
>>> /Nicklas
>>>
>>> 2009-12-01 Nicklas Avén wrote:
>>>
>>> :-) >
>>>
>>>
>>>>
>>> I think I see what you are doing
>>>>
>>> Had to take another look...
>>>>
>>>
>>>>
>>> The comarasion part is the same as I used for subgeometries and as you
>>> say
>>> (I saw the blog too) the natural ordering of the edges should make the
>>> tree
>>> very effective.
>>>>
>>>
>>>>
>>> But I think you have to cache the tree to beat my calculation :-)
>>>>
>>> That belive comes from some trying I did when writing the code, before I
>>> had
>>> found the qsort function. Then I build a tree as described in that
>>> C-bible I
>>> can't remember the name, just to get the measure-values ordered. I think
>>> it
>>> was the number of memory-allocations that showed to be very expensive. I
>>> might have done som elementary mistake but when I tried that tree with
>>> one
>>> node per vertex on the largest polygon of Alaska (about 70000 vertexes)
>>> it
>>> used almost 7 seconds just to build the tree and read it ordered. The
>>> whole
>>> calculation for distance between the largest polygons in Alaska and Texas
>>> now uses less than 100 ms. But it is quite likely that I had done som
>>> mistake in my treebuilding.
>>>>
>>> (That calculation Alaska Texas is extremly fast because Texas has that
>>> sharp
>>> edge aginst Alaska which is easy for my algoritm to catch.)
>>>>
>>>
>>>>
>>> /Nicklas
>>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>> 2009-12-01 Nicklas Avén wrote:
>>>>
>>>> >
>>>>
>>> wow, I think postgis will be hard to beat in the long run :-)
>>>> >
>>>
>>>> >
>>> I'm not used to study those tree things, but it will be very interesting
>>> to
>>> follow this and try to understand it.
>>>> >
>>> Will it work on overlapping bboxes too?
>>>> >
>>>
>>>> >
>>> One thing I think is nice about "my" way of doing it, is that there is no
>>> indexes included. The gain is that it can be nested in subqueries
>>> and CTE's
>>> without loosing the speed. But I haven't really understood when that
>>> problem
>>> occur of not using the indexes.
>>>> >
>>>
>>>> >
>>> I have been thinking some in 3D terms too, totally without knowledge
>>> about
>>> reality. But it shouldn't be too impossible to do the same tricks in 3D
>>> and
>>> calculating point to surface instead of point to segment. But that would
>>> need a 3D type. I don't know how that use to be built, but I have
>>> imagined a
>>> list of 3D points and a related list with pointidentifiers in groups of
>>> three, defining triangels. Then it would be quite fast to do like my
>>> distance trick on the points and finding the surfaces to calculate
>>> distances
>>> to with some index on the pointidentifiers list...
>>>> >
>>>
>>>> >
>>> /Nicklas
>>>> >
>>>
>>>> >
>>>
>>>> >
>>>> >
>>>> > 2009-12-01 Paul Ramsey wrote:
>>>> >
>>>> > Niklas,
>>>> > >
>>>> > >Your cunning sorting trick got into my head and I thought and thought
>>>> > >on it and ended up borrowing the magic tree idea from Mark's prepared
>>>> > >p-i-p code, and added another dimension to that, so the trees are
>>>> > >actually in 2d space, and then starting thinking about how, given two
>>>> > >of these trees you could quickly find things like intersections or
>>>> > >even (*gasp*) distances...
>>>> > >
>>>> > >geog=# select distance(buffer('POINT(3 3)', 2),buffer('POINT(0 0)',
>>>> > > 2));
>>>> > > distance
>>>> > >-------------------
>>>> > > 0.242640687119285
>>>> > >(1 row)
>>>> > >
>>>> > >geog=# select npoints(buffer('POINT(3 3)', 2));
>>>> > > npoints
>>>> > >---------
>>>> > > 33
>>>> > >(1 row)
>>>> > >
>>>> > >geog=# select 33*33;
>>>> > > ?column?
>>>> > >----------
>>>> > > 1089
>>>> > >(1 row)
>>>> > >
>>>> > >These two circles have 33 points each, so the brute force method
>>>> > >requires 1089 distance calculations to come up with an answer. The
>>>> > >magic trees took 29. It's also possible to *cache* the trees and
>>>> > >re-use them, both for distance calculations and intersections and
>>>> > >containments.
>>>> > >
>>>> > >Very exciting stuffs!
>>>> > >
>>>> > >However, we're almost to 1.5 and adding all the features you already
>>>> > >have in your code (maximums, points and lines of shortness, etc)
>>>> > > would
>>>> > >take way too long. And I'm not sure my code is actually faster. Just
>>>> > >that it's much faster than the brute force ways.
>>>> > >
>>>> > >It's all in lwtree.c in liblwgeom if you want to look.
>>>> > >
>>>> > >P.
>>>> > >_______________________________________________
>>>> > >postgis-devel mailing list
>>>> > >postgis-devel at postgis.refractions.net
>>>> > >postgis.refractions.net/mailman/listinfo/postgis-devel
>>>> > >
>>>> > >
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>>
>>>
>>_______________________________________________
>>postgis-devel mailing list
>>postgis-devel at postgis.refractions.net
>>http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>