[postgis-devel] Distance Madness

Paul Ramsey pramsey at cleverelephant.ca
Tue Dec 1 10:21:45 PST 2009


Nick,

You asked if maybe you did something naive and the answer is "possibly
yes". For example, if you did your allocations using malloc() you got
a much more expensive heap allocation than if you used palloc() (or
lwalloc() in the context of a pgsql build).

But anyhow, yes, knowing the number of edges before-hand, and knowing
our tree building scheme, we could easily allocate the whole segment
in one go at the start (~1.5 * npoints * sizeof(node)).

For 2.0, I've been thinking about the way to make this stuff "real",
and I'm pretty sure the answer is to stick an extra handle on
POINTARRAY for holding these optional structures. Then, an
lwgeom_index(LWGEOM *g) function would walk into an LWGEOM, ensure all
the *bbox pointers are filled in, and build the trees on all the
POINTARRAYs. Then the whole thing could be neatly cached, if
necessary, etc.

P.

On Tue, Dec 1, 2009 at 10:16 AM, Nicklas Avén
<nicklas.aven at jordogskog.no> wrote:
> aul
> If it is as I suspect that the number of memory allocations is the most
> costly part of this, wouldn't it be possible to allocate all of it once and
> for all. If you allocate one list of nodes for each level, the whole
> structure of the tree is given. You know where the children and parents is
> related to the start of the "level lists" the only thing missing is the min
> and max x any y for each level.
>
> /Nicklas
>
> 2009-12-01 Nicklas Avén wrote:
>
> :-) >
>
>
>>
> I think I see what you are doing
>>
> Had to take another look...
>>
>
>>
> The comarasion part is the same as I used for subgeometries and as you say
> (I saw the blog too) the natural ordering of the edges should make the tree
> very effective.
>>
>
>>
> But I think you have to cache the tree to beat my calculation :-)
>>
> That belive comes from some trying I did when writing the code, before I had
> found the qsort function. Then I build a tree as described in that C-bible I
> can't remember the name, just to get the measure-values ordered. I think it
> was the number of memory-allocations that showed to be very expensive. I
> might have done som elementary mistake but when I tried that tree with one
> node per vertex on the largest polygon of Alaska (about 70000 vertexes) it
> used almost 7 seconds just to build the tree and read it ordered. The whole
> calculation for distance between the largest polygons in Alaska and Texas
> now uses less than 100 ms. But it is quite likely that I had done som
> mistake in my treebuilding.
>>
> (That calculation Alaska Texas is extremly fast because Texas has that sharp
> edge aginst Alaska which is easy for my algoritm to catch.)
>>
>
>>
> /Nicklas
>>
>>
>
>>
>
>>
>> 2009-12-01 Nicklas Avén wrote:
>>
>> >
>>
> wow, I think postgis will be hard to beat in the long run :-)
>> >
>
>> >
> I'm not used to study those tree things, but it will be very interesting to
> follow this and try to understand it.
>> >
> Will it work on overlapping bboxes too?
>> >
>
>> >
> One thing I think is nice about "my" way of doing it, is that there is no
> indexes included. The gain is that it can be nested in subqueries and CTE's
> without loosing the speed. But I haven't really understood when that problem
> occur of not using the indexes.
>> >
>
>> >
> I have been thinking some in 3D terms too, totally without knowledge about
> reality. But it shouldn't be too impossible to do the same tricks in 3D and
> calculating point to surface instead of point to segment. But that would
> need a 3D type. I don't know how that use to be built, but I have imagined a
> list of 3D points and a related list with pointidentifiers in groups of
> three, defining triangels. Then it would be quite fast to do like my
> distance trick on the points and finding the surfaces to calculate distances
> to with some index on the pointidentifiers list...
>> >
>
>> >
> /Nicklas
>> >
>
>> >
>
>> >
>> >
>> > 2009-12-01 Paul Ramsey wrote:
>> >
>> > Niklas,
>> > >
>> > >Your cunning sorting trick got into my head and I thought and thought
>> > >on it and ended up borrowing the magic tree idea from Mark's prepared
>> > >p-i-p code, and added another dimension to that, so the trees are
>> > >actually in 2d space, and then starting thinking about how, given two
>> > >of these trees you could quickly find things like intersections or
>> > >even (*gasp*) distances...
>> > >
>> > >geog=# select distance(buffer('POINT(3 3)', 2),buffer('POINT(0 0)',
>> > > 2));
>> > > distance
>> > >-------------------
>> > > 0.242640687119285
>> > >(1 row)
>> > >
>> > >geog=# select npoints(buffer('POINT(3 3)', 2));
>> > > npoints
>> > >---------
>> > > 33
>> > >(1 row)
>> > >
>> > >geog=# select 33*33;
>> > > ?column?
>> > >----------
>> > > 1089
>> > >(1 row)
>> > >
>> > >These two circles have 33 points each, so the brute force method
>> > >requires 1089 distance calculations to come up with an answer. The
>> > >magic trees took 29. It's also possible to *cache* the trees and
>> > >re-use them, both for distance calculations and intersections and
>> > >containments.
>> > >
>> > >Very exciting stuffs!
>> > >
>> > >However, we're almost to 1.5 and adding all the features you already
>> > >have in your code (maximums, points and lines of shortness, etc) would
>> > >take way too long. And I'm not sure my code is actually faster. Just
>> > >that it's much faster than the brute force ways.
>> > >
>> > >It's all in lwtree.c in liblwgeom if you want to look.
>> > >
>> > >P.
>> > >_______________________________________________
>> > >postgis-devel mailing list
>> > >postgis-devel at postgis.refractions.net
>> > >postgis.refractions.net/mailman/listinfo/postgis-devel
>> > >
>> > >
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>



More information about the postgis-devel mailing list