[postgis-devel] [PostGIS] #3949: Infinite (?) loop in distance calculation

Nicklas Aven nicklas.aven at jordogskog.no
Fri Dec 22 13:20:33 PST 2017


But I didn't study it. It might be the large geometry had most of it points turned away from the other geometries. That would gain todays code a lot.
I will look at it later.

That is actually probably tje case since the large polygon is probably including a shore line with many points turning away from the geometries not next to the see

/Nicklas

Sent from my Sony Xperia™ smartphone

---- Paul Ramsey wrote ----

>Very interesting. I'm surprised/sad that with the big geometry case
>the tree doesn't outperform. I was hoping the complexity overhead
>would be washed out by the size. However, maybe some profiling now
>would help find other areas of efficiency to makes things better.
>
>P.
>
>On Fri, Dec 22, 2017 at 12:11 PM, Nicklas Avén
><nicklas.aven at jordogskog.no> wrote:
>> Copy here from irc
>>
>> I have done some small tests
>>
>> I get identical results when testing 1000000 distance calculations in
>> 1:50000 Norwegian areal map
>>
>>
>> For that geometry in your ticket your code totally outperforms bothe
>> brute force and the other algorithm
>>
>> But for more real data as you said it is slower
>> I get about double time for ST_DistanceRectTree compared to ST_Distance
>> with average of 64 vertex points in the geometries
>>
>> I also put a large geometry as first argument and the first 1000
>> geometries in the Norwegian dataset as second argument. The large
>> geometry have approx 190000 points
>>
>> Still double time for ST_DistanceRectTree
>> approx 10 sec for ST_Distance and 19 sec for ST_DistanceRectTree
>> Then the other geometry has 39 points in average
>>
>> If there s no low hanging fruits for making it faster, I agree that we
>> should find the breaking points where it is faster and differentiate in
>> some way
>>
>> It will not be possible to do 100 % since the performance of the
>> current code depends also on the distance between the geometries
>>
>> Well, maybe an expanded box is worth the effort to find what is close
>> enough
>>
>> Something like anything inside a box expanded 5 times and x number of
>> points goes the tree distance route, and maybe most important all
>> overlapping bboxes
>>
>>
>> /Nicklas
>>
>>
>> On Fri, 2017-12-22 at 06:43 -0800, Paul Ramsey wrote:
>>> On Fri, Dec 22, 2017 at 5:46 AM, Nicklas Avén
>>> <nicklas.aven at jordogskog.no> wrote:
>>> > Sounds very good.
>>> >
>>> > And I think it is worth to sacrify 10 % speed in some occations for
>>> > the
>>> > predictability.
>>> >
>>> > And caching the tree must be a very good thing.
>>> >
>>> > When you say 100x faster, do you then mean in cases like the one in
>>> > the
>>> > ticket?
>>>
>>> No 100x faster for objects of that size (thousands of edges), but
>>> without the total awfulness. A set of 10 comparisons that take 14s in
>>> aggregate for current take 140ms for the tree.
>>>
>>> > If so, it isn't primary the amount of vertes points that is the
>>> > bottle
>>> > neck for todays code but other things like distance between the
>>> > geometries and if there is a lot of vertex points in the part of
>>> > the
>>> > border that is hard for the algorithm to filter on (the part facing
>>> > the
>>> > other geometry, specially concave cases)
>>> >
>>> > But I think it doesn't matter if todays code performs on pair or
>>> > even
>>> > better in some cases. Predictability is more important.
>>>
>>> I'm less sanguine. Lots-of-small-pairings is a more common case than
>>> a
>>> few large ones, so we have to not lose performance in that baseline
>>> case.
>>>
>>> P
>>>
>>> >
>>> > I will give it a test run if the code is ready for that.
>>> >
>>> > /Nicklas
>>> >
>>> >
>>> >
>>> > On Fri, 2017-12-22 at 05:35 -0800, Paul Ramsey wrote:
>>> > > So, I now have tree distance code that seems to work, thanks to
>>> > > this
>>> > > little bug, which caused me to find the error in my tree code.
>>> > > I'd
>>> > > given up on it, figured the tree approach was a dead end.
>>> > >
>>> > > https://github.com/pramsey/postgis/tree/svn-trunk-distance
>>> > >
>>> > > In very limited testing, I see it running quite a bit faster
>>> > > (100x)
>>> > > than the existing code on large objects, but slightly slower on
>>> > > small
>>> > > objects (10%). I was thinking on merging it to trunk but as a
>>> > > completely independent function for now (ST_DistanceRectTree())
>>> > > so
>>> > > that it can be tested and the functional limits of it determined.
>>> > > Then
>>> > > we can discuss how we want to integrate it into generic
>>> > > ST_Distance(),
>>> > > maybe with a object size test, or maybe as a cached approach
>>> > > (skipping
>>> > > the tree built step in cases with repeated inputs would probably
>>> > > be a
>>> > > win).
>>> > >
>>> > > Thoughts?
>>> > >
>>> > > P
>>> > >
>>> > >
>>> > > On Fri, Dec 22, 2017 at 4:15 AM, PostGIS <trac at osgeo.org> wrote:
>>> > > >
>>> > > > Comment (by nicklas):
>>> > > >
>>> > > >  I agree.
>>> > > >
>>> > > >  But worst case is different for different methods.
>>> > > >  The example Paul provided here is among the worst cases for
>>> > > > the
>>> > > > algorithm
>>> > > >  we use now (when bboxes doesn't overlap, then it falls back to
>>> > > > brute
>>> > > >  force)
>>> > > >
>>> > > >  If the shoreline of both geometries would have been convex it
>>> > > > would have
>>> > > >  been quite a lot faster, but now one is concave and all points
>>> > > > in
>>> > > > the
>>> > > >  concave part needs to be tested more or less.
>>> > > >
>>> > > >  So for this case that is a pain for that algorithm it loses
>>> > > > the
>>> > > > battle
>>> > > >  with about 1.08 (79 sec vs 85 sec)
>>> > > >
>>> > > >  But for the case with the same geometries further away it wins
>>> > > > by
>>> > > > using
>>> > > >  approx 0.04 of the time brute force method uses. And with the
>>> > > > case
>>> > > > when we
>>> > > >  turn the other side of the geometries against each other it
>>> > > > wins
>>> > > > by using
>>> > > >  approx 0.0005 of the brute force method.
>>> > > >
>>> > > >  So in this case I think the 8 % worse result in some cases is
>>> > > > ok.
>>> > > > BUT, the
>>> > > >  problem is that it is unpredictable. Users will like Paul
>>> > > > think
>>> > > > that
>>> > > >  something is wrong.
>>> > > >
>>> > > >  But we have lived with this code now since 2010 and I don't
>>> > > > think
>>> > > > there
>>> > > >  have been many posts on the list about that issue. But I have
>>> > > > hit
>>> > > > that
>>> > > >  wall a few times myself making queries much slower than I
>>> > > > expect.
>>> > > > And it
>>> > > >  can be very frustrating when testing a sample to get an idea
>>> > > > of
>>> > > > the query
>>> > > >  time and then it takes 100 times longer.
>>> > > >
>>> > > > --
>>> > > > Ticket URL: <https://trac.osgeo.org/postgis/ticket/3949#comment
>>> > > > :9>
>>> > > > PostGIS <http://trac.osgeo.org/postgis/>
>>> > > > The PostGIS Trac is used for bug, enhancement & task tracking,
>>> > > > a
>>> > > > user and developer wiki, and a view into the subversion code
>>> > > > repository of PostGIS project.
>>> > >
>>> > > _______________________________________________
>>> > > postgis-devel mailing list
>>> > > postgis-devel at lists.osgeo.org
>>> > > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>> >
>>> > _______________________________________________
>>> > postgis-devel mailing list
>>> > postgis-devel at lists.osgeo.org
>>> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>_______________________________________________
>postgis-devel mailing list
>postgis-devel at lists.osgeo.org
>https://lists.osgeo.org/mailman/listinfo/postgis-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20171222/9995e884/attachment-0001.html>


More information about the postgis-devel mailing list