[postgis-devel] [PostGIS] #3949: Infinite (?) loop in distance calculation

Nicklas Avén nicklas.aven at jordogskog.no
Sat Dec 23 12:07:45 PST 2017


Paul

I think I found a fruit that should be cheap and low hanging that makes
your code faster for at least the larger objects.

The case with the large polygon I tested yesterday was that it had a
lot of holes. It seems like the current code is less hurt by those
holes than your code. It doesn't seem to filter away the holes, but
they fall out earlier in the process. 

So, when I test the exterior ring from the same polygon against the
same polygons as yesterday (but only 100 of them), the current code
uses 550 ms instead of 1 sec. But your code falls from 1.9 seconds to
370 ms !!!

That means that it is faster when they have the same work to do. 

So the low hanging fruit must be to check for overlapping bboxes. If
they have don't have overlapping bboxes, don't build a tree for the
holes (which I guess it does now from the numbers)

/Nicklas



On Fri, 2017-12-22 at 12:28 -0800, Paul Ramsey wrote:
> Very interesting. I'm surprised/sad that with the big geometry case
> the tree doesn't outperform. I was hoping the complexity overhead
> would be washed out by the size. However, maybe some profiling now
> would help find other areas of efficiency to makes things better.
> 
> P.
> 
> On Fri, Dec 22, 2017 at 12:11 PM, Nicklas Avén
> <nicklas.aven at jordogskog.no> wrote:
> > Copy here from irc
> > 
> > I have done some small tests
> > 
> > I get identical results when testing 1000000 distance calculations
> > in
> > 1:50000 Norwegian areal map
> > 
> > 
> > For that geometry in your ticket your code totally outperforms
> > bothe
> > brute force and the other algorithm
> > 
> > But for more real data as you said it is slower
> > I get about double time for ST_DistanceRectTree compared to
> > ST_Distance
> > with average of 64 vertex points in the geometries
> > 
> > I also put a large geometry as first argument and the first 1000
> > geometries in the Norwegian dataset as second argument. The large
> > geometry have approx 190000 points
> > 
> > Still double time for ST_DistanceRectTree
> > approx 10 sec for ST_Distance and 19 sec for ST_DistanceRectTree
> > Then the other geometry has 39 points in average
> > 
> > If there s no low hanging fruits for making it faster, I agree that
> > we
> > should find the breaking points where it is faster and
> > differentiate in
> > some way
> > 
> > It will not be possible to do 100 % since the performance of the
> > current code depends also on the distance between the geometries
> > 
> > Well, maybe an expanded box is worth the effort to find what is
> > close
> > enough
> > 
> > Something like anything inside a box expanded 5 times and x number
> > of
> > points goes the tree distance route, and maybe most important all
> > overlapping bboxes
> > 
> > 
> > /Nicklas
> > 
> > 
> > On Fri, 2017-12-22 at 06:43 -0800, Paul Ramsey wrote:
> > > On Fri, Dec 22, 2017 at 5:46 AM, Nicklas Avén
> > > <nicklas.aven at jordogskog.no> wrote:
> > > > Sounds very good.
> > > > 
> > > > And I think it is worth to sacrify 10 % speed in some occations
> > > > for
> > > > the
> > > > predictability.
> > > > 
> > > > And caching the tree must be a very good thing.
> > > > 
> > > > When you say 100x faster, do you then mean in cases like the
> > > > one in
> > > > the
> > > > ticket?
> > > 
> > > No 100x faster for objects of that size (thousands of edges), but
> > > without the total awfulness. A set of 10 comparisons that take
> > > 14s in
> > > aggregate for current take 140ms for the tree.
> > > 
> > > > If so, it isn't primary the amount of vertes points that is the
> > > > bottle
> > > > neck for todays code but other things like distance between the
> > > > geometries and if there is a lot of vertex points in the part
> > > > of
> > > > the
> > > > border that is hard for the algorithm to filter on (the part
> > > > facing
> > > > the
> > > > other geometry, specially concave cases)
> > > > 
> > > > But I think it doesn't matter if todays code performs on pair
> > > > or
> > > > even
> > > > better in some cases. Predictability is more important.
> > > 
> > > I'm less sanguine. Lots-of-small-pairings is a more common case
> > > than
> > > a
> > > few large ones, so we have to not lose performance in that
> > > baseline
> > > case.
> > > 
> > > P
> > > 
> > > > 
> > > > I will give it a test run if the code is ready for that.
> > > > 
> > > > /Nicklas
> > > > 
> > > > 
> > > > 
> > > > On Fri, 2017-12-22 at 05:35 -0800, Paul Ramsey wrote:
> > > > > So, I now have tree distance code that seems to work, thanks
> > > > > to
> > > > > this
> > > > > little bug, which caused me to find the error in my tree
> > > > > code.
> > > > > I'd
> > > > > given up on it, figured the tree approach was a dead end.
> > > > > 
> > > > > https://github.com/pramsey/postgis/tree/svn-trunk-distance
> > > > > 
> > > > > In very limited testing, I see it running quite a bit faster
> > > > > (100x)
> > > > > than the existing code on large objects, but slightly slower
> > > > > on
> > > > > small
> > > > > objects (10%). I was thinking on merging it to trunk but as a
> > > > > completely independent function for now
> > > > > (ST_DistanceRectTree())
> > > > > so
> > > > > that it can be tested and the functional limits of it
> > > > > determined.
> > > > > Then
> > > > > we can discuss how we want to integrate it into generic
> > > > > ST_Distance(),
> > > > > maybe with a object size test, or maybe as a cached approach
> > > > > (skipping
> > > > > the tree built step in cases with repeated inputs would
> > > > > probably
> > > > > be a
> > > > > win).
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > > P
> > > > > 
> > > > > 
> > > > > On Fri, Dec 22, 2017 at 4:15 AM, PostGIS <trac at osgeo.org>
> > > > > wrote:
> > > > > > 
> > > > > > Comment (by nicklas):
> > > > > > 
> > > > > >  I agree.
> > > > > > 
> > > > > >  But worst case is different for different methods.
> > > > > >  The example Paul provided here is among the worst cases
> > > > > > for
> > > > > > the
> > > > > > algorithm
> > > > > >  we use now (when bboxes doesn't overlap, then it falls
> > > > > > back to
> > > > > > brute
> > > > > >  force)
> > > > > > 
> > > > > >  If the shoreline of both geometries would have been convex
> > > > > > it
> > > > > > would have
> > > > > >  been quite a lot faster, but now one is concave and all
> > > > > > points
> > > > > > in
> > > > > > the
> > > > > >  concave part needs to be tested more or less.
> > > > > > 
> > > > > >  So for this case that is a pain for that algorithm it
> > > > > > loses
> > > > > > the
> > > > > > battle
> > > > > >  with about 1.08 (79 sec vs 85 sec)
> > > > > > 
> > > > > >  But for the case with the same geometries further away it
> > > > > > wins
> > > > > > by
> > > > > > using
> > > > > >  approx 0.04 of the time brute force method uses. And with
> > > > > > the
> > > > > > case
> > > > > > when we
> > > > > >  turn the other side of the geometries against each other
> > > > > > it
> > > > > > wins
> > > > > > by using
> > > > > >  approx 0.0005 of the brute force method.
> > > > > > 
> > > > > >  So in this case I think the 8 % worse result in some cases
> > > > > > is
> > > > > > ok.
> > > > > > BUT, the
> > > > > >  problem is that it is unpredictable. Users will like Paul
> > > > > > think
> > > > > > that
> > > > > >  something is wrong.
> > > > > > 
> > > > > >  But we have lived with this code now since 2010 and I
> > > > > > don't
> > > > > > think
> > > > > > there
> > > > > >  have been many posts on the list about that issue. But I
> > > > > > have
> > > > > > hit
> > > > > > that
> > > > > >  wall a few times myself making queries much slower than I
> > > > > > expect.
> > > > > > And it
> > > > > >  can be very frustrating when testing a sample to get an
> > > > > > idea
> > > > > > of
> > > > > > the query
> > > > > >  time and then it takes 100 times longer.
> > > > > > 
> > > > > > --
> > > > > > Ticket URL: <https://trac.osgeo.org/postgis/ticket/3949#com
> > > > > > ment
> > > > > > :9>
> > > > > > PostGIS <http://trac.osgeo.org/postgis/>
> > > > > > The PostGIS Trac is used for bug, enhancement & task
> > > > > > tracking,
> > > > > > a
> > > > > > user and developer wiki, and a view into the subversion
> > > > > > code
> > > > > > repository of PostGIS project.
> > > > > 
> > > > > _______________________________________________
> > > > > postgis-devel mailing list
> > > > > postgis-devel at lists.osgeo.org
> > > > > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> > > > 
> > > > _______________________________________________
> > > > postgis-devel mailing list
> > > > postgis-devel at lists.osgeo.org
> > > > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> > > 
> > > _______________________________________________
> > > postgis-devel mailing list
> > > postgis-devel at lists.osgeo.org
> > > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> > 
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel


More information about the postgis-devel mailing list