[postgis-devel] [PostGIS] #3949: Infinite (?) loop in distance calculation
Nicklas Avén
nicklas.aven at jordogskog.no
Fri Dec 22 12:11:07 PST 2017
Copy here from irc
I have done some small tests
I get identical results when testing 1000000 distance calculations in
1:50000 Norwegian areal map
For that geometry in your ticket your code totally outperforms bothe
brute force and the other algorithm
But for more real data as you said it is slower
I get about double time for ST_DistanceRectTree compared to ST_Distance
with average of 64 vertex points in the geometries
I also put a large geometry as first argument and the first 1000
geometries in the Norwegian dataset as second argument. The large
geometry have approx 190000 points
Still double time for ST_DistanceRectTree
approx 10 sec for ST_Distance and 19 sec for ST_DistanceRectTree
Then the other geometry has 39 points in average
If there s no low hanging fruits for making it faster, I agree that we
should find the breaking points where it is faster and differentiate in
some way
It will not be possible to do 100 % since the performance of the
current code depends also on the distance between the geometries
Well, maybe an expanded box is worth the effort to find what is close
enough
Something like anything inside a box expanded 5 times and x number of
points goes the tree distance route, and maybe most important all
overlapping bboxes
/Nicklas
On Fri, 2017-12-22 at 06:43 -0800, Paul Ramsey wrote:
> On Fri, Dec 22, 2017 at 5:46 AM, Nicklas Avén
> <nicklas.aven at jordogskog.no> wrote:
> > Sounds very good.
> >
> > And I think it is worth to sacrify 10 % speed in some occations for
> > the
> > predictability.
> >
> > And caching the tree must be a very good thing.
> >
> > When you say 100x faster, do you then mean in cases like the one in
> > the
> > ticket?
>
> No 100x faster for objects of that size (thousands of edges), but
> without the total awfulness. A set of 10 comparisons that take 14s in
> aggregate for current take 140ms for the tree.
>
> > If so, it isn't primary the amount of vertes points that is the
> > bottle
> > neck for todays code but other things like distance between the
> > geometries and if there is a lot of vertex points in the part of
> > the
> > border that is hard for the algorithm to filter on (the part facing
> > the
> > other geometry, specially concave cases)
> >
> > But I think it doesn't matter if todays code performs on pair or
> > even
> > better in some cases. Predictability is more important.
>
> I'm less sanguine. Lots-of-small-pairings is a more common case than
> a
> few large ones, so we have to not lose performance in that baseline
> case.
>
> P
>
> >
> > I will give it a test run if the code is ready for that.
> >
> > /Nicklas
> >
> >
> >
> > On Fri, 2017-12-22 at 05:35 -0800, Paul Ramsey wrote:
> > > So, I now have tree distance code that seems to work, thanks to
> > > this
> > > little bug, which caused me to find the error in my tree code.
> > > I'd
> > > given up on it, figured the tree approach was a dead end.
> > >
> > > https://github.com/pramsey/postgis/tree/svn-trunk-distance
> > >
> > > In very limited testing, I see it running quite a bit faster
> > > (100x)
> > > than the existing code on large objects, but slightly slower on
> > > small
> > > objects (10%). I was thinking on merging it to trunk but as a
> > > completely independent function for now (ST_DistanceRectTree())
> > > so
> > > that it can be tested and the functional limits of it determined.
> > > Then
> > > we can discuss how we want to integrate it into generic
> > > ST_Distance(),
> > > maybe with a object size test, or maybe as a cached approach
> > > (skipping
> > > the tree built step in cases with repeated inputs would probably
> > > be a
> > > win).
> > >
> > > Thoughts?
> > >
> > > P
> > >
> > >
> > > On Fri, Dec 22, 2017 at 4:15 AM, PostGIS <trac at osgeo.org> wrote:
> > > >
> > > > Comment (by nicklas):
> > > >
> > > > I agree.
> > > >
> > > > But worst case is different for different methods.
> > > > The example Paul provided here is among the worst cases for
> > > > the
> > > > algorithm
> > > > we use now (when bboxes doesn't overlap, then it falls back to
> > > > brute
> > > > force)
> > > >
> > > > If the shoreline of both geometries would have been convex it
> > > > would have
> > > > been quite a lot faster, but now one is concave and all points
> > > > in
> > > > the
> > > > concave part needs to be tested more or less.
> > > >
> > > > So for this case that is a pain for that algorithm it loses
> > > > the
> > > > battle
> > > > with about 1.08 (79 sec vs 85 sec)
> > > >
> > > > But for the case with the same geometries further away it wins
> > > > by
> > > > using
> > > > approx 0.04 of the time brute force method uses. And with the
> > > > case
> > > > when we
> > > > turn the other side of the geometries against each other it
> > > > wins
> > > > by using
> > > > approx 0.0005 of the brute force method.
> > > >
> > > > So in this case I think the 8 % worse result in some cases is
> > > > ok.
> > > > BUT, the
> > > > problem is that it is unpredictable. Users will like Paul
> > > > think
> > > > that
> > > > something is wrong.
> > > >
> > > > But we have lived with this code now since 2010 and I don't
> > > > think
> > > > there
> > > > have been many posts on the list about that issue. But I have
> > > > hit
> > > > that
> > > > wall a few times myself making queries much slower than I
> > > > expect.
> > > > And it
> > > > can be very frustrating when testing a sample to get an idea
> > > > of
> > > > the query
> > > > time and then it takes 100 times longer.
> > > >
> > > > --
> > > > Ticket URL: <https://trac.osgeo.org/postgis/ticket/3949#comment
> > > > :9>
> > > > PostGIS <http://trac.osgeo.org/postgis/>
> > > > The PostGIS Trac is used for bug, enhancement & task tracking,
> > > > a
> > > > user and developer wiki, and a view into the subversion code
> > > > repository of PostGIS project.
> > >
> > > _______________________________________________
> > > postgis-devel mailing list
> > > postgis-devel at lists.osgeo.org
> > > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> >
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
More information about the postgis-devel
mailing list