[postgis-devel] Optimizing contains/within
Robert W. Burgholzer
rburghol at vt.edu
Mon Jul 23 15:19:46 PDT 2007
Very interesting. Could go about testing the performance increase by breaking up
my multis into individual geometries, or would that break on multiple rings?
Quoting Mark Leslie <mark at refractions.net>:
> The current implementation is of the second form. An RTREE index
> is used to quickly determine candidate ring segments and is cached in
> the PostgreSQL function call context. Repeated requests against the
> same polygon will reuse the cached index. This is not yet implemented
> for MultiPolygons or MultiPoints.
> Chris Hodgson wrote:
> > I don't have a good understanding of the level of optimization done to
> > the geos code, although I feel there is probably room for improvement
> > (likely at the cost of simplicity/understandability/synchronization with
> > JTS). What I believe to be a bigger problem is the levels of abstraction
> > between the SQL language, the postGIS internal structures, and the GEOS
> > internal structures, and the fact that we aren't doing anything to
> > "bridge the gaps" in usage.
> > If one was going to code an optimal solution to the problem "for each of
> > these very many points, see if they are within this one polygon", one
> > would build an indexed representation of the polygon that is designed
> > for quickly finding the sides that intersect any given y-axis. However,
> > the "point_in_ring" function is designed to find if just one point is
> > within in this one ring, and so it doesn't build this indexed polygon
> > structure, or if it does, it throws it away afterward.
> > Even if geos had a "batch optimized" function like "points_in_ring()"
> > that was designed to use the more optimized strategy, it would be
> > impossible to use it because your SQL query would likely look something
> > like:
> > SELECT point_in_poly(a.geom_column, b.geom_column) from a join b;
> > and so the SQL plan would be to call point_in_poly() many times, not
> > points_in_poly().
> > I think there are two solutions to this problem. The possibly simpler
> > but less generic solution would be to create some new functions in geos
> > and postgis that are designed to work on arrays/lists/sets of objects,
> > so that you could use SQL like this:
> > SELECT points_in_polys( (select geom_col from a), (select geom_col from
> > b) )
> > And be returned an array of results. This would provide a more optimized
> > way to do the more commonly used "batch" functions. Eventually someone
> > would want to be able to do every sort of function in the same sort of
> > batch method, and we'd end up with a lot of functions with plurals in
> > the names, and people would likely complain about how difficult it is to
> > work with the results of these functions (big ugly arrays) and how it
> > isn't very flexible.
> > I think the more generic solution that would enable you to do queries
> > like the first one, but still taking advantage of "batch optimizations",
> > is to somehow cache any data structures that geos creates to do any
> > operations on geometries. The kind of structures that need to be cached
> > are, first of all, the geos version of the geometry (so that it doesn't
> > have to be converted from the postgis one each time), and then, any
> > indexes or graph structures that are created for performing operations.
> > If this was implemented, then when point_in_ring() is called, the ring
> > would either have attached to it, or have a way to look up in a cache
> > held somewhere in the geos session, the y-indexed sides of the ring. The
> > very first call would create the index and store it there, and then
> > future calls could reuse it.
> > I'm not entirely certain whether these objects would get cached on the
> > postgis/postgresql side, or just within the GEOS "session", and I don't
> > know exactly how it would work, but I really feel that something like
> > this needs to be implemented in order to provide reasonable performance
> > for queries involving a large number of spatial operations or
> > comparisons on the same inputs. And, we are getting more and more
> > complaints about this specific performance problem.
> > I'd welcome further discussion on this possible solution... and I still
> > agree that you should definitely profile your query to see where it is
> > spending it's time. But I suspect that the #1 spot will be in the loop
> > over the sides in the point_in_ring function, and the only way to reduce
> > the looping is to look in a (cached) index instead of the entire list.
> > Chris
> > Robert W. Burgholzer wrote:
> >> I am looking at the geos code for the contains calls for a point in a
> >> multipolygon, which appears to boil down to the function:
> >> point_in_ring(POINTARRAY *pts, POINT2D *point)
> >> I am a real green C-coder, so if anyone could provide some basic
> >> feedback to my
> >> questions, that would be great. Sorry if they are stupid or
> >> ridiculously basic.
> >> If anyone could simply tell me whether my assumptions below are true
> >> or false,
> >> I would really appreciate it:
> >> 1. If I am looking at the correct function,
> >> 2. The time-consuming aspect is related to the sheer number of rings
> >> that a
> >> given complicated polygon may contain. This little function,
> >> point_in_ring,
> >> could literally be called thousands of times for each shape -
> >> multiplied by the
> >> number of possible points. (can be a really HUGE number)
> >> 3. The code uses a fairly compact algorithm that first determine the
> >> number of
> >> sides of the ring that cross the y-axis defined by the y-coord of the
> >> point,
> >> then adding up the number to the right and left of the point, an odd
> >> number
> >> means the point is inside, an even number means the point is outside.
> >> (I got
> >> this explanation from http://alienryderflex.com/polygon/ )
> >> 4. Given that point_in_ring may be called millions of times for a
> >> complex shape
> >> with numerous points to evaluate, any small increase in efficiency in
> >> memory
> >> addressing or other could cause a significant performance improvement.
> >> This is where my knowledge of C comes to a grinding halt. I have seen
> >> some
> >> information related to pointers, structures and such, so I was
> >> wondering what
> >> degree of confidence the function writers have regarding the use of
> >> such things
> >> in this particular function.
> >> Thanks,
> >> r.b.
> >> --
> >> Robert W. Burgholzer
> >> --
> >> Finding the occasional straw of truth awash in a great ocean of
> >> confusion and
> >> bamboozle requires intelligence, vigilance, dedication and courage.
> >> But if we
> >> don't practice these tough habits of thought, we cannot hope to solve
> >> the truly
> >> serious problems that face us -- and we risk becoming a nation of
> >> suckers, up
> >> for grabs by the next charlatan who comes along.
> >> -- Carl Sagan, "The Fine Art of Baloney Detection," Parade, February
> >> 1, 1987
> >> Online Workout Editor:
> >> http://www.swat-swimming.org/workoutlog/
> >> http://soulswimmer.dynalias.net/
> >> _______________________________________________
> >> postgis-devel mailing list
> >> postgis-devel at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
Robert W. Burgholzer
Finding the occasional straw of truth awash in a great ocean of confusion and
bamboozle requires intelligence, vigilance, dedication and courage. But if we
don't practice these tough habits of thought, we cannot hope to solve the truly
serious problems that face us -- and we risk becoming a nation of suckers, up
for grabs by the next charlatan who comes along.
-- Carl Sagan, "The Fine Art of Baloney Detection," Parade, February 1, 1987
Online Workout Editor:
More information about the postgis-devel