[postgis-devel] Optimizing contains/within

Robert W. Burgholzer rburghol at vt.edu
Mon Jul 23 15:17:08 PDT 2007


Chris,
Thanks for the thoughtful tour through the interlocking pieces!

I will continue to stay tuned to this issue.

r.b.

Quoting Chris Hodgson <chodgson at refractions.net>:

> I don't have a good understanding of the level of optimization done to
> the geos code, although I feel there is probably room for improvement
> (likely at the cost of simplicity/understandability/synchronization with
> JTS). What I believe to be a bigger problem is the levels of abstraction
> between the SQL language, the postGIS internal structures, and the GEOS
> internal structures, and the fact that we aren't doing anything to
> "bridge the gaps" in usage.
>
> If one was going to code an optimal solution to the problem "for each of
> these very many points, see if they are within this one polygon", one
> would build an indexed representation of the polygon that is designed
> for quickly finding the sides that intersect any given y-axis. However,
> the "point_in_ring" function is designed to find if just one point is
> within in this one ring, and so it doesn't build this indexed polygon
> structure, or if it does, it throws it away afterward.
>
> Even if geos had a "batch optimized" function like "points_in_ring()"
> that was designed to use the more optimized strategy, it would be
> impossible to use it because your SQL query would likely look something
> like:
>
> SELECT point_in_poly(a.geom_column, b.geom_column) from a join b;
>
> and so the SQL plan would be to call point_in_poly() many times, not
> points_in_poly().
>
> I think there are two solutions to this problem. The possibly simpler
> but less generic solution would be to create some new functions in geos
> and postgis that are designed to work on arrays/lists/sets of objects,
> so that you could use SQL like this:
>
> SELECT points_in_polys( (select geom_col from a), (select geom_col from b) )
>
> And be returned an array of results. This would provide a more optimized
> way to do the more commonly used "batch" functions. Eventually someone
> would want to be able to do every sort of function in the same sort of
> batch method, and we'd end up with a lot of functions with plurals in
> the names, and people would likely complain about how difficult it is to
> work with the results of these functions (big ugly arrays) and how it
> isn't very flexible.
>
> I think the more generic solution that would enable you to do queries
> like the first one, but still taking advantage of "batch optimizations",
> is to somehow cache any data structures that geos creates to do any
> operations on geometries. The kind of structures that need to be cached
> are, first of all, the geos version of the geometry (so that it doesn't
> have to be converted from the postgis one each time), and then, any
> indexes or graph structures that are created for performing operations.
> If this was implemented, then when point_in_ring() is called, the ring
> would either have attached to it, or have a way to look up in a cache
> held somewhere in the geos session, the y-indexed sides of the ring. The
> very first call would create the index and store it there, and then
> future calls could reuse it.
>
> I'm not entirely certain whether these objects would get cached on the
> postgis/postgresql side, or just within the GEOS "session", and I don't
> know exactly how it would work, but I really feel that something like
> this needs to be implemented in order to provide reasonable performance
> for queries involving a large number of spatial operations or
> comparisons on the same inputs. And, we are getting more and more
> complaints about this specific performance problem.
>
> I'd welcome further discussion on this possible solution... and I still
> agree that you should definitely profile your query to see where it is
> spending it's time. But I suspect that the #1 spot will be in the loop
> over the sides in the point_in_ring function, and the only way to reduce
> the looping is to look in a (cached) index instead of the entire list.
>
> Chris
>

--
Robert W. Burgholzer
--
Finding the occasional straw of truth awash in a great ocean of confusion and
bamboozle requires intelligence, vigilance, dedication and courage.  But if we
don't practice these tough habits of thought, we cannot hope to solve the truly
serious problems that face us -- and we risk becoming a nation of suckers, up
for grabs by the next charlatan who comes along.
-- Carl Sagan, "The Fine Art of Baloney Detection," Parade, February 1, 1987

Online Workout Editor:
http://www.swat-swimming.org/workoutlog/
http://soulswimmer.dynalias.net/



More information about the postgis-devel mailing list