[postgis-devel] Optimizing contains/within

Mark Leslie mark at refractions.net
Mon Jul 23 15:44:26 PDT 2007


Any number of rings in a polygon is supported, just not multiple
polygons in a geometry.
Mark

Robert W. Burgholzer wrote:
> Very interesting. Could go about testing the performance increase by breaking up
> my multis into individual geometries, or would that break on multiple rings?
> 
> r.b.
> 
> 
> Quoting Mark Leslie <mark at refractions.net>:
> 
>> The current implementation is of the second form.  An RTREE index
>>
> (http://lin-ear-th-inking.blogspot.com/2007/06/packed-1-dimensional-r-tree.html)
>> is used to quickly determine candidate ring segments and is cached in
>> the PostgreSQL function call context.  Repeated requests against the
>> same polygon will reuse the cached index.  This is not yet implemented
>> for MultiPolygons or MultiPoints.
>> Mark
>>
>> Chris Hodgson wrote:
>>> I don't have a good understanding of the level of optimization done to
>>> the geos code, although I feel there is probably room for improvement
>>> (likely at the cost of simplicity/understandability/synchronization with
>>> JTS). What I believe to be a bigger problem is the levels of abstraction
>>> between the SQL language, the postGIS internal structures, and the GEOS
>>> internal structures, and the fact that we aren't doing anything to
>>> "bridge the gaps" in usage.
>>>
>>> If one was going to code an optimal solution to the problem "for each of
>>> these very many points, see if they are within this one polygon", one
>>> would build an indexed representation of the polygon that is designed
>>> for quickly finding the sides that intersect any given y-axis. However,
>>> the "point_in_ring" function is designed to find if just one point is
>>> within in this one ring, and so it doesn't build this indexed polygon
>>> structure, or if it does, it throws it away afterward.
>>>
>>> Even if geos had a "batch optimized" function like "points_in_ring()"
>>> that was designed to use the more optimized strategy, it would be
>>> impossible to use it because your SQL query would likely look something
>>> like:
>>>
>>> SELECT point_in_poly(a.geom_column, b.geom_column) from a join b;
>>>
>>> and so the SQL plan would be to call point_in_poly() many times, not
>>> points_in_poly().
>>>
>>> I think there are two solutions to this problem. The possibly simpler
>>> but less generic solution would be to create some new functions in geos
>>> and postgis that are designed to work on arrays/lists/sets of objects,
>>> so that you could use SQL like this:
>>>
>>> SELECT points_in_polys( (select geom_col from a), (select geom_col from
>>> b) )
>>>
>>> And be returned an array of results. This would provide a more optimized
>>> way to do the more commonly used "batch" functions. Eventually someone
>>> would want to be able to do every sort of function in the same sort of
>>> batch method, and we'd end up with a lot of functions with plurals in
>>> the names, and people would likely complain about how difficult it is to
>>> work with the results of these functions (big ugly arrays) and how it
>>> isn't very flexible.
>>>
>>> I think the more generic solution that would enable you to do queries
>>> like the first one, but still taking advantage of "batch optimizations",
>>> is to somehow cache any data structures that geos creates to do any
>>> operations on geometries. The kind of structures that need to be cached
>>> are, first of all, the geos version of the geometry (so that it doesn't
>>> have to be converted from the postgis one each time), and then, any
>>> indexes or graph structures that are created for performing operations.
>>> If this was implemented, then when point_in_ring() is called, the ring
>>> would either have attached to it, or have a way to look up in a cache
>>> held somewhere in the geos session, the y-indexed sides of the ring. The
>>> very first call would create the index and store it there, and then
>>> future calls could reuse it.
>>>
>>> I'm not entirely certain whether these objects would get cached on the
>>> postgis/postgresql side, or just within the GEOS "session", and I don't
>>> know exactly how it would work, but I really feel that something like
>>> this needs to be implemented in order to provide reasonable performance
>>> for queries involving a large number of spatial operations or
>>> comparisons on the same inputs. And, we are getting more and more
>>> complaints about this specific performance problem.
>>>
>>> I'd welcome further discussion on this possible solution... and I still
>>> agree that you should definitely profile your query to see where it is
>>> spending it's time. But I suspect that the #1 spot will be in the loop
>>> over the sides in the point_in_ring function, and the only way to reduce
>>> the looping is to look in a (cached) index instead of the entire list.
>>>
>>> Chris
>>>
>>> Robert W. Burgholzer wrote:
>>>> I am looking at the geos code for the contains calls for a point in a
>>>> multipolygon, which appears to boil down to the function:
>>>>
>>>> point_in_ring(POINTARRAY *pts, POINT2D *point)
>>>>
>>>> I am a real green C-coder, so if anyone could provide some basic
>>>> feedback to my
>>>> questions, that would be great. Sorry if they are stupid or
>>>> ridiculously basic.
>>>>  If anyone could simply tell me whether my assumptions below are true
>>>> or false,
>>>> I would really appreciate it:
>>>>
>>>> 1. If I am looking at the correct function,
>>>> 2. The time-consuming aspect is related to the sheer number of rings
>>>> that a
>>>> given complicated polygon may contain.  This little function,
>>>> point_in_ring,
>>>> could literally be called thousands of times for each shape -
>>>> multiplied by the
>>>> number of possible points. (can be a really HUGE number)
>>>> 3. The code uses a fairly compact algorithm that first determine the
>>>> number of
>>>> sides of the ring that cross the y-axis defined by the y-coord of the
>>>> point,
>>>> then adding up the number to the right and left of the point, an odd
>>>> number
>>>> means the point is inside, an even number means the point is outside.
>>>> (I got
>>>> this explanation from http://alienryderflex.com/polygon/ )
>>>> 4. Given that point_in_ring may be called millions of times for a
>>>> complex shape
>>>> with numerous points to evaluate, any small increase in efficiency in
>>>> memory
>>>> addressing or other could cause a significant performance improvement.
>>>>
>>>>
>>>> This is where my knowledge of C comes to a grinding halt. I have seen
>>>> some
>>>> information related to pointers, structures and such, so I was
>>>> wondering what
>>>> degree of confidence the function writers have regarding the use of
>>>> such things
>>>> in this particular function.
>>>>
>>>> Thanks,
>>>> r.b.
>>>>
>>>>
>>>> --
>>>> Robert W. Burgholzer
>>>> --
>>>> Finding the occasional straw of truth awash in a great ocean of
>>>> confusion and
>>>> bamboozle requires intelligence, vigilance, dedication and courage.
>>>> But if we
>>>> don't practice these tough habits of thought, we cannot hope to solve
>>>> the truly
>>>> serious problems that face us -- and we risk becoming a nation of
>>>> suckers, up
>>>> for grabs by the next charlatan who comes along.
>>>> -- Carl Sagan, "The Fine Art of Baloney Detection," Parade, February
>>>> 1, 1987
>>>>
>>>> Online Workout Editor:
>>>> http://www.swat-swimming.org/workoutlog/
>>>> http://soulswimmer.dynalias.net/
>>>> _______________________________________________
>>>> postgis-devel mailing list
>>>> postgis-devel at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
> 
> 
> --
> Robert W. Burgholzer
> --
> Finding the occasional straw of truth awash in a great ocean of confusion and
> bamboozle requires intelligence, vigilance, dedication and courage.  But if we
> don't practice these tough habits of thought, we cannot hope to solve the truly
> serious problems that face us -- and we risk becoming a nation of suckers, up
> for grabs by the next charlatan who comes along.
> -- Carl Sagan, "The Fine Art of Baloney Detection," Parade, February 1, 1987
> 
> Online Workout Editor:
> http://www.swat-swimming.org/workoutlog/
> http://soulswimmer.dynalias.net/
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel




More information about the postgis-devel mailing list