[geos-devel] fastunion aggregate

David Blasby dblasby at refractions.net
Mon Oct 27 13:15:32 EST 2003


strk,

I assume your fast algorthim looks like this:

1. Define a type for "Array of Geometry" (already automatically defined 
by postgresql)

2. have a modified collect() that just builds a big array of geometries 
instead of building a single big geometry.   If you wanted to get really 
fancy, you could probably just store the geometry's subobjects (but 
thats probably extra effort for little gain).

4. Then do the actual union (foldl).  Be aggressive in memory deallocation.
     result = 1st geometry in array
     for each other geometry in array
        result = result union new geometry

Alternatively, we could define a pair-wise union (i.e. 1 union 2 = A, 3 
union 4 = B, 5 union 6 = C,7 union 8 = D... then A union B , C union 
D...)  This might work out better for certain types of data.

5. convert the resulting geometry to postgis.


This should be pretty fast since we're only converting postgis->GEOS 
once and only one GEOS->postgis.

Unfortunately, for large unites, we could have memory problems.
The above process will use memory equal to all the geometries.  During 
step #4, we should be holding about constant memory in the worst case.

In the worst case (uniting a set of disjoint polygons), this shouldnt 
use more memory than any other method.


dave
ps. Ideally it would be great to carry around GEOS geometries instead of 
postgis geometries, but I think we'll run into problem with events like 
aborted queries and the like.

pps.  The geometry-constructing routine used by collect is O(n^2), but I 
bet it would be much faster to write your own (O(n)).  Basically, you'll 
have to scan the Array of Geometry to find the number of sub-objects and 
their sizes (make sure you account for memory alignment).  Then you just 
create a big resulting geometry.

The current method looks like this:
	a. create a single-subobject geometry
	b. create a brand new 2-object geometry based on a + another sub-object
	c. create a brand new 3-object geometry based on b + another sub-object
	d. create a brand new 4-object geometry based on c + another sub-object

Each sub-object will be copied, on average, n/2 times.




More information about the geos-devel mailing list