[postgis-devel] More work on Cascade Union

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Fri Aug 22 02:12:53 PDT 2008


Obe, Regina wrote:

> Mark,
> 
> Actually I was thinking do less work in PostGIS and more in GEOS.  From
> what I'm looking at the accumulate passes to GEOS 1 million times since
> the GEOSUnion is done in a for loop.
> 
> True regardless
> of how you do it you need to convert 1,000,000 geometries to 1,000,000
> GEOS geometries.  I'm just thinking the penalty of flipping back and
> forth between GEOS is much higher than we think and becomes very
> apparent when you are dealing with large numbers and not to mention each
> pass is passing a larger geometry to GEOS.  So thinking is 
> 
> 1 big call to GEOS is cheaper than 1,000,000 small calls to GEOS (and
> those last calls are not that small actually) and also allows GEOS to do
> more.  A lot of the unite_garray stuff I suspect - e.g. iterating thru
> the collection and unioning is already done in GEOS and probably more
> efficiently so to me the work that unite_garray is doing is already
> redundant.  If we introduce Cascade Union into the mix that involves
> adding geometries to rtree indexes breaking out etc. and we try to do a
> lot of that in PostGIS so we can feed smaller sets to GEOS - it is even
> more redundant.
> 
> I'm just suggesting a change in order or rather introduction of another
> function or revision of the PostGIS2GEOS and making unite_garray look
> more like polygonize_garray which forms an array of geoms and hands it
> off to GEOSPolygonize except even there the code does a lot of what
> unite_garray is doing. 
> 
> Perhaps there is no way out of this repetitiveness, but to me it looks
> like the same pattern of work over and over again which seems to scream
> for a PostGISArray2GEOS function but maybe the way C pointers are
> handled this is not possible to consolidate into one function.
> 
> PostGISArray2GEOS --> which loops thru PG array and does all that silly
> checking to make sure SRIDs match that we are doing in unite_garray and
> looks to me more effiently done in polygonize_garray and constructs one
> whopping array/collection to hand to GEOS and then do one call to
> GeosUnion instead of 1,000,000 calls to GeosUnion we are doing in that
> for loop.  
> 
> I suspect we can pretty much copy the code in polygonize_garray which
> loops and forms an array of geos geoms and passes this off to 
> GEOSPolygonize capi function.
> 
> Haven't tested but I suspect 1 big call is cheaper than 1,000,000 small
> calls especially for as Martin mentioned something as small and stupid
> as points.
> 
> g1 = PostGISArray2GEOS(PGarraypointergoeshere);
> geos_result =  GEOSUnionGEOSArray(g1);
> GEOSSetSRID(geos_result, SRID);
> result = GEOS2POSTGIS(geos_result, is3d);
> GEOSGeom_destroy(geos_result);
> 
> As shown above I have introduced a currently imaginary Capi function
> called GEOSUnionGEOSArray  which I envision before Cascade union is in
> place - would do something like what GEOSPolygonize is doing 
> 
> 1) Create geometrycollection from passed in array
> 2) Then create an empty geometry - 
> 3) Then does a 
> BinaryOp(g1, empty, overlayOp(OverlayOp::opUNION));
> 
> Then when we introduce CascadedUnion in GEOS - we replace this function
> (signature remains the same) and - it will then do exactly what the JTS
> UnaryOp operation is doing and we only need to touch this GEOS Capi
> function.
> 
> What do you think about that idea?

Interesting. The hardest thing about these types of problem is that you 
can spend days analysing them, and then finally when you implement 
something, it behaves in a totally unexpected way.

I have absolutely no evidence for this other than intuition, however I 
suspect that caching will do a good job of minimising the effect of 
repeated calls, and there would be very little difference between the 
two approaches. I suspect that there may be some savings by trying to 
avoid constantly appending geometries to PostgreSQL arrays, but other 
than that I think we are limited in what we can do without caching 
conversion results.

It does sounds as if there could be some scope for combining 
unite_garray/polygonize_garray which could be interesting to do in terms 
of tidying up the code. But of course the total disclaimer is that I 
haven't analysed this in any detail, so feel free to experiment and 
prove me wrong :)


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063



More information about the postgis-devel mailing list