[postgis-devel] More work on Cascade Union

Thu Aug 21 05:19:13 PDT 2008

 >Hi Regina,

>I can see the way you're thinking, but I don't think it's quite right. 
>If you are trying to union 1 million polygons, whether you do that as 
>1,000 batches of 1,000 polygons (cascaded initial run) or as 1,000,000 
>polygons straight away (currently), then you are still required to 
>convert all 1,000,000 geometries to GEOS. Since the accumulation 
>functions store the complete array in memory before passing to GEOS 
>anyway (see unite_garray), I don't see that there would be much 
>difference in this.

>The interesting aspect of the problem, as you rightly point out, is to 
>reduce the number of conversions to the minimum number required.
Perhaps 
>it may make more sense to do more work in LWGEOM format first before 
>passing to GEOS? But it's difficult to know without a working 
>implementation. I'm sure it would be possible to come up with something

>completely within PostGIS as it stands, but I think it would make more 
>sense to add the functionality to GEOS and work on minimising the 
>conversions/adding caching to improve overall GEOS performance.

>ATB,

>Mark.

Mark,

Actually I was thinking do less work in PostGIS and more in GEOS.  From
what I'm looking at the accumulate passes to GEOS 1 million times since
the GEOSUnion is done in a for loop.

True regardless
of how you do it you need to convert 1,000,000 geometries to 1,000,000
GEOS geometries.  I'm just thinking the penalty of flipping back and
forth between GEOS is much higher than we think and becomes very
apparent when you are dealing with large numbers and not to mention each
pass is passing a larger geometry to GEOS.  So thinking is 

1 big call to GEOS is cheaper than 1,000,000 small calls to GEOS (and
those last calls are not that small actually) and also allows GEOS to do
more.  A lot of the unite_garray stuff I suspect - e.g. iterating thru
the collection and unioning is already done in GEOS and probably more
efficiently so to me the work that unite_garray is doing is already
redundant.  If we introduce Cascade Union into the mix that involves
adding geometries to rtree indexes breaking out etc. and we try to do a
lot of that in PostGIS so we can feed smaller sets to GEOS - it is even
more redundant.

I'm just suggesting a change in order or rather introduction of another
function or revision of the PostGIS2GEOS and making unite_garray look
more like polygonize_garray which forms an array of geoms and hands it
off to GEOSPolygonize except even there the code does a lot of what
unite_garray is doing. 

Perhaps there is no way out of this repetitiveness, but to me it looks
like the same pattern of work over and over again which seems to scream
for a PostGISArray2GEOS function but maybe the way C pointers are
handled this is not possible to consolidate into one function.

PostGISArray2GEOS --> which loops thru PG array and does all that silly
checking to make sure SRIDs match that we are doing in unite_garray and
looks to me more effiently done in polygonize_garray and constructs one
whopping array/collection to hand to GEOS and then do one call to
GeosUnion instead of 1,000,000 calls to GeosUnion we are doing in that
for loop.  

I suspect we can pretty much copy the code in polygonize_garray which
loops and forms an array of geos geoms and passes this off to 
GEOSPolygonize capi function.

Haven't tested but I suspect 1 big call is cheaper than 1,000,000 small
calls especially for as Martin mentioned something as small and stupid
as points.

g1 = PostGISArray2GEOS(PGarraypointergoeshere);
geos_result =  GEOSUnionGEOSArray(g1);
GEOSSetSRID(geos_result, SRID);
result = GEOS2POSTGIS(geos_result, is3d);
GEOSGeom_destroy(geos_result);

As shown above I have introduced a currently imaginary Capi function
called GEOSUnionGEOSArray  which I envision before Cascade union is in
place - would do something like what GEOSPolygonize is doing 

1) Create geometrycollection from passed in array
2) Then create an empty geometry - 
3) Then does a 
BinaryOp(g1, empty, overlayOp(OverlayOp::opUNION));

Then when we introduce CascadedUnion in GEOS - we replace this function
(signature remains the same) and - it will then do exactly what the JTS
UnaryOp operation is doing and we only need to touch this GEOS Capi
function.

What do you think about that idea?

Thanks,
Regina

-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.