[postgis-devel] More work on Cascade Union

Obe, Regina robe.dnd at cityofboston.gov
Tue Aug 19 16:45:08 PDT 2008


Martin,

Do you know if OpenJump finally fully integrated the cascade union?  I downloaded the openjump nightly build.  I did timings and it was able to complete usstatebounds in about 12 secs (whereas before remember it gave a memory overflow) and the original samply_poly I think it did in about 22 seconds.  I wasn't sure if this is because I am running on a Vista pc (and probably more beefed up one than I ran last time).  I also don't see the count down with the union thing anymore.  


Attached are my latest efforts. This includes the envelop check.  The bool_and aggregate was disappointing so I fell on good old EXISTS (in this scenario especially when you can't really rely on indexes, exists performs better than a LEFT JOIN).  

 Unfortunately in PostgreSQL since the geometries get swept into an array, I can't really count on indexes to help out so I limited the max check to be 1000 records and also do iterated checks.  Which I think is fairly different from the JTS approach.


 This time I changed the core of the cascade union function into a plpgsql function since I figured people would have an easier time reading it.  Not that I couldn't do it all with sql functions, but sql functions had the disadvantage of not allowing declaration of variables or using pretty names for input variables.

I have the st99_d00 set down to below 8 secs - but still sucks compared to JTS cascade union (3-4 secs), but still much better than the 48-55 second timings I get with the postgis union function.

I fear I can't do much better than this without delving into GEOS land, which I am not quite ready for.

One thing I did discover is that the dump collect method is a very important thing to have and would make the whole ST_Collect thing much more useful for people if it actually existed since I think that is what people actually expect from that function.  I think part of the frustration that people have is that st_collect returns geometry collections more often than not.  I created a helper function that does it since I had to use that technique so many times in the code, but I suspect this would be done much more efficiently in the lwpostgis code since it would then require only one trip instead of many.  I haven't calculated to see how much of the time is wasted in there.  I know that most of the lost time is still in the ST_Unite_Garray function and the non-intersect check did not help quite as much as I had hoped it would.




--7855 ms, 7874 ms
SELECT ST_CascadeUnion(the_geom) FROM st99_d00;

This syntax which escapes the aggregate penalty gives me these timings
--7195 ms, 6804 ms, 6996 ms
SELECT st_unitecascade_garray_sort(ARRAY(SELECT the_geom FROM st99_d00));


--114282 ms -- SELECT 114282/1000.0/60 -- 1.90 minutes (I didn't bother running this with ST_Union since I would be waiting till the cows came home)
SELECT st_unitecascade_garray_sort(ARRAY(SELECT the_geom FROM usstatebounds));

--152438 ms -- SELECT 152438/1000.0/60 -- 2.55 minutes
SELECT ST_CascadeUnion(the_geom) FROM usstatebounds;

My cursory glance of the stats looks like I am getting the right results although the internal sorting may not be optimal.

Any feedback would be appreciated.















-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080819/73131de7/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cascadeunion080818_12_functions_plpgsql.sql
Type: application/octet-stream
Size: 5244 bytes
Desc: cascadeunion080818_12_functions_plpgsql.sql
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080819/73131de7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cascadeunion080818_12_results.sql
Type: application/octet-stream
Size: 4360 bytes
Desc: cascadeunion080818_12_results.sql
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080819/73131de7/attachment-0001.obj>


More information about the postgis-devel mailing list