[postgis-devel] ST_Union() performance problem (with possible funding)

Brent Wood pcreso at pcreso.com
Wed Oct 16 17:21:06 PDT 2013


Hi, 


Any advice appreciated!!

I'm undertaking a spatial analysis using Postgis (what else would I use!!!). The first part works well.


I take a large number (potentially millions) of lines defined by start & end points & buffer them to create polygons. (I'm working in lat/long EPSG:4326 but transforming to a custom equal area projection for the buffering operation).

I generate a grid of 5x5km cells (polygons) covering the region of interest.

I clip the line based polygons to the grid, so I can generate statistics for each cell describing the lines that intersect with it, various quantitative measures such as ST_Union() the clipped line polygons to generate a footprint in each cell to work out how much is/is not covered, or sum the ST_Area() of the clipped polygons grouped by cell to calculate an aggregate cover, which can be several times the actual cell area.



So far so good, it works well, the code is clear & transparent & provides a good result. At least as good as any commercial software can do. My test data subset is processed from scratch in about 30 minutes.

Now I want to ST_Union() all the cell based polygons into an overall single multipolygon representing the footprint. The code is simple. The performance, even with my subset,  is a problem.

I have thousands of cell based footprint multipolygons, each potentially with thousands of vertices to be ST_Union()ed. Runtime is weeks for an iteration. If I need separate total footprints for 20 different species annually for 5 years, that is 100 iterations. Memory & I/O use is minimal - it is totally cpu bound.

I am looking at trying to simplify the polygons to be unioned to reduce the number of vertices (& hence processing) involved, but to achieve any significant benefit I'm having to change the shape of the polygons to ST_Union() too much. 



Does anyone have any suggestions as to how this could be made significantly faster? 
If I had $$ to throw at developers to work on the codebase (presumably GEOS?) could performance be significantly improved?


Thanks,

   Brent Wood
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20131016/02a509ea/attachment.html>


More information about the postgis-devel mailing list