[postgis-users] [postgis-devel] ST_Union() performance problem (with possible funding)

Brent Wood pcreso at pcreso.com
Wed Oct 16 20:46:43 PDT 2013

These approaches tend to assume each operation is constrained to a single cell, and is therefore parallelizable by undertaking operations on multiple cells concurrently.

While I'm dealing with many cells, they are all being merged into a single multipolygon feature. I can't load two cells into the same multipolygon at the same time - that would be two writes to the same record - so the merge can't run concurrently.

What may be possible is to run multiple processes on subsets to create intermediate (larger) merged polygons, which can then be merged themselves to create the final single feature.

This would probably allow better use of resources on a multi-core system... at present I'm using 1.7% of memory on a 100% cpu process, so I'll look into this approach - 8 cores running concurrently, giving close to 8x faster, is useful.
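A minimal sketch of that pairwise-merge idea, with Python set union standing in for ST_Union() (the merge-tree shape is the point here, not the geometry; function names are illustrative only):

```python
def union_pair(a, b):
    # Stand-in for ST_Union(): set union plays the role of the polygon merge.
    return a | b

def tree_union(items):
    """Merge a list pairwise, level by level.

    Each level's union_pair calls are independent of one another, so they
    could be farmed out to separate worker processes or database
    connections; shown sequentially here for clarity.
    """
    items = list(items)
    while len(items) > 1:
        merged = [union_pair(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            merged.append(items[-1])  # carry the odd item up a level
        items = merged
    return items[0]
```

With N inputs this does the same total number of unions, but only log2(N) sequential levels, which is where the multi-core win would come from.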


 From: "maplabs at light42.com" <maplabs at light42.com>
To: Brent Wood <pcreso at pcreso.com>; PostGIS Development Discussion <postgis-devel at lists.osgeo.org>; Bborie Park <dustymugs at gmail.com> 
Cc: PostGIS Users Discussion <postgis-users at lists.osgeo.org> 
Sent: Thursday, October 17, 2013 3:16 PM
Subject: Re: [postgis-devel] ST_Union() performance problem (with possible funding)

True.. but with a thorough read you might notice that the gdal_retile.py experiment was largely ineffective. If you click on the link at the top to the *next post*, "Variable Buffers in", you will find the one that really worked well.. in fact, we used that 2nd post in production for months, to great effect. The trick on one machine was to split the work by some constant, and then make psycopg2 connections for each "bucket."

This worked very well..
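A sketch of that bucketing trick, splitting the work by a constant via a modulo on an id column. Table, column, and function names here are assumptions for illustration, not the actual schema from the post; each bucket's query would run on its own psycopg2 connection in its own worker:

```python
def bucket_queries(table, id_col, n_buckets):
    """Split a union job into n_buckets independent queries.

    Rows are partitioned by id modulo n_buckets, so every row lands in
    exactly one bucket and the buckets can run concurrently.
    """
    return [
        f"SELECT ST_Union(geom) FROM {table} WHERE {id_col} % {n_buckets} = {b}"
        for b in range(n_buckets)
    ]

def union_bucket(dsn, sql):
    # Each worker opens its own connection, so buckets proceed in parallel.
    import psycopg2  # local import: only needed inside worker processes
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchone()[0]
```

The per-bucket results would then need one final ST_Union() pass to produce the single overall footprint.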

Since then I have experimented only a tiny bit with Spark from the Berkeley AMPLab for a distributed workload on a Hadoop file system, but that world has no GEOS (yet).

Brian M
OSGeo California

On Wed, 16 Oct 2013 17:28:27 -0700, Bborie Park <dustymugs at gmail.com> wrote:

Your best bet is to consider splitting the workload among several postgresql connections.
>darkblueb had a blog post about
>On Wed, Oct 16, 2013 at 5:21 PM, Brent Wood <pcreso at pcreso.com> wrote:
>>Any advice appreciated!!
>>I'm undertaking a spatial analysis using Postgis (what else would I use!!!). The first part works well.
>>I take a large number (potentially millions) of lines defined by start & end points & buffer them to create polygons. (I'm working in lat/long EPSG:4326 but transforming to a custom equal area projection for the buffering operation.)
>>I generate a grid of 5x5km cells (polygons) covering the region of interest.
>>I clip the line-based polygons to the grid, so I can generate statistics for each cell describing the lines that intersect with it: various quantitative measures, such as ST_Union() of the clipped line polygons to generate a footprint in each cell (to work out how much is/is not covered), or summing the ST_Area() of the clipped polygons grouped by cell to calculate an aggregate cover, which can be several times the actual cell area.
>>So far so good: it works well, the code is clear & transparent & provides a good result, at least as good as any commercial software can do. My test data subset is processed from scratch in about 30 minutes.
>>Now I want to ST_Union() all the cell-based polygons into an overall single multipolygon representing the footprint. The code is simple. The performance, even with my subset, is a problem.
>>I have thousands of cell-based footprint multipolygons, each potentially with thousands of vertices, to be ST_Union()ed. Runtime is weeks for an iteration. If I need separate total footprints for 20 different species annually for 5 years, that is 100 iterations. Memory & I/O use is minimal - it is totally cpu bound.
>>I am looking at trying to simplify the polygons to be unioned, to reduce the number of vertices (& hence processing) involved, but to achieve any significant benefit I'm having to change the shape of the polygons to be ST_Union()ed too much.
>>Does anyone have any suggestions as to how this could be made significantly faster?
>>If I had $$ to throw at developers to work on the codebase (presumably GEOS?), could performance be significantly improved?
>>   Brent
>>postgis-devel mailing list
>>postgis-devel at lists.osgeo.org
