[postgis-devel] ST_Union() performance problem (with possible funding)
maplabs at light42.com
Wed Oct 16 19:16:18 PDT 2013
Yes, true... but on a thorough read you might notice that the
gdal_retile.py experiment was largely ineffective.
If you click the link at the top to the *next post*,
"Variable Buffers in PostGIS",
you will find the one that really worked well. In fact, we used that
2nd post's approach in production for months, to great effect.
The trick on one machine was to split the work by some constant, and
then make a psycopg2 connection for each "bucket."
This worked very well..
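Schematically, something like this (the table name, column names &
DSN are placeholders, not our real schema):

# split the union work into N buckets, one psycopg2 connection each
from multiprocessing import Pool

import psycopg2

N_BUCKETS = 8  # roughly one bucket per core

def union_bucket(bucket):
    # each worker unions only its share of the cells
    conn = psycopg2.connect("dbname=gisdb")  # placeholder DSN
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT ST_Union(geom)
                FROM cell_footprints
                WHERE cell_id %% %s = %s
                """,
                (N_BUCKETS, bucket),
            )
            return cur.fetchone()[0]  # hex WKB of the partial union
    finally:
        conn.close()

if __name__ == "__main__":
    with Pool(N_BUCKETS) as pool:
        partials = pool.map(union_bucket, range(N_BUCKETS))
    # one final ST_Union pass still has to merge the partials,
    # but over N_BUCKETS geometries instead of thousands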
Since then I have experimented only a tiny bit with Spark from the
Berkeley AMPLab for a distributed workload on a Hadoop file system,
but that world has no GEOS (yet)
--
Brian M Hamlin
OSGeo California Chapter
blog.light42.com
On Wed, 16 Oct 2013 17:28:27 -0700, Bborie Park wrote:
Your best bet is to consider splitting the workload among
several postgresql connections.
darkblueb had a blog post about this...
http://blog.light42.com/wordpress/?p=23
On Wed, Oct 16, 2013 at 5:21 PM, Brent Wood wrote:
Hi,
Any advice appreciated!!
I'm undertaking a spatial analysis using PostGIS (what else would I
use!!!). The first part works well.
I take a large number (potentially millions) of lines defined by start
& end points & buffer them to create polygons. (I'm working in lat/long
EPSG:4326 but transforming to a custom equal area projection for the
buffering operation).
I generate a grid of 5x5km cells (polygons) covering the region of interest.
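For concreteness, the first two steps look roughly like this (the
table names, the custom SRID 990001, the 1km buffer width & the grid
extent are all placeholders, not my real values):

import psycopg2

SETUP_SQL = """
-- buffer each start/end line in the custom equal-area projection
CREATE TABLE line_polys AS
SELECT id,
       ST_Buffer(
           ST_Transform(ST_MakeLine(start_pt, end_pt), 990001),
           1000.0) AS geom  -- width in projection units (metres)
FROM tracks;

-- 5x5km grid of cells over the region of interest
CREATE TABLE grid AS
SELECT row_number() OVER () AS cell_id,
       ST_MakeEnvelope(x, y, x + 5000, y + 5000, 990001) AS geom
FROM generate_series(0, 995000, 5000) AS x,
     generate_series(0, 995000, 5000) AS y;
"""

with psycopg2.connect("dbname=gisdb") as conn:  # placeholder DSN
    conn.cursor().execute(SETUP_SQL)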
I clip the line-based polygons to the grid, so I can generate
statistics for each cell describing the lines that intersect with it:
various quantitative measures, such as ST_Union() of the clipped line
polygons to generate a footprint in each cell (to work out how much
is/is not covered), or the sum of ST_Area() of the clipped polygons,
grouped by cell, to calculate an aggregate cover, which can be several
times the actual cell area.
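In SQL terms, the per-cell statistics are roughly (same placeholder
table names as above):

import psycopg2

PER_CELL_SQL = """
SELECT g.cell_id,
       -- footprint: union of the clipped polygons, overlaps collapsed
       ST_Area(ST_Union(ST_Intersection(p.geom, g.geom))) AS covered_area,
       -- aggregate cover: overlaps counted every time, so this can be
       -- several times the 25 km^2 cell area
       SUM(ST_Area(ST_Intersection(p.geom, g.geom))) AS aggregate_area
FROM grid g
JOIN line_polys p ON ST_Intersects(p.geom, g.geom)
GROUP BY g.cell_id, g.geom;
"""

with psycopg2.connect("dbname=gisdb") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(PER_CELL_SQL)
        per_cell_stats = cur.fetchall()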
So far so good: it works well, the code is clear & transparent, & it
provides a good result - at least as good as any commercial software
can do. My test data subset is processed from scratch in about 30
minutes.
Now I want to ST_Union() all the cell based polygons into an overall
single multipolygon representing the footprint. The code is simple. The
performance, even with my subset, is a problem.
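The whole query is essentially just this (cell_footprints being a
placeholder table holding each cell's unioned footprint):

import psycopg2

# one ST_Union() aggregate over every per-cell footprint - simple to
# write, but this is the step that runs for weeks
OVERALL_SQL = """
SELECT ST_Union(geom) AS total_footprint
FROM cell_footprints;
"""

with psycopg2.connect("dbname=gisdb") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(OVERALL_SQL)
        total_footprint = cur.fetchone()[0]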
I have thousands of cell-based footprint multipolygons, each
potentially with thousands of vertices, to be ST_Union()ed. Runtime is
weeks for an iteration. If I need separate total footprints for 20
different species annually for 5 years, that is 100 iterations. Memory
& I/O use is minimal - it is totally CPU-bound.
I am looking at simplifying the polygons before they are unioned, to
reduce the number of vertices (& hence the processing) involved, but
to achieve any significant benefit I'm having to change the shape of
the polygons to ST_Union() too much.
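The simplification experiment, for reference (the 50m tolerance is a
placeholder; ST_SimplifyPreserveTopology() at least keeps the polygons
valid, but any tolerance big enough to cut the runtime also visibly
distorts the shapes):

import psycopg2

SIMPLIFIED_SQL = """
SELECT ST_Union(ST_SimplifyPreserveTopology(geom, 50.0))
FROM cell_footprints;  -- tolerance in projection units (metres)
"""

with psycopg2.connect("dbname=gisdb") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(SIMPLIFIED_SQL)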
Does anyone have any suggestions as to how this could be made
significantly faster?
If I had $$ to throw at developers to work on the codebase (presumably
GEOS?) could performance be significantly improved?
Thanks,
Brent Wood
_______________________________________________
postgis-devel mailing list
postgis-devel at lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel