[postgis-users] Parallelisation provides powerful postgis performance perks (script + ppt slides)

Mark Wynter mark at dimensionaledge.com
Thu Jul 23 15:54:47 PDT 2015


A couple of tutorials on the subject, with full code on GitHub:

http://dimensionaledge.com/intro-vector-tiling-map-reduce-postgis/

http://dimensionaledge.com/from-days-to-minutes-geoprocessing-of-alberta-land-use-data/

> When I briefly look at the text you have written in the "Quick Example" It
> seems that you are distributing your query by an ID field. I am wondering
> how your method would apply to raster datasets? Distributing geographic
> data by an ID can get you into problems because of the dependency for
> certain analytical functions.

GNU Parallel is great for processing complex pipelines. Mix and match with PostGIS vectors and rasters, GRASS, R, GDAL, etc.

ID is the simplest way... but your job list can have multiple arguments, which you can feed into a PL/pgSQL function called from the worker function.

You can build as much sophistication as you like into the PL/pgSQL function.
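
As a rough illustration of a job list with multiple arguments, here is a minimal Python sketch that splits an extent into square tiles and writes one job per line as "xmin ymin xmax ymax". The extent, the tile size and the function names are assumptions made up for the example; they are not taken from the tutorials above.

```python
import math
from typing import Iterator, List, Tuple

Tile = Tuple[float, float, float, float]


def tile_jobs(xmin: float, ymin: float, xmax: float, ymax: float,
              size: float) -> Iterator[Tile]:
    """Yield one (xmin, ymin, xmax, ymax) tile per job, row by row."""
    if size <= 0:
        raise ValueError("size must be positive")
    nx = math.ceil((xmax - xmin) / size)  # tiles per row
    ny = math.ceil((ymax - ymin) / size)  # number of rows
    for j in range(ny):
        for i in range(nx):
            x0 = xmin + i * size
            y0 = ymin + j * size
            # Clip the last tile in each direction to the extent.
            yield (x0, y0, min(x0 + size, xmax), min(y0 + size, ymax))


def job_lines(jobs: Iterator[Tile]) -> List[str]:
    """Format each job as one space-separated line of arguments."""
    return [" ".join(str(v) for v in job) for job in jobs]


if __name__ == "__main__":
    # Hypothetical extent and tile size, for illustration only.
    for line in job_lines(tile_jobs(0, 0, 10, 5, 5)):
        print(line)
```

Each line can then be split into separate arguments by GNU Parallel (its --colsep option does this) and passed on to whatever PL/pgSQL function the worker calls.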

Some things to bear in mind: get your queries working efficiently before scaling out, otherwise you are scaling out bad practice.
Batch processing is faster than processing records individually.
And dump your multipolygons into individual polygons (ST_Dump) if you are doing intersection analysis.
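
The batching point can be sketched in a few lines of Python: instead of one job per ID, group the IDs so that each worker call handles a whole batch. The batch size and the comma-separated job format are assumptions for illustration only.

```python
from typing import List, Sequence


def batches(ids: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split ids into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(ids[i:i + batch_size])
            for i in range(0, len(ids), batch_size)]


def batch_job_lines(ids: Sequence[int], batch_size: int) -> List[str]:
    """One job line per batch, with the batch's IDs comma-separated."""
    return [",".join(str(i) for i in batch)
            for batch in batches(ids, batch_size)]


if __name__ == "__main__":
    # Seven hypothetical feature IDs, three per batch.
    for line in batch_job_lines(list(range(1, 8)), 3):
        print(line)
```

With seven IDs and a batch size of three, this gives three jobs rather than seven, so the per-job overhead (connection, query planning) is paid less often.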

Another parallelisation tool is R via PL/R, which I'm using for routing analysis. It is more specialized and not always as versatile as GNU Parallel.

HTH.
Mark


