[postgis-users] Geoprocessing & BigData

Vincent Picavet (ml) vincent.ml at oslandia.com
Mon Jan 18 10:30:10 PST 2016


Hi Ravi,




On 18/01/2016 19:14, Ravi Pavuluri wrote:
> Hi All,
> 
> I am checking if there is a way to process quickly large datasets such
> as census blocks in PostGIS and also by leveraging big data platform. I
> have few questions in this regard.
> 
> 1) When I try intersect for sample census blocks with another polygon
> layer, PostGIS 2.2(on Postgres 9.4) takes ~60 minutes (after optimizing
> from http://postgis.net/2014/03/14/tip_intersection_faster/ ) while on 
> ESRI ArcMap takes ~10 minutes. PostGIS layers already have geospatial
> indices. Is there anyway to optimize this further?

Following the links on your page, here is a good answer from Paul (TL;DR:
ST_Intersection is slow, avoid it):
http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562
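
For readers who have not followed the links, the gist as I understand it is
to call ST_Intersection only for features that actually straddle a boundary,
and to reuse the geometry untouched when it is fully covered. Here is a
minimal sketch of that query, wrapped in Python so it can be plugged into a
script. The table names (census_blocks, zones) and the column names (id, geom)
are placeholders I made up, not your schema.

```python
def clip_query(blocks="census_blocks", zones="zones"):
    """Build SQL that clips only the blocks crossing a zone boundary.

    Table and column names are hypothetical; adapt them to the real schema.
    """
    return f"""
SELECT b.id AS block_id,
       z.id AS zone_id,
       CASE
           -- block entirely inside the zone: keep its geometry, no clipping
           WHEN ST_CoveredBy(b.geom, z.geom) THEN b.geom
           -- block crosses the zone boundary: this is the expensive call
           ELSE ST_Multi(ST_Intersection(b.geom, z.geom))
       END AS geom
FROM {blocks} AS b
JOIN {zones} AS z
  ON ST_Intersects(b.geom, z.geom)
"""


if __name__ == "__main__":
    print(clip_query())
```

The join on ST_Intersects is what lets the spatial index do its work; the CASE
expression then keeps ST_Intersection away from the (usually large) share of
blocks that sit entirely inside a zone.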

> 2) What is an equivalent of ESRI Union in PostGIS? I didn't see any out
> of the box functions and any tips here are appreciated.

If ESRI Union computes a union, maybe ST_Union? But I guess there are some
semantic differences here.

> 3) Is there anyway we can expedite these geoprocessing
> tasks(union/intersect etc) using big data platform (Ex: hadoop)? Most
> examples talk about analysis (contains etc)  but not about geoprocessing
> on geospatial data. Any input is appreciated.

Lots of people do geoprocessing with PostGIS too, including long-running
jobs on large volumes of data (worldwide OSM data processing, for instance).
"Big data" is a really subjective term. Are your geoprocessing needs
really parallelizable? What kind of volumes are we talking about? MB,
GB, TB? What kind of hardware do you have at hand?

One way to do some sort of map-reduce with PostGIS is to use a bunch of
servers with FDW connections between a main server and a set of workers:
map the data processing to the worker servers and reduce the results on the
main server. With a bit of Python as glue code this can be automated and
quite efficient, even though this kind of sharding is not available out of
the box (yet?).
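
To give an idea of what the glue code could look like, here is a rough
sketch, assuming the features have an integer id that can be cut into ranges.
Everything named here (split_ranges, map_reduce, run_on_worker, the worker
identifiers) is invented for illustration. In real use run_on_worker would
open a connection to the given server with a driver such as psycopg2 and run
the geoprocessing SQL for its id range; the FDW setup itself is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(min_id, max_id, n_chunks):
    """Split the inclusive id range [min_id, max_id] into contiguous chunks."""
    total = max_id - min_id + 1
    base, extra = divmod(total, n_chunks)
    ranges = []
    start = min_id
    for i in range(n_chunks):
        # the first `extra` chunks get one more id so sizes stay balanced
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def map_reduce(workers, id_ranges, run_on_worker, combine):
    """Run one job per id range on the workers, then combine the results.

    workers: list of worker identifiers (e.g. connection strings)
    run_on_worker: callable(worker, (lo, hi)) -> partial result
    combine: callable(list of partial results) -> final result
    """
    # "map" step: hand out the id ranges to the workers round-robin
    jobs = [(workers[i % len(workers)], r) for i, r in enumerate(id_ranges)]
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        partials = list(pool.map(lambda job: run_on_worker(*job), jobs))
    # "reduce" step: merge the partial results on the main side
    return combine(partials)


if __name__ == "__main__":
    # Stand-in for a real worker call: just count the ids in the range.
    def fake_run(worker, id_range):
        lo, hi = id_range
        return hi - lo + 1

    chunks = split_ranges(1, 10, 3)
    print(chunks)
    print(map_reduce(["worker1", "worker2"], chunks, fake_run, sum))
```

Threads are enough here because the heavy lifting happens in the database
servers, not in Python; the script only waits for the queries to come back.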

Vincent

> 
> Thanks,
> Ravi.