[postgis-users] Geoprocessing & BigData
Ravi Pavuluri
ravitheja at ymail.com
Mon Jan 18 14:46:05 PST 2016
Vincent and Remi,
Thank you both for your input. I combined two topics in one thread. Parallelization is a secondary need, and I will look into the "Postgresql-Xc, Greenplum or custom code" approaches.
Regarding PostGIS performance on intersecting geometries, I am not able to see any improvement. I am looking at intersection because of my use case (e.g. what % of census blocks fall in Zone A, Zone B, Zone C, etc. of the Flood Zones layer). If intersection is to be avoided, can this be achieved another way?
@Vincent : For ArcGIS Union, please see here.
http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/analysis_tools/union_analysis_.htm
Any inputs are appreciated.
Thanks again,
Ravi.
--------------------------------------------
On Mon, 1/18/16, Rémi Cura <remi.cura at gmail.com> wrote:
Subject: Re: [postgis-users] Geoprocessing & BigData
To: vincent.ml at oslandia.com, "PostGIS Users Discussion" <postgis-users at lists.osgeo.org>
Cc: "Ravi Pavuluri" <ravitheja at ymail.com>
Date: Monday, January 18, 2016, 2:51 PM
Hey,
if you have one beefy server, you can parallelize by throwing several queries at it, each working on a subset of your data (aka parallel processing through data partitioning).
One conceptual example: you want to process the world, so you create 20 workers and a list of countries, and then make the workers process the list country by country.
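
A minimal Python sketch of that worker idea, under stated assumptions: `process_partitions` and `process_one` are hypothetical names, not anything from this thread or from PostGIS. In a real run, `process_one` would open its own database connection and execute a query restricted to one country; here it is just a callable so the sketch stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partitions(partition_keys, process_one, max_workers=20):
    """Run process_one(key) for every key using a fixed pool of workers.

    partition_keys: e.g. a list of country codes (one data partition each).
    process_one:    hypothetical callable doing the per-partition work,
                    typically one SQL query filtered on that key.
    Returns a dict mapping each key to the result of process_one(key).
    """
    keys = list(partition_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = list(pool.map(process_one, keys))
    return dict(zip(keys, results))


if __name__ == "__main__":
    # Stand-in for a real per-country query.
    print(process_partitions(["FR", "DE", "IT"], lambda code: code.lower()))
```

Threads are enough here because the heavy work would happen inside the database server; each worker mostly waits on its own query.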
If you think one Postgres server will not be sufficient, you could of course shard your data across several servers, with options ranging from writing it from scratch (you rewrite everything), to using existing open source code, to dedicated solutions like Postgresql-Xc, Greenplum, ...
However, sorry to say this, but in your case it looks like your first improvement will not come from massive parallelization but from better understanding the world of geospatial data and PostGIS.
Cheers,
Rémi-C
2016-01-18 19:30 GMT+01:00 Vincent Picavet (ml) <vincent.ml at oslandia.com>:
Hi Ravi,
On 18/01/2016 19:14, Ravi Pavuluri wrote:
> Hi All,
>
> I am checking whether there is a way to quickly process large datasets such as census blocks in PostGIS, and also by leveraging a big data platform. I have a few questions in this regard.
>
> 1) When I try an intersect for sample census blocks with another polygon layer, PostGIS 2.2 (on Postgres 9.4) takes ~60 minutes (after optimizing from http://postgis.net/2014/03/14/tip_intersection_faster/ ) while ESRI ArcMap takes ~10 minutes. The PostGIS layers already have geospatial indices. Is there any way to optimize this further?
Following the links on your page, here is a good answer from Paul (TL;DR: st_intersection is slow, avoid it):
http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562
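
One common way to act on that advice is to call ST_Intersection only for geometries that actually straddle a boundary, and to short-circuit the fully contained ones. The sketch below builds such a query as a Python string; it is an illustration under assumptions, not the query from the linked answer. The table names (`census_blocks`, `flood_zones`) and column names (`block_id`, `zone`, `geom`) are hypothetical, and the identifiers are interpolated directly, so they must come from trusted code.

```python
def overlap_fraction_sql(blocks="census_blocks", zones="flood_zones"):
    """Build a query giving the share of each block's area inside each zone.

    Table and column names are hypothetical placeholders.
    """
    return f"""
SELECT b.block_id,
       z.zone,
       CASE
           -- Block entirely inside the zone: no new geometry is built.
           WHEN ST_CoveredBy(b.geom, z.geom) THEN 1.0
           -- Boundary blocks only: clip, then compare areas.
           ELSE ST_Area(ST_Intersection(b.geom, z.geom))
                / NULLIF(ST_Area(b.geom), 0)
       END AS fraction_in_zone
FROM {blocks} AS b
JOIN {zones} AS z
  ON ST_Intersects(b.geom, z.geom)  -- predicate that can use the spatial index
"""


if __name__ == "__main__":
    print(overlap_fraction_sql())
```

Whether this helps depends on the data: the gain comes from the share of blocks that lie entirely inside a single zone.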
> 2) What is the equivalent of ESRI Union in PostGIS? I didn't see any out-of-the-box functions, and any tips here are appreciated.
If ESRI Union makes a union, maybe st_union? But I guess there are some semantic issues here.
> 3) Is there any way we can expedite these geoprocessing tasks (union/intersect etc.) using a big data platform (e.g. Hadoop)? Most examples talk about analysis (contains etc.) but not about geoprocessing on geospatial data. Any input is appreciated.
Lots of people do geoprocessing too with PostGIS, including long-running jobs on large volumes of data (worldwide OSM data processing, namely).
"Big data" is a really subjective word. Are your geoprocessing needs really parallelizable? What kind of volumes are we talking about? MB, GB, TB? What kind of hardware do you have at hand?
One way to do some sort of map-reduce with PostGIS is to use a bunch of servers with FDW connections between a source master and these slaves, map the data processing to the slave servers and reduce it on the main server. With a bit of Python as glue code this can be automated and quite efficient, even though this kind of sharding is not automated (yet?).
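
The Python glue for such a map-reduce could look roughly like the sketch below. It is only an outline under assumptions: the FDW setup itself is not shown, `run_on_shard` is a hypothetical callable standing in for "send the query to one server and fetch its partial result", and the partial results are assumed to be per-zone areas.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_step(shards, run_on_shard):
    """Map: run the same processing on every shard concurrently.

    run_on_shard is a hypothetical callable that returns a dict of
    {zone: area} computed on one server.
    """
    shards = list(shards)
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        return list(pool.map(run_on_shard, shards))


def reduce_step(partials):
    """Reduce: sum the per-shard areas into one total per zone."""
    totals = defaultdict(float)
    for partial in partials:
        for zone, area in partial.items():
            totals[zone] += area
    return dict(totals)


if __name__ == "__main__":
    # Fake per-shard results standing in for real query output.
    fake = {"shard1": {"A": 1.5, "B": 2.0}, "shard2": {"A": 2.5}}
    print(reduce_step(map_step(fake, fake.get)))
```

The reduce here is a plain sum, which only works for results that can be merged additively; a geometric union of partial results would need to happen in the database on the main server.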
Vincent
>
> Thanks,
> Ravi.
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>