[postgis-users] Geoprocessing & BigData

David Haynes haynesd2 at gmail.com
Mon Jan 25 07:18:45 PST 2016


We have done some work, implementing parallel spatial queries using a
spatial declustering algorithm. How large are your datasets?

On Mon, Jan 18, 2016 at 1:51 PM, Rémi Cura <remi.cura at gmail.com> wrote:

> Hey,
> if you have one beefy server you can parallelize by running several
> queries at once, each working on a subset of your data
> (i.e. parallel processing through data partitioning).
> One conceptual example: you want to process the world, so you create 20
> workers and a list of countries, and then make the workers process the
> list country by country.
>
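
A minimal sketch of the worker-pool idea quoted above, assuming the
per-country work is wrapped in a callable. The names process_partitions and
fake_process are hypothetical and not from the thread; a real process_one
would open its own database connection and run the spatial query restricted
to one country. Threads are used here on the assumption that the heavy
lifting happens inside the Postgres server, not in the client.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partitions(partitions, process_one, workers=20):
    """Run process_one(partition) for every partition with a fixed worker pool.

    process_one is a hypothetical callable. In a real setup it would open its
    own database connection and run the query for that partition only.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands partitions to idle workers and keeps the input order.
        results = list(pool.map(process_one, partitions))
    return dict(zip(partitions, results))


def fake_process(country):
    # Stand-in for the real per-country query: returns the name length.
    return len(country)


countries = ["France", "Peru", "Kenya", "Japan"]
summary = process_partitions(countries, fake_process, workers=2)
```

Each worker picks up the next country as soon as it is free, so a few large
countries do not hold up the rest of the list.
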
> If you think one postgres server will not be sufficient,
> you could of course shard your data across several servers,
> with options ranging from writing it from scratch (you rewrite everything),
> to using existing open source code, to dedicated solutions like
> Postgres-XC, Greenplum, ...
>
> However, sorry to say this, but in your case it looks like your first
> improvement will come not from massive parallelization but from a
> better understanding of geospatial data and PostGIS.
>
> Cheers,
> Rémi-C
>
> 2016-01-18 19:30 GMT+01:00 Vincent Picavet (ml) <vincent.ml at oslandia.com>:
>
>> Hi Ravi,
>>
>>
>>
>>
>> On 18/01/2016 19:14, Ravi Pavuluri wrote:
>> > Hi All,
>> >
>> > I am checking whether there is a way to quickly process large datasets,
>> > such as census blocks, in PostGIS, and also by leveraging a big data
>> > platform. I have a few questions in this regard.
>> >
>> > 1) When I intersect sample census blocks with another polygon layer,
>> > PostGIS 2.2 (on Postgres 9.4) takes ~60 minutes (after optimizing per
>> > http://postgis.net/2014/03/14/tip_intersection_faster/ ) while ESRI
>> > ArcMap takes ~10 minutes. The PostGIS layers already have geospatial
>> > indices. Is there any way to optimize this further?
>>
>> Following the links on your page, here is a good answer from Paul (TL;DR
>> : st_intersection is slow, avoid it) :
>>
>> http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562
>>
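
A hedged sketch of the "avoid st_intersection where you can" advice quoted
above: filter pairs with ST_Intersects, reuse the geometry unchanged when it
is fully covered, and call ST_Intersection only for pieces that cross a
boundary. The helper clip_query and all table and column names
(census_blocks, zones, geom, id) are hypothetical, chosen for illustration.
Identifiers are interpolated without escaping, so this is not meant for
untrusted input.

```python
def clip_query(blocks="census_blocks", zones="zones", geom="geom", key="id"):
    """Build a clipping query; every table and column name is hypothetical."""
    return f"""
SELECT b.{key} AS block_id,
       z.{key} AS zone_id,
       CASE
           -- Block lies entirely inside the zone: reuse its geometry as-is.
           WHEN ST_CoveredBy(b.{geom}, z.{geom}) THEN b.{geom}
           -- Only boundary-crossing blocks pay for the expensive overlay.
           ELSE ST_Intersection(b.{geom}, z.{geom})
       END AS geom
FROM {blocks} AS b
JOIN {zones} AS z
  ON ST_Intersects(b.{geom}, z.{geom});
""".strip()


query = clip_query()
```

How much this helps depends on how many blocks actually straddle a zone
boundary; if most of them do, the CASE branch saves little.
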
>> > 2) What is the equivalent of ESRI Union in PostGIS? I didn't see any
>> > out-of-the-box function, and any tips here are appreciated.
>>
>> If ESRI Union makes a union, maybe st_union ? But I guess there are some
>> semantic issues here.
>>
>> > 3) Is there any way we can expedite these geoprocessing tasks
>> > (union/intersect etc.) using a big data platform (e.g. Hadoop)? Most
>> > examples talk about analysis (contains etc.) but not about geoprocessing
>> > on geospatial data. Any input is appreciated.
>>
>> Lots of people do geoprocessing with PostGIS too, including long-running
>> jobs on large volumes of data (worldwide OSM data processing, for
>> instance). "Big data" is a really subjective term. Are your geoprocessing
>> needs really parallelizable? What kind of volumes are we talking about:
>> MB, GB, TB? What kind of hardware do you have at hand?
>>
>> One way to do some sort of map-reduce with PostGIS is to use a bunch of
>> servers with FDW connections between a source master and these slaves,
>> map the data processing to the slave servers and reduce it on the main
>> server. With a bit of Python as glue code this can be automated and
>> quite efficient, even though this kind of sharding is not automated
>> (yet?).
>>
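
A rough sketch of what such Python glue code could look like, under the
assumption that each data chunk can be processed independently. The
functions assign_round_robin, map_reduce and fake_map, and the server names,
are hypothetical. In a real setup map_on_server would trigger the processing
on one worker server (for example through a foreign table) and
reduce_results would merge the partial outputs on the main server.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(chunks, servers):
    """Spread chunks over servers in round-robin order."""
    plan = defaultdict(list)
    for i, chunk in enumerate(chunks):
        plan[servers[i % len(servers)]].append(chunk)
    return dict(plan)


def map_reduce(chunks, servers, map_on_server, reduce_results):
    """Map chunks onto servers in parallel, then reduce the partial results."""
    plan = assign_round_robin(chunks, servers)
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # One task per server; each task handles all chunks assigned to it.
        futures = [pool.submit(map_on_server, s, cs) for s, cs in plan.items()]
        partials = [f.result() for f in futures]
    return reduce_results(partials)


def fake_map(server, chunks):
    # Stand-in for remote processing: just sums the chunk values.
    return sum(chunks)


plan = assign_round_robin([1, 2, 3, 4, 5], ["worker_a", "worker_b"])
total = map_reduce([1, 2, 3, 4, 5], ["worker_a", "worker_b"], fake_map, sum)
```

Round-robin assignment is only the simplest choice; chunks of very uneven
size would call for a smarter split.
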
>> Vincent
>>
>> >
>> > Thanks,
>> > Ravi.
>> >
>> >
>> > _______________________________________________
>> > postgis-users mailing list
>> > postgis-users at lists.osgeo.org
>> > http://lists.osgeo.org/mailman/listinfo/postgis-users
>> >