<html><head></head><body><div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:12px"><div id="yui_3_16_0_1_1453926420135_3902"><span>Hi David,</span></div><div id="yui_3_16_0_1_1453926420135_3951"><br><span></span></div><div id="yui_3_16_0_1_1453926420135_3983"><span id="yui_3_16_0_1_1453926420135_3982">I am dealing with census blocks/census block groups spanning a few million records.</span></div> <div id="yui_3_16_0_1_1453926420135_3946" class="qtdSeparateBR"><br><div id="yui_3_16_0_1_1453926420135_4013">Thanks,</div><div id="yui_3_16_0_1_1453926420135_4014">Ravi.</div><div id="yui_3_16_0_1_1453926420135_4015"><br></div></div><div style="display: block;" id="yui_3_16_0_1_1453926420135_3927" class="yahoo_quoted"> <div id="yui_3_16_0_1_1453926420135_3926" style="font-family: HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif; font-size: 12px;"> <div id="yui_3_16_0_1_1453926420135_3925" style="font-family: HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif; font-size: 16px;"> <div id="yui_3_16_0_1_1453926420135_4017" dir="ltr"><font id="yui_3_16_0_1_1453926420135_4016" size="2" face="Arial"> On Monday, January 25, 2016 10:18 AM, David Haynes &lt;haynesd2@gmail.com&gt; wrote:<br></font></div>  <br><br> <div id="yui_3_16_0_1_1453926420135_3924" class="y_msg_container"><div id="yiv3682703813"><div id="yui_3_16_0_1_1453926420135_3923"><div id="yui_3_16_0_1_1453926420135_3928" dir="ltr">We have done some work implementing parallel spatial queries using a spatial declustering algorithm. 
How large are your datasets?</div><div id="yui_3_16_0_1_1453926420135_3922" class="yiv3682703813gmail_extra"><br clear="none"><div id="yui_3_16_0_1_1453926420135_4087" class="yiv3682703813gmail_quote">On Mon, Jan 18, 2016 at 1:51 PM, Rémi Cura <span dir="ltr">&lt;<a rel="nofollow" shape="rect" ymailto="mailto:remi.cura@gmail.com" target="_blank" href="mailto:remi.cura@gmail.com">remi.cura@gmail.com</a>&gt;</span> wrote:<br clear="none"><blockquote id="yui_3_16_0_1_1453926420135_4086" class="yiv3682703813gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id="yui_3_16_0_1_1453926420135_4085" dir="ltr"><div id="yui_3_16_0_1_1453926420135_4095" class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">Hey,<br clear="none"></div><div id="yui_3_16_0_1_1453926420135_4094" class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">if you have one beefy server, you can parallelize by running several queries at once, each working on a subset of your data.<br clear="none"></div><div class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">(aka parallel processing through data partitioning).<br clear="none"></div><div class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">One conceptual example: to process the whole world, you create 20 workers and a list of countries, then have the workers process the list country by country.<br clear="none"><br clear="none"></div><div id="yui_3_16_0_1_1453926420135_4084" class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">If you think one Postgres server will not be sufficient,<br clear="none">you could of course shard your data across several servers, <br clear="none">with options ranging from writing it from scratch (you rewrite everything),<br clear="none">to using existing open source code, to dedicated solutions like<br clear="none"> Postgres-XC, Greenplum, ...<br clear="none"><br clear="none"></div><div 
class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">However, sorry to say this, but in your case it looks like your first improvement will not come from massive parallelization but from first getting a better understanding of geospatial data and PostGIS.<br clear="none"></div><div class="yiv3682703813gmail_default" style="font-family:monospace, monospace;"><br clear="none"></div><div class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">Cheers,<br clear="none"></div><div id="yui_3_16_0_1_1453926420135_4088" class="yiv3682703813gmail_default" style="font-family:monospace, monospace;">Rémi-C<br clear="none"></div></div><div id="yui_3_16_0_1_1453926420135_4092" class="yiv3682703813HOEnZb"><div id="yui_3_16_0_1_1453926420135_4091" class="yiv3682703813h5"><div id="yui_3_16_0_1_1453926420135_4090" class="yiv3682703813gmail_extra"><br clear="none"><div id="yui_3_16_0_1_1453926420135_4089" class="yiv3682703813gmail_quote">2016-01-18 19:30 GMT+01:00 Vincent Picavet (ml) <span dir="ltr">&lt;<a rel="nofollow" shape="rect" ymailto="mailto:vincent.ml@oslandia.com" target="_blank" href="mailto:vincent.ml@oslandia.com">vincent.ml@oslandia.com</a>&gt;</span>:<br clear="none"><blockquote id="yui_3_16_0_1_1453926420135_4093" class="yiv3682703813gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Ravi,<br clear="none">
<br clear="none">
<br clear="none">
<br clear="none">
<br clear="none">
On 18/01/2016 19:14, Ravi Pavuluri wrote:<br clear="none">
> Hi All,<br clear="none">
><br clear="none">
> I am checking whether there is a way to quickly process large datasets such<br clear="none">
> as census blocks in PostGIS, and also by leveraging a big data platform. I<br clear="none">
> have a few questions in this regard.<br clear="none">
><br clear="none">
> 1) When I intersect a sample of census blocks with another polygon<br clear="none">
> layer, PostGIS 2.2 (on Postgres 9.4) takes ~60 minutes (after optimizing<br clear="none">
> per <a rel="nofollow" shape="rect" target="_blank" href="http://postgis.net/2014/03/14/tip_intersection_faster/">http://postgis.net/2014/03/14/tip_intersection_faster/</a>) while<br clear="none">
> ESRI ArcMap takes ~10 minutes. The PostGIS layers already have spatial<br clear="none">
> indexes. Is there any way to optimize this further?<br clear="none">
<br clear="none">
Following the links on your page, here is a good answer from Paul (TL;DR:<br clear="none">
ST_Intersection is slow, avoid it):<br clear="none">
<a rel="nofollow" shape="rect" target="_blank" href="http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562">http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562</a><br clear="none">
<br clear="none">
> 2) What is the equivalent of ESRI Union in PostGIS? I didn't see any<br clear="none">
> out-of-the-box function, and any tips here are appreciated.<br clear="none">
<br clear="none">
If ESRI Union computes a union, maybe ST_Union? But I guess there are some<br clear="none">
semantic differences here.<br clear="none">
<br clear="none">
> 3) Is there any way we can expedite these geoprocessing<br clear="none">
> tasks (union, intersect, etc.) using a big data platform (e.g. Hadoop)? Most<br clear="none">
> examples talk about analysis (contains, etc.) but not about geoprocessing<br clear="none">
> on geospatial data. Any input is appreciated.<br clear="none">
<br clear="none">
Lots of people do geoprocessing with PostGIS too, including long-running<br clear="none">
jobs on large volumes of data (worldwide OSM data processing, for example).<br clear="none">
"Big data" is a really subjective term. Are your geoprocessing needs<br clear="none">
really parallelizable? What kind of volumes are we talking about: MB,<br clear="none">
GB, TB? What kind of hardware do you have at hand?<br clear="none">
<br clear="none">
One way to do a sort of map-reduce with PostGIS is to use a bunch of<br clear="none">
servers with foreign data wrapper (FDW) connections between a source master<br clear="none">
and these slaves: map the data processing to the slave servers and reduce it<br clear="none">
on the main server. With a bit of Python as glue code this can be scripted<br clear="none">
and quite efficient, even though this kind of sharding is not built in (yet?).<br clear="none">
<br clear="none">
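A minimal sketch of that glue code in Python, assuming each partition can be<br clear="none">
processed independently. The names map_reduce and fake_map are hypothetical;<br clear="none">
a real map function would send a query to one of the servers instead of<br clear="none">
counting characters.<br clear="none">
<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce(partitions, map_fn, reduce_fn, workers=4):
    """Run map_fn on every partition in parallel, then combine the results.

    partitions: list of partition keys (for example state or county codes).
    map_fn:     hypothetical callable that processes one partition, for
                example by sending a PostGIS query to one worker server.
    reduce_fn:  callable that merges the list of per-partition results.
    """
    # Threads are enough here because each task mostly waits on a database.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input partitions.
        partial_results = list(pool.map(map_fn, partitions))
    return reduce_fn(partial_results)


def fake_map(partition_key):
    # Stand-in for a real per-partition query: returns a made-up row count.
    return len(partition_key)


# Three hypothetical partition keys, processed by three workers and summed.
total = map_reduce(["27", "55", "38"], fake_map, sum, workers=3)
```
</pre>
The reduce step runs only once, on the main side, after all partitions are done.<br clear="none">
<br clear="none">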
Vincent<br clear="none">
<br clear="none">
><br clear="none">
> Thanks,<br clear="none">
> Ravi.<br clear="none">
><br clear="none">
><br clear="none">
> _______________________________________________<br clear="none">
> postgis-users mailing list<br clear="none">
> <a rel="nofollow" shape="rect" ymailto="mailto:postgis-users@lists.osgeo.org" target="_blank" href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br clear="none">
> <a rel="nofollow" shape="rect" target="_blank" href="http://lists.osgeo.org/mailman/listinfo/postgis-users">http://lists.osgeo.org/mailman/listinfo/postgis-users</a><div class="yiv3682703813yqt6439248620" id="yiv3682703813yqtfd43505"><br clear="none">
><br clear="none">
<br clear="none">
</div></blockquote></div><div class="yiv3682703813yqt6439248620" id="yiv3682703813yqtfd15607"><br clear="none"></div></div><div class="yiv3682703813yqt6439248620" id="yiv3682703813yqtfd14795">
</div></div></div><div class="yiv3682703813yqt6439248620" id="yiv3682703813yqtfd42358"><br clear="none"></div></blockquote></div><div class="yiv3682703813yqt6439248620" id="yiv3682703813yqtfd67381"><br clear="none"></div></div></div></div><br><div class="yqt6439248620" id="yqtfd35735"></div><br><br></div>  </div> </div>  </div></div></body></html>