<div dir="ltr"><div><div><div>The way that were are thinking of handling something like that now would load the partitioned data into nodes. I imagine you want to load all the data into 1 table on 1 node.<br></div>This would result in a large table that might be difficult for simple queries, alternatively you could use the postgresql inheritance and load data and create indices by state.<br><br></div>Address the comment that Remi-C brought up.<br></div>I have spent a lot of time investigating Postgres-Xc / Postgres-XL / CitusDB and greenplum. None of them really parallelize your spatial functions. This is what we have found.<br><br>The current implementation of Paragon uses a variant of round-robin
declustering. The declustering algorithm produced 1024 spatial partitions after
processing the dataset in Table 1. The physical storage and management of the
partitions in Paragon is done by taking advantage of PostgreSQL’s sharding
feature [16]. We extended the SQL create table statement to specify spatial
declustering parameters, such as, the number of partitions to be created, the
declustering method, and a label for the declustering scheme. To execute a
spatial join, the labels of the two tables, being joined, must match. This
mechanism allows the same spatial dataset to be partitioned using different
declustering schemes.
<p class="MsoNormal" style="margin:6pt 0in"><span style="font-size:8pt">Table </span><span style="font-size:8pt">1.</span><span style="font-size:8pt"> Spatial data used for comparison</span></p>
<div align="center">
<table class="" style="border-collapse:collapse;border:medium none" border="1" cellpadding="0" cellspacing="0" width="343">
<tbody><tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
<td style="width:117.9pt;border:1pt solid black;padding:0in 5.4pt" valign="top" width="157">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">Database Table
(acronym)</span></b></p>
</td>
<td style="width:57.15pt;border-width:1pt 1pt 1pt medium;border-style:solid solid solid none;border-color:black black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="76">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">Geometry</span></b></p>
</td>
<td style="width:82.35pt;border-width:1pt 1pt 1pt medium;border-style:solid solid solid none;border-color:black black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="110">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">Number of Objects</span></b></p>
</td>
</tr>
<tr style="mso-yfti-irow:1">
<td style="width:117.9pt;border-width:medium 1pt 1pt;border-style:none solid solid;border-color:-moz-use-text-color black black;padding:0in 5.4pt" valign="top" width="157">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Area-water (Aw)</span></p>
</td>
<td style="width:57.15pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="76">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Polygon</span></p>
</td>
<td style="width:82.35pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="110">
<p class="MsoNormal" style="margin:0in 13.3pt 0.3pt 0in;text-align:right" align="right"><span style="font-size:8pt">39,334</span></p>
</td>
</tr>
<tr style="mso-yfti-irow:2">
<td style="width:117.9pt;border-width:medium 1pt 1pt;border-style:none solid solid;border-color:-moz-use-text-color black black;padding:0in 5.4pt" valign="top" width="157">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Area-landmass (AI)</span></p>
</td>
<td style="width:57.15pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="76">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Polygon </span></p>
</td>
<td style="width:82.35pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="110">
<p class="MsoNormal" style="margin:0in 13.3pt 0.3pt 0in;text-align:right" align="right"><span style="font-size:8pt">5,5951</span></p>
</td>
</tr>
<tr style="mso-yfti-irow:3;mso-yfti-lastrow:yes">
<td style="width:117.9pt;border-width:medium 1pt 1pt;border-style:none solid solid;border-color:-moz-use-text-color black black;padding:0in 5.4pt" valign="top" width="157">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Edge (Ed)</span></p>
</td>
<td style="width:57.15pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="76">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Polyline</span></p>
</td>
<td style="width:82.35pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="110">
<p class="MsoNormal" style="margin:0in 13.3pt 0.3pt 0in;text-align:right" align="right"><span style="font-size:8pt">4,173,498</span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"> </p>
<p class="MsoNormal" style="margin:6pt 0in"><span style="font-size:8pt">Table 2.</span><span style="font-size:8pt"> Comparison of Query Times:
Paragon vs PostgreSQL</span></p>
<div align="center">
<table class="" style="border-collapse:collapse;border:medium none" border="1" cellpadding="0" cellspacing="0" width="340">
<tbody><tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
<td style="width:97.3pt;border:1pt solid black;padding:0in 5.4pt" valign="top" width="130">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">Query
(acronym)</span></b></p>
</td>
<td style="width:58.5pt;border-width:1pt 1pt 1pt medium;border-style:solid solid solid none;border-color:black black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="78">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">PostgreSQL
(seconds)</span></b></p>
</td>
<td style="width:56.6pt;border-width:1pt 1pt 1pt medium;border-style:solid solid solid none;border-color:black black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="75">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">Paragon
(seconds)</span></b></p>
</td>
<td style="width:42.4pt;border-width:1pt 1pt 1pt medium;border-style:solid solid solid none;border-color:black black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="57">
<p class="MsoNormal" style="margin-bottom:0.3pt"><b style="mso-bidi-font-weight:
normal"><span style="font-size:8pt">Speedup</span></b></p>
</td>
</tr>
<tr style="mso-yfti-irow:1">
<td style="width:97.3pt;border-width:medium 1pt 1pt;border-style:none solid solid;border-color:-moz-use-text-color black black;padding:0in 5.4pt" valign="top" width="130">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Polygon overlaps Polygon (Aw_ov_Aw)</span></p>
</td>
<td style="width:58.5pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="78">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span><span style="mso-spacerun:yes"> </span><span style="mso-spacerun:yes"> </span>77.3</span></p>
</td>
<td style="width:56.6pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="75">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span><span style="mso-spacerun:yes"> </span>53.5</span></p>
</td>
<td style="width:42.4pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="57">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span>1.37</span></p>
</td>
</tr>
<tr style="mso-yfti-irow:2">
<td style="width:97.3pt;border-width:medium 1pt 1pt;border-style:none solid solid;border-color:-moz-use-text-color black black;padding:0in 5.4pt" valign="top" width="130">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Polyline Touches Polygon (ED_to_Al)</span></p>
</td>
<td style="width:58.5pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="78">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span><span style="mso-spacerun:yes"> </span>452.9</span></p>
</td>
<td style="width:56.6pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="75">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span><span style="mso-spacerun:yes"> </span>246.0</span></p>
</td>
<td style="width:42.4pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="57">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span>1.84</span></p>
</td>
</tr>
<tr style="mso-yfti-irow:3;mso-yfti-lastrow:yes">
<td style="width:97.3pt;border-width:medium 1pt 1pt;border-style:none solid solid;border-color:-moz-use-text-color black black;padding:0in 5.4pt" valign="top" width="130">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt">Polyline Crosses Polyline (Ed_cr_Ed)</span></p>
</td>
<td style="width:58.5pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="78">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span>1693.2</span></p>
</td>
<td style="width:56.6pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="75">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span>1022.0</span></p>
</td>
<td style="width:42.4pt;border-width:medium 1pt 1pt medium;border-style:none solid solid none;border-color:-moz-use-text-color black black -moz-use-text-color;padding:0in 5.4pt" valign="top" width="57">
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify"><span style="font-size:8pt"><span style="mso-spacerun:yes"> </span>1.65</span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify;text-indent:0.15in"> </p>
<p class="MsoNormal" style="margin-bottom:0.3pt;text-align:justify;text-indent:0.2in">We executed spatial join queries from the Jackpine spatial database
benchmark [6] with Paragon in a two node cluster. The queries are expressed in
SQL with some of the spatial predicates adopted by Open Geospatial Consortium
(OGC). For instance, Code 1 demonstrates the “<span style="mso-fareast-font-family:
"Courier New"">Polyline Touches Polygon”</span> query shown in Table 2.</p>
<p class="" style=""><a name="_GoBack">Code </a><span style="mso-bookmark:_GoBack"><span style="mso-no-proof:yes">1</span></span><span style="mso-bookmark:_GoBack">. Spatial SQL Query</span></p>
<p class="MsoNormal" style="margin-left:0.25in"><span style="mso-bookmark:_GoBack"><span style="font-size:8pt;font-family:"Courier New"">SELECT
COUNT(*) FROM edges ed, arealm al WHERE ST_Touches(ed.geom, al.geom);</span></span></p>
<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 27, 2016 at 8:56 PM, Lars Aksel Opsahl <span dir="ltr"><<a href="mailto:Lars.Opsahl@nibio.no" target="_blank">Lars.Opsahl@nibio.no</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi<br>
<br>
<br>
We have done some testing on this using a single Postgis server.<br>
<br>
<br>
-layer 1 has 7924019 rows with 11 columns and about 1 billion points.<br>
<br>
-layer 2 has 1088614 rows with 20 columns.<br>
<br>
Both layers covers all of Norway.<br>
<br>
<br>
I do a “esri” union in a psql function and get a new table with 27852836 rows and 30 columns with multipolygon. The size of the new table is about 40 GB.<br>
<br>
<br>
This is done in less than 3 hours (real 152m8.248s)<br>
<br>
<br>
I have made a psql function called func_esri.get_esri_union that I calls shown below.<br>
<br>
<br>
psql -t -q -o /tmp/vroom2.sql sl -c"drop table IF EXISTS sl_lop.r1; drop table IF EXISTS sl_lop.c1; select func_esri.get_esri_union('org_ar5arsversjon.ar5_2013_komm_flate id geo','org_ar.ar250_flate sl_sdeid geo', 'sl_lop.r1','sl_lop.c1',3000,false)"<br>
<br>
<br>
Then I take the output from this function and uses Gnu parallel to run the computed sqls in 20 parallel threads.<br>
<br>
<br>
time parallel -j 20 psql -h vroom2 -U postgres sl -c :::: /tmp/vroom2.sql<br>
<br>
<br>
This is fast Postgis server with ssd disks and a lot of memory and cpu.<br>
<br>
<br>
The basic idea is that I use <a href="https://github.com/larsop/content_balanced_grid/" rel="noreferrer" target="_blank">https://github.com/larsop/content_balanced_grid/</a> to make a grid and then create sqls adjusted for this grid. The size of the cells varies a lot. The 3000 parameter in the sql function sets the limit to max 3000 bounding box pr cell.<br>
<br>
<br>
I will post the code on git hub as soon as I have time, I need to clean it up and make some comments first.<br>
<br>
<br>
We also did a small comparison with Arcgis where we ran on a small subset of the tables and we got result file with 186372 rows. That took about 5 minutes with Arcgis software and 1 minute in postgres. This test was running on smaller postgres database server.<br>
<br>
<br>
Since there are different hardware for Arcgis and Postgis I will not put much in to this comparison, but my point is Postgis scales very good on big data having the right hardware and software.<br>
<br>
<br>
Lars<br>
<br>
<br>
________________________________<br>
Fra: postgis-users [<a href="mailto:postgis-users-bounces@lists.osgeo.org">postgis-users-bounces@lists.osgeo.org</a>] på vegne av Ravi Pavuluri [<a href="mailto:ravitheja@ymail.com">ravitheja@ymail.com</a>]<br>
Sendt: 27. januar 2016 21:31<br>
Til: PostGIS Users Discussion<br>
Emne: Re: [postgis-users] Geoprocessing & BigData<br>
<br>
Hi David,<br>
<br>
I are dealing with census blocks/census block groups spanning a few million records.<br>
<br>
Thanks,<br>
Ravi.<br>
<span class=""><br>
On Monday, January 25, 2016 10:18 AM, David Haynes <<a href="mailto:haynesd2@gmail.com">haynesd2@gmail.com</a>> wrote:<br>
<br>
<br>
We have done some work, implementing parallel spatial queries using a spatial declustering algorithm. How large are your datasets?<br>
<br>
</span><span class="">On Mon, Jan 18, 2016 at 1:51 PM, Rémi Cura <<a href="mailto:remi.cura@gmail.com">remi.cura@gmail.com</a><mailto:<a href="mailto:remi.cura@gmail.com">remi.cura@gmail.com</a>>> wrote:<br>
Hey,<br>
if you have one beefy server you can parallelize throwing several queries working on sub set of your data.<br>
(aka parallel processing trough data partition).<br>
One conceptual example : you want to process the world, you create 20 workers, a list of countries, and then make the worker process the list country by country.<br>
<br>
If you think one postgres server will not be sufficient,<br>
you could of course shard your data across several servers,<br>
with options ranging from writting from scratch (you rewrite everything),<br>
to using existing open source code, to dedicated solution like<br>
Postgresql-Xc, greenplum, ...<br>
<br>
However, sorry to say this but in your case it looks like your first improvement step will not come from massive paralleling but from first better understanding the world of geospatial data and postgis.<br>
<br>
Cheers,<br>
Rémi-C<br>
<br>
</span>2016-01-18 19:30 GMT+01:00 Vincent Picavet (ml) <<a href="mailto:vincent.ml@oslandia.com">vincent.ml@oslandia.com</a><mailto:<a href="mailto:vincent.ml@oslandia.com">vincent.ml@oslandia.com</a>>>:<br>
<div><div class="h5">Hi Ravi,<br>
<br>
<br>
<br>
<br>
On 18/01/2016 19:14, Ravi Pavuluri wrote:<br>
> Hi All,<br>
><br>
> I am checking if there is a way to process quickly large datasets such<br>
> as census blocks in PostGIS and also by leveraging big data platform. I<br>
> have few questions in this regard.<br>
><br>
> 1) When I try intersect for sample census blocks with another polygon<br>
> layer, PostGIS 2.2(on Postgres 9.4) takes ~60 minutes (after optimizing<br>
> from <a href="http://postgis.net/2014/03/14/tip_intersection_faster/" rel="noreferrer" target="_blank">http://postgis.net/2014/03/14/tip_intersection_faster/</a> ) while on<br>
> ESRI ArcMap takes ~10 minutes. PostGIS layers already have geospatial<br>
> indices. Is there anyway to optimize this further?<br>
<br>
Following the links on your page, here is a good answer from Paul (TL;DR<br>
: st_intersection is slow, avoid it) :<br>
<a href="http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562" rel="noreferrer" target="_blank">http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562</a><br>
<br>
> 2) What is an equivalent of ESRI Union in PostGIS? I didn't see any out<br>
> of the box functions and any tips here are appreciated.<br>
<br>
If ESRI Union makes a union, maybe st_union ? But I guess there are some<br>
semantic issues here.<br>
<br>
> 3) Is there anyway we can expedite these geoprocessing<br>
> tasks(union/intersect etc) using big data platform (Ex: hadoop)? Most<br>
> examples talk about analysis (contains etc) but not about geoprocessing<br>
> on geospatial data. Any input is appreciated.<br>
<br>
Lots of people do geoprocessing too with PostGIS, including long-running<br>
jobs on large volumes of data ( worldwide osm data processing namely).<br>
"Big data" is a really subjective word. Are your geoprocessing needs<br>
really parallelizable ? What kind of volumes are we talking about ? MB,<br>
GB, TB ? What kind of hardware do you have at hand ?<br>
<br>
One way to do some sort of map-reduce with PostGIS is to use a bunch of<br>
servers with FDW connections between a source master and these slaves,<br>
map the data processing to the slave servers and reduce it on the main<br>
server. With a bit of Python as glue code this can be automated and<br>
quite efficient, even though this kind of sharding is not automated (<br>
yet ?).<br>
<br>
Vincent<br>
<br>
><br>
> Thanks,<br>
> Ravi.<br>
><br>
><br>
> _______________________________________________<br>
> postgis-users mailing list<br>
</div></div>> <a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><mailto:<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a>><br>
<span class="">> <a href="http://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
<br>
><br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
</span><a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><mailto:<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a>><br>
<span class=""><a href="http://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
<br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
</span><a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><mailto:<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a>><br>
<span class=""><a href="http://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
<br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
</span><a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><mailto:<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a>><br>
<div class="HOEnZb"><div class="h5"><a href="http://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
<br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-users</a></div></div></blockquote></div><br></div>