<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto">ST_Union in PostGIS should scale better than it does in SQLite. <div>ST_Dump then gives you single-part geometries.</div><div><br><div>Best regards<div><br></div><div>Andreas Oxenstierna</div><div><br></div><div><div><br></div></div></div><div><br>On 16 July 2018 at 10:53, Paul Meems &lt;<a href="mailto:bontepaarden@gmail.com">bontepaarden@gmail.com</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr"><div>Thanks, Jon, for your suggestion of GeoPandas.</div><div>Unfortunately, I'm not allowed to add new external dependencies.</div><div><br></div><div>I tried doing all the steps in a single SQLite file instead of using several intermediate shapefiles. The results looked good, so I wrote a script that dissolves an increasing number of shapes.</div><div>Later I realized that the speed-up came from forgetting '-<i>explodecollections</i>' in the new script. That option makes a huge difference, so for now I'll keep the multipart polygons.</div><div><br></div><div>I converted these commands into a C# unit test:</div><div><font face="monospace, monospace">// Convert fishnet shapefile to SQLite:</font></div><div><font face="monospace, monospace">ogr2ogr -f SQLite taskmap.sqlite "Fishnet.shp" -nln fishnet -nlt POLYGON -dsco SPATIALITE=YES -lco SPATIAL_INDEX=NO -gt unlimited --config OGR_SQLITE_CACHE 4096 --config OGR_SQLITE_SYNCHRONOUS OFF </font></div><div><font face="monospace, monospace">// Add field:</font></div><div><font face="monospace, monospace">ogrinfo taskmap.sqlite -sql "ALTER TABLE fishnet ADD COLUMN randField real"</font></div><div><font face="monospace, monospace">// Fill random values:</font></div><div><font face="monospace, monospace">ogrinfo taskmap.sqlite -sql "UPDATE fishnet SET randField = ABS(RANDOM() % 10)"</font></div><div><font face="monospace, monospace">// Create index:</font></div><div><font face="monospace, monospace">ogrinfo taskmap.sqlite -sql "CREATE 
INDEX randfield_idx ON fishnet (randField)"</font></div><div><font face="monospace, monospace">// Combined dissolve and export:</font></div><div><font face="monospace, monospace">ogr2ogr -f "ESRI Shapefile" -overwrite taskmap.shp taskmap.sqlite -sql "SELECT ST_Union(geometry) as geom, randField FROM fishnet GROUP BY randField" -gt unlimited --config OGR_SQLITE_CACHE 4096 --config OGR_SQLITE_SYNCHRONOUS OFF</font></div><div><br></div><div>Some timings:</div><div>1,677 shapes --> 0.3 s</div><div>4,810 shapes --> 1.8 s</div><div>18,415 shapes --> 21.4 s</div><div>72,288 shapes --> 5 min 54 s</div><div>285,927 shapes --> 25 min</div><div>1,139,424 shapes --> 6 h 47 min</div><div>4,557,696 shapes --> still running after 34 h</div><div><br></div><div>My application needs to handle about 4.5 million shapes, so running for days is not an option.</div><div><br></div><div>I noticed that my script uses only a fraction of the available resources: 30% of the RAM (12 GB total) and 22-28% of the CPU (8 cores).</div><div>How can I make GDAL use more resources, and would that speed up the process?</div><div><br></div><div>I also read about the CascadedUnion function in GEOS. Can I use it from GDAL/OGR, and if so, how?</div><div>And would enabling the GPU help? If so, do I need a special build? I'm currently using the Windows 64-bit build from <a href="http://gisinternals.com">gisinternals.com</a>.</div><div><br></div><div>Thanks again for any pointers and/or suggestions.</div><div><br></div><div>Paul<br></div><div class="gmail_quote"></div></div>
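<div><br></div><div>One way to put numbers on the slowdown in the timings above is a minimal sketch, not part of the original workflow. It assumes the runtime roughly follows a power law, t ≈ c * n^k, and estimates the exponent k on a log-log scale from the reported figures. The code is plain Python with only the standard library. The power-law model and the helper names (local_exponents, fit_power_law, predict_seconds) are illustrative assumptions. They are not GDAL or GEOS APIs.</div>
<pre>
```python
import math

# Timings reported in the message above: (number of shapes, seconds).
# The 4,557,696-shape run never finished, so it is left out.
TIMINGS = [
    (1_677, 0.3),
    (4_810, 1.8),
    (18_415, 21.4),
    (72_288, 5 * 60 + 54),
    (285_927, 25 * 60),
    (1_139_424, 6 * 3600 + 47 * 60),
]


def local_exponents(points):
    """Slope of log(time) vs log(shapes) between consecutive measurements.

    Roughly 1 means linear scaling, roughly 2 means quadratic scaling.
    """
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(points, points[1:])
    ]


def fit_power_law(points):
    """Least-squares fit of log(t) = log(c) + k * log(n); returns (c, k).

    Assumption: runtime follows t = c * n**k. This is a model chosen for
    illustration, not a documented property of ST_Union or GDAL.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    c = math.exp(my - k * mx)
    return c, k


def predict_seconds(c, k, n):
    """Extrapolated runtime in seconds for n shapes under the fitted model."""
    return c * n**k


if __name__ == "__main__":
    for (n, _), k in zip(TIMINGS[1:], local_exponents(TIMINGS)):
        print(f"up to {n:9,} shapes: local exponent {k:.2f}")
    c, k = fit_power_law(TIMINGS)
    hours = predict_seconds(c, k, 4_557_696) / 3600
    print(f"fitted exponent {k:.2f}; predicted for 4,557,696 shapes: {hours:.0f} h")
```
</pre>
<div>With these figures, the local exponent varies from step to step, and one step is close to 1. The last step, from 285,927 to 1,139,424 shapes, is about 2, which means roughly quadratic growth. The overall fit gives about 1.7. Extrapolating the fit to 4,557,696 shapes gives a runtime on the order of 70 hours. A run that is still going after 34 hours is therefore in line with the trend rather than an anomaly.</div>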
</div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>gdal-dev mailing list</span><br><span><a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a></span><br><span><a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a></span></div></blockquote></div></body></html>