[gdal-dev] Dissolve large amount of geometries

Andreas Oxenstierna ao at t-kartor.se
Mon Jul 16 23:35:05 PDT 2018


If it is 20 mill. coordinates, it could be faster than 80 minutes, I 
guess in the region of 10 minutes.

I would:
1 Combine 3+4 to something like (I assume ST_Multi is not needed) - this 
avoids data duplication and one unnecessary transaction commit
    SELECT randfield, (st_dump(ST_Union(f.geom))).geom AS geom FROM 
fishnet AS f GROUP BY randfield

2 Remove the CREATE INDEX and/or add a spatial index - I do not really 
understand the random union logic

3 CLUSTER any index and ANALYZE (after CREATE INDEX)

4 Check PostgreSQL configs (work_mem, shared_buffers etc) - your stats 
shows a non-linear scaling above 100000 polys so db/hardware limits are hit

5 Disable autovacuum - may interfere during ST_Union execution

6 Execute on real hardware - no (cheap) laptop and Linux is faster - as 
an example this kind of PostGIS execution can be 2-3 times faster on 
iMac compared with "standard laptop"

7 Step 2 may be more efficient as well with
ADD COLUMN randField integer DEFAULT ceil(RANDOM()*10)

> I took the advice of Andreas and converted my code to using PostGIS.
> And the speed difference is enormous.
>
> The commands I've used:
> // Import shapefile into PostGIS:
> ogr2ogr -f PostgreSQL PG:"host=localhost user=..." fishnet.shp -gt 
> unlimited -lco GEOMETRY_NAME=geom -a_srs "EPSG:28992"
> // Add random data:
> ogrinfo PG:"host=localhost user=..." -sql "ALTER TABLE fishnet ADD 
> COLUMN randField integer;UPDATE fishnet SET randField = 
> ceil(RANDOM()*10);CREATE INDEX randfield_idx ON fishnet (randField);
> // Dissolve:
> ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS 
> SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f 
> GROUP BY randfield"
> // Export to shapefile
> ogr2ogr -f "ESRI Shapefile" taskmap.shp PG:"host=localhost user=..." 
> -sql "SELECT randfield, (st_dump(geom)).geom AS geom FROM dissolved" 
> -overwrite -gt unlimited
> // Clean up:
> ogrinfo PG:"host=localhost user=..." -sql "DROP TABLE IF EXISTS 
> fishnet CASCADE;DROP TABLE IF EXISTS dissolved CASCADE"
>
> The timing of the steps, after I converted the above commands to C#:
> | #shapes |Import     |Add values |  Dissolve | Export |
> |1,677 |00:00:00.29|00:00:00.14|00:00:00.10|00:00:00.05|
> |4,810 |00:00:00.28|00:00:00.20|00:00:01.11|00:00:00.15|
> |18,415  |00:00:00.78|00:00:00.53|00:00:04.57|00:00:00.31|
> |72,288  |00:00:02.07|00:00:02.13|00:00:22.56|00:00:01.02|
> |285,927 |00:00:07.58|00:00:06.68|00:01:59.59|00:00:03.82|
> |1,139,424|00:00:26.63|00:00:33.51|00:11:34.63|00:00:15.19|
> |4,546,854|00:01:46.19|00:03:51.92|01:10:07.41|00:00:57.51|
>
> 4.5 million squares of 0.3m have a total area of about 40 ha.
> That is good enough for me. The total time will be around 80 min, 
> instead of days using SQLite.
>
> I'm assuming I can't speed up this command anymore, right?
> ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS 
> SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f 
> GROUP BY randfield"
>
> Thanks all for your valuable help.
>
> Regards,
>
> Paul
>
> Op ma 16 jul. 2018 om 11:30 schreef Andreas Oxenstierna 
> <ao at t-kartor.se <mailto:ao at t-kartor.se>>:
>
>     ST_Union in PostGIS should scale better than SQLite.
>     ST_Dump gives you singlepart geometries.
>
>     Best Regards
>
>     Andreas Oxenstierna
>

-- 
Hälsningar

Andreas Oxenstierna
T-Kartor Geospatial AB
mobile: +46 733 206831
mailto: ao at t-kartor.se
http://www.t-kartor.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20180717/21704952/attachment-0001.html>


More information about the gdal-dev mailing list