[gdal-dev] Dissolve large amount of geometries
Andreas Oxenstierna
ao at t-kartor.se
Mon Jul 16 23:35:05 PDT 2018
If it is 20 mill. coordinates, it could be faster than 80 minutes, I
guess in the region of 10 minutes.
I would:
1 Combine 3+4 to something like (I assume ST_Multi is not needed) - this
avoids data duplication and one unnecessary transaction commit
SELECT randfield, (st_dump(ST_Union(f.geom))).geom AS geom FROM
fishnet AS f GROUP BY randfield
2 Remove the CREATE INDEX and/or add a spatial index - I do not really
understand the random union logic
3 CLUSTER any index and ANALYZE (after CREATE INDEX)
4 Check PostgreSQL configs (work_mem, shared_buffers etc) - your stats
shows a non-linear scaling above 100000 polys so db/hardware limits are hit
5 Disable autovacuum - may interfere during ST_Union execution
6 Execute on real hardware - no (cheap) laptop and Linux is faster - as
an example this kind of PostGIS execution can be 2-3 times faster on
iMac compared with "standard laptop"
7 Step 2 may be more efficient as well with
ADD COLUMN randField integer DEFAULT ceil(RANDOM()*10)
> I took the advice of Andreas and converted my code to using PostGIS.
> And the speed difference is enormous.
>
> The commands I've used:
> // Import shapefile into PostGIS:
> ogr2ogr -f PostgreSQL PG:"host=localhost user=..." fishnet.shp -gt
> unlimited -lco GEOMETRY_NAME=geom -a_srs "EPSG:28992"
> // Add random data:
> ogrinfo PG:"host=localhost user=..." -sql "ALTER TABLE fishnet ADD
> COLUMN randField integer;UPDATE fishnet SET randField =
> ceil(RANDOM()*10);CREATE INDEX randfield_idx ON fishnet (randField);
> // Dissolve:
> ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS
> SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f
> GROUP BY randfield"
> // Export to shapefile
> ogr2ogr -f "ESRI Shapefile" taskmap.shp PG:"host=localhost user=..."
> -sql "SELECT randfield, (st_dump(geom)).geom AS geom FROM dissolved"
> -overwrite -gt unlimited
> // Clean up:
> ogrinfo PG:"host=localhost user=..." -sql "DROP TABLE IF EXISTS
> fishnet CASCADE;DROP TABLE IF EXISTS dissolved CASCADE"
>
> The timing of the steps, after I converted the above commands to C#:
> | #shapes |Import |Add values | Dissolve | Export |
> |1,677 |00:00:00.29|00:00:00.14|00:00:00.10|00:00:00.05|
> |4,810 |00:00:00.28|00:00:00.20|00:00:01.11|00:00:00.15|
> |18,415 |00:00:00.78|00:00:00.53|00:00:04.57|00:00:00.31|
> |72,288 |00:00:02.07|00:00:02.13|00:00:22.56|00:00:01.02|
> |285,927 |00:00:07.58|00:00:06.68|00:01:59.59|00:00:03.82|
> |1,139,424|00:00:26.63|00:00:33.51|00:11:34.63|00:00:15.19|
> |4,546,854|00:01:46.19|00:03:51.92|01:10:07.41|00:00:57.51|
>
> 4.5 million squares of 0.3m have a total area of about 40 ha.
> That is good enough for me. The total time will be around 80 min,
> instead of days using SQLite.
>
> I'm assuming I can't speed up this command anymore, right?
> ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS
> SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f
> GROUP BY randfield"
>
> Thanks all for your valuable help.
>
> Regards,
>
> Paul
>
> Op ma 16 jul. 2018 om 11:30 schreef Andreas Oxenstierna
> <ao at t-kartor.se <mailto:ao at t-kartor.se>>:
>
> ST_Union in PostGIS should scale better than SQLite.
> ST_Dump gives you singlepart geometries.
>
> Best Regards
>
> Andreas Oxenstierna
>
--
Hälsningar
Andreas Oxenstierna
T-Kartor Geospatial AB
mobile: +46 733 206831
mailto: ao at t-kartor.se
http://www.t-kartor.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20180717/21704952/attachment-0001.html>
More information about the gdal-dev
mailing list