[gdal-dev] Dissolve large amount of geometries

Andrew C Aitchison andrew at aitchison.me.uk
Mon Jul 16 02:27:28 PDT 2018


On Mon, 16 Jul 2018, Paul Meems wrote:

> Thanks, Jon for your suggestion of GeoPandas.
> Unfortunately, I'm not allowed to use new external dependencies.

> Some timing:
> 1,677 shapes --> 0.3s
> 4,810 shapes --> 1.8s
> 18,415 shapes --> 21.4s
> 72,288 shapes --> 5m 54s
> 285,927 shapes --> 25m
> 1,139,424 shapes --> 6h 47m
> 4,557,696 shapes --> still running after 34h
>
> My application needs to handle about 4 million shapes, but running
> for days is not an option.
>
> I noticed my script is using only a fraction of my resources: 30% RAM (of
> 12GB), 22-28% CPU (on 8 cores).
> How can I let GDAL use more resources? Might it speed up the process?
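
Before tuning anything, it may be worth checking how the run time grows
with the number of shapes. The sketch below is my own rough analysis of the
timings quoted above, not anything GDAL provides: it fits a power law
t = a * n^k to the six completed runs by least squares on the logarithms.
The names timings and fit_power_law are made up for this illustration. If
the fitted exponent k comes out well above 1, more cache or a faster disk
can only buy a constant factor, and the largest run would still take days.

```python
import math

# Timings quoted above: (number of shapes, elapsed seconds).
# The unfinished 4,557,696-shape run is left out of the fit.
timings = [
    (1_677, 0.3),
    (4_810, 1.8),
    (18_415, 21.4),
    (72_288, 5 * 60 + 54),
    (285_927, 25 * 60),
    (1_139_424, 6 * 3600 + 47 * 60),
]


def fit_power_law(samples):
    """Least-squares fit of log(t) = log(a) + k * log(n); returns (a, k)."""
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    a = math.exp(mean_y - k * mean_x)
    return a, k


a, k = fit_power_law(timings)

# Extrapolate the fitted curve to the run that has not finished yet.
predicted_hours = a * 4_557_696 ** k / 3600

print(f"fitted exponent k = {k:.2f}")
print(f"extrapolated time for 4,557,696 shapes = {predicted_hours:.0f} h")
```

With only six points, one of which (the 25m run) looks faster than its
neighbours would suggest, the extrapolation is a rough guide at best.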

If you aren't using most of your CPU or memory, I'd guess that reading
from or writing to disk is the bottleneck. I'm not sure whether OGR uses
GDAL_CACHEMAX, but you could try
     export GDAL_CACHEMAX=12288
to let GDAL use 12GB of cache (the value is in megabytes; the default is
40MB in older releases and 5% of RAM in newer ones).
If the bottleneck is in SQLite, you might be able to do something equivalent
there. If the bottleneck is writing the file, a RAM disk might make sense.
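
For the SQLite case, the nearest equivalent I know of is the page cache
size, which SQLite sets per connection with PRAGMA cache_size. The sketch
below uses only Python's standard sqlite3 module and assumes your script
opens the database itself; open_with_cache is a made-up helper name, and
the 512 MiB figure is only an example. If the database is opened through
OGR instead, this connection-level setting will not reach it, so check the
driver documentation for its own cache option.

```python
import sqlite3


def open_with_cache(path, cache_mib):
    """Open an SQLite database and ask for a page cache of cache_mib MiB."""
    conn = sqlite3.connect(path)
    # A negative cache_size is interpreted by SQLite as a size in KiB
    # (a positive value would be a number of pages instead).
    cache_kib = int(cache_mib) * 1024
    conn.execute(f"PRAGMA cache_size = {-cache_kib}")
    return conn


# ":memory:" keeps the example self-contained; use your own file path.
conn = open_with_cache(":memory:", 512)

# Read the setting back to confirm it was accepted for this connection.
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(cache_size)
```

The setting only lasts for the lifetime of the connection, so it has to be
applied each time the database is opened.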

-- 
Andrew C. Aitchison					Cambridge, UK
 			andrew at aitchison.me.uk

