[gdal-dev] Reading the same OSM data in multiple threads

Andrew C Aitchison andrew at aitchison.me.uk
Wed Jun 1 09:39:38 PDT 2016


On Wed, 1 Jun 2016, Damian Dixon wrote:

> Hi,
>
> I'm trying to speed up processing of OSM data by opening an OSM file into
> multiple datasets in multiple threads. One dataset per thread. Each thread
> is processing a separate section of data, basically tiling the data.
>
> I've however run into a scaling issue with the amount of memory allocated
> per dataset.
>
> The Open in the OSM driver seems to allocate a lot of memory for buffers
> for processing regardless of the size of the data loaded.
>
> So I have a couple of questions:
>
> 1. Is there a way of reducing the memory load when reading OSM in multiple
> threads?

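GDAL_CACHEMAX caps the raster block cache, so it may not help much with
a vector driver. The OSM driver has its own OSM_MAX_TMPFILE_SIZE option
(in MB, default 100), which is probably the one to try first.
General notes on config options:
https://trac.osgeo.org/gdal/wiki/ConfigOptions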
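
GDAL config options can also be supplied as environment variables, so
one way to try different limits is to launch each worker as a separate
process with its own environment. A minimal sketch, assuming one process
per worker rather than one thread; the helper name gdal_worker_env and
the values are hypothetical, and OSM_MAX_TMPFILE_SIZE comes from the OSM
driver documentation rather than from this thread:

```python
import os


def gdal_worker_env(cache_mb=64, osm_tmp_mb=None, base_env=None):
    """Return a copy of the environment with GDAL memory limits set.

    cache_mb   -> GDAL_CACHEMAX (raster block cache, small values are MB)
    osm_tmp_mb -> OSM_MAX_TMPFILE_SIZE (OSM driver temp database, in MB)
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GDAL_CACHEMAX"] = str(cache_mb)
    if osm_tmp_mb is not None:
        env["OSM_MAX_TMPFILE_SIZE"] = str(osm_tmp_mb)
    return env


# Hypothetical usage: pass the result as env= to subprocess.run(...)
# when starting a worker that opens the OSM file.
worker_env = gdal_worker_env(cache_mb=32, osm_tmp_mb=50)
```

Environment variables apply to the whole process. Inside a single
multi-threaded program the options would have to be set through the
config API (CPLSetConfigOption in C) before each dataset is opened.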

> 2. Could I convert the OSM data into a different format that can be read
> efficiently from multiple threads? and what would that format be?

If you are going to process it in tiles it might make sense to
store it in a spatially indexed vector format (OSM is vector data,
so GTiff does not apply), e.g. GeoPackage, perhaps generated with
   ogr2ogr -f GPKG out.gpkg in.osm
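
Since OSM is vector data, one way to pre-tile it is to run ogr2ogr once
per tile with a -spat bounding box. A rough sketch that only builds the
command lines and does not run them; the function name tile_commands,
the output naming pattern and the choice of GPKG as the target format
are all assumptions for illustration:

```python
def tile_commands(src, bbox, nx, ny, out_pattern="tile_{ix}_{iy}.gpkg"):
    """Build one ogr2ogr command per tile of an nx-by-ny grid over bbox.

    bbox is (xmin, ymin, xmax, ymax) in the coordinates of the source.
    """
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    cmds = []
    for iy in range(ny):
        for ix in range(nx):
            # Compute each edge from the grid origin to avoid drift.
            x0, x1 = xmin + ix * dx, xmin + (ix + 1) * dx
            y0, y1 = ymin + iy * dy, ymin + (iy + 1) * dy
            out = out_pattern.format(ix=ix, iy=iy)
            cmds.append([
                "ogr2ogr", "-f", "GPKG",
                "-spat", str(x0), str(y0), str(x1), str(y1),
                out, src,
            ])
    return cmds


# Hypothetical usage: a 2x2 grid over a small extent.
for cmd in tile_commands("in.osm", (0, 0, 2, 2), 2, 2):
    print(" ".join(cmd))
```

-spat selects features that intersect the box, so anything crossing a
tile boundary ends up in more than one output file.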

You don't say what sort of processing you are doing;
unless the file fits in memory or you do *a lot* of processing
you are probably limited by disk/file access speed, and you might be
better off with *fewer* threads.

-- 
Andrew C Aitchison


