[gdal-dev] Reading the same OSM data in multiple threads

Even Rouault even.rouault at spatialys.com
Wed Jun 1 10:26:23 PDT 2016


Damian,

> 
> I'm trying to speed up processing of OSM data by opening an OSM file into
> multiple datasets in multiple threads. One dataset per thread. Each thread
> is processing a separate section of data, basically tiling the data.
> 
> I've however run into a scaling issue with the amount of memory allocated
> per dataset.
> 
> The Open in the OSM driver seems to allocate a lot of memory for buffers
> for processing regardless of the size of the data loaded.
> 
> So I have a couple of questions:
> 
> 1. is there a way of reducing the memory load when reading OSM in multiple
> threads?

You may play with the OSM_MAX_TMPFILE_SIZE config option, which defaults to 
100 MB per dataset.
If you are brave enough, you can edit ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp 
and reduce the values of the MAX_DELAYED_FEATURES, MAX_ACCUMULATED_NODES and 
HASHED_INDEXES_ARRAY_SIZE #defines (and possibly disable 
ENABLE_NODE_LOOKUP_BY_HASHING in ogr_osm.h).
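
A minimal sketch of lowering that option from Python. The helper name and the 
25 MB value are only illustrative, and I am assuming the option has to be set 
before the datasets are opened. It uses gdal.SetConfigOption() when the GDAL 
Python bindings are importable, and otherwise falls back to an environment 
variable, which GDAL also consults for config options.

```python
import os


def set_osm_tmpfile_limit(max_mb):
    """Lower OSM_MAX_TMPFILE_SIZE (in MB); hypothetical helper name.

    Assumption: call this before opening any OSM dataset.
    """
    value = str(int(max_mb))
    try:
        # GDAL Python bindings, if they are installed.
        from osgeo import gdal
        gdal.SetConfigOption("OSM_MAX_TMPFILE_SIZE", value)
    except ImportError:
        # Fallback: GDAL also reads config options from the environment.
        os.environ["OSM_MAX_TMPFILE_SIZE"] = value
    return value


# Illustrative value only: a quarter of the 100 MB default.
set_osm_tmpfile_limit(25)
```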

> 
> 2. Could I convert the OSM data into a different format that can be read
> efficiently from multiple threads? and what would that format be?
> My thought for (2) would be to load the data into a database and read from
> the database using ogr. If this is the correct way forward which database
> would be recommended (PostGIS, SpatiaLite,...) ?

Reading the same OSM file from multiple threads is indeed probably an 
inefficient approach: OSM files don't have a spatial index, so each thread 
ends up reading the whole file to extract its tile. Converting the data 
beforehand would probably scale better. SpatiaLite/GPKG are probably good 
choices.
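
As a rough sketch of that workflow, assuming a one-off conversion to GPKG with 
ogr2ogr followed by one bounding box per worker thread: the file names, 
function names and grid size below are all hypothetical. The code only builds 
the command line and the tile rectangles; it does not run the conversion.

```python
def conversion_command(src_osm, dst_gpkg):
    """Build an ogr2ogr command line converting an OSM file to GeoPackage."""
    # ogr2ogr expects the destination datasource before the source.
    return ["ogr2ogr", "-f", "GPKG", dst_gpkg, src_osm]


def tile_bounds(minx, miny, maxx, maxy, nx, ny):
    """Split an extent into nx * ny rectangles (minx, miny, maxx, maxy)."""
    width = (maxx - minx) / nx
    height = (maxy - miny) / ny
    tiles = []
    for j in range(ny):          # rows, bottom to top
        for i in range(nx):      # columns, left to right
            tiles.append((minx + i * width,
                          miny + j * height,
                          minx + (i + 1) * width,
                          miny + (j + 1) * height))
    return tiles


# Hypothetical file names, for illustration only.
cmd = conversion_command("extract.osm.pbf", "extract.gpkg")
print(" ".join(cmd))

# One rectangle per worker thread, here a 2 x 2 grid.
for bounds in tile_bounds(0.0, 0.0, 10.0, 10.0, 2, 2):
    print(bounds)
```

Each thread would then open its own dataset on the converted file and restrict 
a layer to its rectangle, e.g. with SetSpatialFilterRect(minx, miny, maxx, 
maxy), so that the spatial index of the target format can be used instead of 
a full scan.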

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

