<div dir="ltr">Thanks for the replies.<div><br></div><div>We are doing a lot of processing of the data and need to retain that data in a vector format.</div><div><br></div><div>For now we are disabling multi-threading for OSM data and significantly increasing the amount of memory that may be allocated.</div><div><br></div><div>We will probably convert OSM to SpatiaLite when the data is over a certain size.</div><div><br></div><div>Thanks</div><div>Damian</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Jun 1, 2016 at 6:25 PM Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Damian,<br>

<br>

><br>

> I'm trying to speed up processing of OSM data by opening an OSM file into<br>

> multiple datasets in multiple threads. One dataset per thread. Each thread<br>

> is processing a separate section of data, basically tiling the data.<br>

><br>

> I've however run into a scaling issue with the amount of memory allocated<br>

> per dataset.<br>

><br>

> The Open in the OSM driver seems to allocate a lot of memory for buffers<br>

> for processing regardless of the size of the data loaded.<br>

><br>

> So I have a couple of questions:<br>

><br>

> 1. Is there a way of reducing the memory load when reading OSM in multiple<br>

> threads?<br>

<br>

You may play with the OSM_MAX_TMPFILE_SIZE config option, which defaults to 100<br>

(MB) per dataset.<br>

If you are brave enough, you can edit ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp<br>

and reduce the values of the #defines MAX_DELAYED_FEATURES,<br>

MAX_ACCUMULATED_NODES and HASHED_INDEXES_ARRAY_SIZE (and possibly disable<br>

ENABLE_NODE_LOOKUP_BY_HASHING in ogr_osm.h).<br>
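<br>
A minimal sketch of the first suggestion, assuming the option is passed as an environment variable (GDAL config options can generally be set that way). The helper name osm_reader_env and the value 25 are hypothetical, chosen only for illustration. The value is in MB, as above.<br>

```python
import os


def osm_reader_env(tmpfile_mb, base_env=None):
    """Return a copy of the environment with OSM_MAX_TMPFILE_SIZE set.

    tmpfile_mb is the per-dataset limit in MB (the driver default is 100).
    Hypothetical helper: it only prepares the environment and does not
    call GDAL itself.
    """
    source = os.environ if base_env is None else base_env
    env = dict(source)  # copy, so the caller's environment is left untouched
    # Clamp to at least 1 MB; a zero or negative limit makes no sense here.
    env["OSM_MAX_TMPFILE_SIZE"] = str(max(1, int(tmpfile_mb)))
    return env


if __name__ == "__main__":
    # Show the value that a worker process would inherit.
    print(osm_reader_env(25, base_env={})["OSM_MAX_TMPFILE_SIZE"])
```

The returned dict can be handed to a worker process through the env argument of subprocess.run. A lower limit should reduce memory held per dataset, probably at the cost of more temporary data going to disk.<br>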

<br>

><br>

> 2. Could I convert the OSM data into a different format that can be read<br>

> efficiently from multiple threads? And what would that format be?<br>

> My thought for (2) would be to load the data into a database and read from<br>

> the database using ogr. If this is the correct way forward, which database<br>

> would be recommended (PostGIS, SpatiaLite, ...)?<br>

<br>

Reading the same OSM file from multiple threads is indeed probably an inefficient<br>

approach, as OSM files have no spatial index, so you'll end up reading the<br>

whole file completely for each tile. So prior conversion would probably be<br>

better for later scaling. SpatiaLite/GPKG are probably good choices.<br>
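<br>
A minimal sketch of such a prior conversion, assuming the ogr2ogr command-line tool is used for it. The helper name build_convert_command and the file names are hypothetical. The function only builds the argument list and leaves running it to the caller.<br>

```python
def build_convert_command(osm_path, out_path, fmt="GPKG", tmpfile_mb=None):
    """Build an ogr2ogr argument list converting an OSM file to GPKG or SpatiaLite.

    fmt is "GPKG" or "SpatiaLite"; tmpfile_mb optionally sets
    OSM_MAX_TMPFILE_SIZE (in MB) for the conversion run.
    """
    if fmt not in ("GPKG", "SpatiaLite"):
        raise ValueError("fmt must be 'GPKG' or 'SpatiaLite'")
    cmd = ["ogr2ogr"]
    if tmpfile_mb is not None:
        cmd += ["--config", "OSM_MAX_TMPFILE_SIZE", str(int(tmpfile_mb))]
    if fmt == "SpatiaLite":
        # SpatiaLite output is the SQLite driver with a dataset creation option.
        cmd += ["-f", "SQLite", "-dsco", "SPATIALITE=YES"]
    else:
        cmd += ["-f", "GPKG"]
    # ogr2ogr expects the destination first, then the source.
    cmd += [out_path, osm_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_convert_command("extract.osm.pbf", "extract.gpkg")))
```

The list can be passed to subprocess.run(cmd, check=True). Once the data has been converted, each thread can open its own dataset on the output file and apply a spatial filter for its tile.<br>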

<br>

Even<br>

<br>

--<br>

Spatialys - Geospatial professional services<br>

<a href="http://www.spatialys.com" rel="noreferrer" target="_blank">http://www.spatialys.com</a><br>

</blockquote></div>