<div dir="ltr">Thanks for the replies.<div><br></div><div>We are doing a lot of processing of the data and need to retain that data in a vector format.</div><div><br></div><div>For now we are disabling multi-threading for OSM data and significantly increasing the amount of memory that may be allocated.</div><div><br></div><div>We will probably go with converting OSM to SpatiaLite when the data is over a certain size.</div><div><br></div><div>Thanks</div><div>Damian</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Jun 1, 2016 at 6:25 PM Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Damian,<br>
<br>
><br>
> I'm trying to speed up processing of OSM data by opening an OSM file into<br>
> multiple datasets in multiple threads. One dataset per thread. Each thread<br>
> is processing a separate section of data, basically tiling the data.<br>
><br>
> I've however run into a scaling issue with the amount of memory allocated<br>
> per dataset.<br>
><br>
> The Open in the OSM driver seems to allocate a lot of memory for buffers<br>
> for processing regardless of the size of the data loaded.<br>
><br>
> So I have a couple of questions:<br>
><br>
> 1. Is there a way of reducing the memory load when reading OSM in multiple<br>
> threads?<br>
<br>
You may play with the OSM_MAX_TMPFILE_SIZE config option, which defaults to 100<br>
(MB) per dataset.<br>
If you are brave enough, you can edit ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp<br>
and reduce the values of the #defines MAX_DELAYED_FEATURES,<br>
MAX_ACCUMULATED_NODES and HASHED_INDEXES_ARRAY_SIZE (and possibly disable<br>
ENABLE_NODE_LOOKUP_BY_HASHING in ogr_osm.h).<br>
<br>
><br>
> 2. Could I convert the OSM data into a different format that can be read<br>
> efficiently from multiple threads, and what would that format be?<br>
> My thought for (2) would be to load the data into a database and read from<br>
> the database using OGR. If this is the correct way forward, which database<br>
> would be recommended (PostGIS, SpatiaLite, ...)?<br>
<br>
Reading the same OSM file from multiple threads is indeed probably an inefficient<br>
approach, as OSM files don't have spatial indices, so you'll end up reading the<br>
whole file completely for each tile. So prior conversion would probably be<br>
better for later scaling. SpatiaLite/GPKG are probably good choices.<br>
<br>
Even<br>
<br>
--<br>
Spatialys - Geospatial professional services<br>
<a href="http://www.spatialys.com" rel="noreferrer" target="_blank">http://www.spatialys.com</a><br>
</blockquote></div>
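<div><br></div><div>For anyone following the thread, here is a minimal sketch of the "convert first when the file is large" approach discussed above. It only builds an ogr2ogr command line, optionally passing the OSM_MAX_TMPFILE_SIZE config option mentioned in the reply. The helper names, the file names and the 500 MB threshold are assumptions made for illustration; the thread only says "over a certain size".</div>

```python
import os
import shlex

# Hypothetical threshold: the thread does not state an actual size.
SIZE_THRESHOLD_BYTES = 500 * 1024 * 1024


def needs_conversion(osm_path, threshold=SIZE_THRESHOLD_BYTES):
    """Return True when the OSM file is larger than the chosen threshold."""
    return os.path.getsize(osm_path) > threshold


def build_convert_command(osm_path, out_path, fmt="SQLite", max_tmpfile_mb=None):
    """Build an ogr2ogr argument list converting an OSM file.

    fmt is the OGR output driver name: "SQLite" (written as SpatiaLite
    through the SPATIALITE=YES dataset creation option) or "GPKG".
    max_tmpfile_mb, when given, is passed as OSM_MAX_TMPFILE_SIZE (in MB).
    """
    cmd = ["ogr2ogr"]
    if max_tmpfile_mb is not None:
        cmd += ["--config", "OSM_MAX_TMPFILE_SIZE", str(max_tmpfile_mb)]
    cmd += ["-f", fmt]
    if fmt == "SQLite":
        # Ask the SQLite driver to create a SpatiaLite database.
        cmd += ["-dsco", "SPATIALITE=YES"]
    # ogr2ogr expects the destination first, then the source.
    cmd += [out_path, osm_path]
    return cmd


if __name__ == "__main__":
    # Example file names are placeholders.
    print(shlex.join(build_convert_command("region.osm.pbf", "region.sqlite")))
```

<div>The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where ogr2ogr is installed. After conversion, each worker thread can open its own dataset on the converted file and set a spatial filter for its tile, which lets it use the spatial index instead of rescanning the whole OSM file.</div>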