[gdal-dev] OSM Driver and World Planet file (pbf format)

Even Rouault even.rouault at mines-paris.org
Tue Jul 31 12:17:02 PDT 2012


> Another set of tests with a brand new and quite powerful laptop.
> Specs for the computer:
> Intel i7-2760QM @2.4 GHz processor (8 threads)
> Hitachi Travelstar Z7K320 7200 rpm SATA disk
> 8 GB of memory
> Windows 7, 64-bit
> 
> GDAL version r24717, Win64 build from gisinternals.com
> 
> Timings for germany.osm.pbf (1.3 GB)
> ====================================
> 
> A) Default settings, with the command
> ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 20000 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
> 
> - reading the data               67 minutes
> - creating spatial indexes       38 minutes
> - total                         105 minutes
> 
> B) Using an in-memory Spatialite db for the first step by setting
> SET OSM_MAX_TMPFILE_SIZE=7000
> 
> - reading the data              16 minutes
> - creating spatial indexes      38 minutes
> - total                         54 minutes
> 
> Peak memory usage during this conversion was 4.4 GB.
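
For the record, the complete sequence for B) is presumably the A) command run
after setting the environment variable; the value is in MB, so 7000 lets the
temporary node DB grow to about 7 GB in RAM:

SET OSM_MAX_TMPFILE_SIZE=7000
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 20000 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
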
> 
> Conclusions
> ===========
> * The initial reading of data is heavily i/o bound. This phase
> is really fast if there is enough memory for keeping the OSM
> tempfile in memory, but an SSD seems to offer equally good
> performance.
> * Creating spatial indexes for the Spatialite tables is also
> i/o bound. The hardware sets the speed limit and there are
> no other tricks for improving the performance. The multi-core
> CPU is mostly idle during this phase, at 10-15% load.
> * If the user does not plan to do spatial queries, it may be
> handy to save some time and create the Spatialite db without
> spatial indexes by using the -lco SPATIAL_INDEX=NO option.
> * Windows disk i/o may be a limiting factor.
> 
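A complete command along those lines would presumably be the A) command plus
the layer creation option:

ogr2ogr -f sqlite -dsco spatialite=yes -lco SPATIAL_INDEX=NO germany.sqlite germany.osm.pbf -gt 20000 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
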
> I consider that for small OSM datasets the speed is starting
> to be good enough. For me it makes little difference whether
> converting the Finnish OSM data (137 MB in .pbf format) takes
> 160 seconds with the default settings or 140 seconds with the
> in-memory temporary database.

Interesting findings.

An SSD is of course the ideal hardware to get efficient random access to the
nodes.

I've just introduced in r24719 a new config option OSM_COMPRESS_NODES that can
be set to YES. The effect is to apply a compression algorithm while storing the
temporary node DB. This can compress it by a factor of 3 or 4, and helps keep
the node DB below the RAM size, so that the OS can cache it effectively (at
least on Linux). This can be efficient for OSM extracts the size of a country,
but probably not for a planet file. For Germany and France, here's the effect
on my PC (SATA disk):

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real    25m34.029s
user    15m11.530s
sys 0m36.470s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real    74m33.077s
user    15m38.570s
sys 1m31.720s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real    7m46.594s
user    7m24.990s
sys 0m11.880s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real    108m48.967s
user    7m47.970s
sys 2m9.310s

I didn't make YES the default, because I'm unsure of the performance impact on
SSDs. Perhaps you'll have a chance to test.
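
On your Windows setup (with a build including r24719), that should just be a
matter of appending the new switch to your existing command, e.g.:

ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 20000 -progress --config OGR_SQLITE_SYNCHRONOUS OFF --config OSM_COMPRESS_NODES YES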
