[gdal-dev] OSM Driver and World Planet file (pbf format)
even.rouault at mines-paris.org
Sun Jul 29 08:49:10 PDT 2012
On Saturday 28 July 2012 13:39:48, Jukka Rahkonen wrote:
> Even Rouault <even.rouault <at> mines-paris.org> writes:
> > I've committed in r24707 a change that is mainly a custom indexing
> > mechanism for nodes (can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to
> > improve performance (by about a factor of 2 on a 1 GB PBF
> > on my PC)
> I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit
> binaries containing that change. Conversion of the Finnish OSM data with
> ogr2ogr and the default osmconf.ini into Spatialite format took about 5
> minutes and it was a minute or two faster than it used to be. Conversion
> of German data took 17 hours and it was about as slow as before.
Yes, the performance improvement isn't so obvious when I/O is the limiting
factor.
However, the conversion of germany.osm.pbf seemed very slow on your PC, and
when I tested on mine it took ~9 hours. That still seemed too slow, since a
conversion of the full planet-latest.osm.pbf (17 GB) into "null" (a debug
output driver, not compiled by default, that doesn't write anything) took
~30 hours (which, looking at
http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't particularly
After investigation, most of the slowdown is due to the building of the
spatial index of the output spatialite DB. When the spatial index is created
at DB initialization and updated at each feature insertion, the performance
is clearly affected. For example, when adding -lco SPATIAL_INDEX=NO to the
command line, the conversion of germany only takes 2 hours. Manually adding
the spatial index at the end with ogrinfo the.db -sql "SELECT
CreateSpatialIndex('points', 'GEOMETRY')" (and the same for lines, polygons,
multilines, multipolygons, other_relations) takes ~22 minutes, so overall
this is about 4 times faster (roughly 2h20 instead of ~9 hours).
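The manual workflow described above can be sketched as a small Python helper that only builds the command lines and does not run them. The file names germany.osm.pbf and germany.db and the helper name build_commands are illustrative. The layer names are the ones written in this message, so check them with ogrinfo against the real output. Using -f SQLite with -dsco SPATIALITE=YES is my assumption about how the Spatialite output was requested.

```python
import shlex

# Layer names as written in the message; verify them with ogrinfo on the real DB.
LAYERS = ["points", "lines", "polygons", "multilines",
          "multipolygons", "other_relations"]


def build_commands(pbf="germany.osm.pbf", db="germany.db", layers=LAYERS):
    """Return the ogr2ogr call followed by one ogrinfo call per layer."""
    # Step 1: convert without maintaining the spatial index during insertion.
    convert = ["ogr2ogr", "-f", "SQLite", "-dsco", "SPATIALITE=YES",
               "-lco", "SPATIAL_INDEX=NO", db, pbf]
    # Step 2: build the spatial index of each layer once, at the end.
    index = [
        ["ogrinfo", db, "-sql",
         f"SELECT CreateSpatialIndex('{layer}', 'GEOMETRY')"]
        for layer in layers
    ]
    return [convert] + index


if __name__ == "__main__":
    # Print shell-quoted command lines for inspection; nothing is executed.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Each returned list could be passed to subprocess.run once the names have been checked against the actual dataset.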
In r24715, I've implemented deferred spatial index creation, and indeed the
whole process now takes ~2h20.
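The general idea of deferring index creation can be illustrated with the sqlite3 module from the Python standard library. This is only a sketch of the principle: it uses an ordinary B-tree index on an in-memory table, not the Spatialite R-tree index, and it is not the code from r24715. The table name, column names and the load helper are hypothetical.

```python
import sqlite3


def load(rows, deferred):
    """Insert rows into a fresh in-memory table; build the index before or after."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    if not deferred:
        # Index exists up front, so it is updated on every single insertion.
        con.execute("CREATE INDEX idx_xy ON points (x, y)")
    with con:  # one transaction for the whole bulk insert
        con.executemany("INSERT INTO points (x, y) VALUES (?, ?)", rows)
    if deferred:
        # Index is built once, after all rows are present.
        con.execute("CREATE INDEX idx_xy ON points (x, y)")
    return con


if __name__ == "__main__":
    rows = [(i * 0.5, i * 0.25) for i in range(100000)]
    for deferred in (False, True):
        con = load(rows, deferred)
        count = con.execute("SELECT COUNT(*) FROM points").fetchone()[0]
        print("deferred" if deferred else "immediate", count)
```

Both variants end with the same table and index; only the moment at which the index is built differs.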
> I guess it may be the output to spatialite format that gets so slow when
> database size gets bigger. CPU usage was only a couple of percent during
> the last 10 hours and the process took only 100-200 MB of memory.
> What other output format could you recommend for testing?
I don't think the output format would change performance so much. What takes
time is disk seeking to get nodes to build way geometries, or to get ways to
build multi geometries. So having RAID disks might help. The writing of the
output data might certainly reduce the efficiency of OS I/O caching, but unless
an output format is particularly verbose compared to others, that should
have little influence.
What can speed things up is to have lots of RAM and specify a huge value for
OSM_MAX_TMPFILE_SIZE. Typically this would be 4 times the size of the PBF.
However, if the temp file(s) don't fit entirely into that size, this will not
bring any advantage.
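The "4 times the size of the PBF" rule of thumb can be turned into a number with a few lines of Python. This is a rough sketch: I am assuming that OSM_MAX_TMPFILE_SIZE is expressed in megabytes, and the helper names are made up for illustration. Passing the value with --config is the standard way to set a GDAL configuration option on the ogr2ogr command line.

```python
MB = 1024 * 1024


def suggested_tmpfile_size_mb(pbf_size_bytes, factor=4):
    """Return factor * PBF size, rounded up to whole megabytes."""
    return (pbf_size_bytes * factor + MB - 1) // MB


def config_args(pbf_size_bytes):
    """Extra command-line arguments for ogr2ogr (assumes the option is in MB)."""
    value = suggested_tmpfile_size_mb(pbf_size_bytes)
    return ["--config", "OSM_MAX_TMPFILE_SIZE", str(value)]


if __name__ == "__main__":
    one_gb = 1024 * MB
    print(config_args(one_gb))       # 1 GB PBF
    print(config_args(17 * one_gb))  # 17 GB planet file
```

In practice the size would come from os.path.getsize on the PBF, and the result only helps if that much RAM is really available.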
> -Jukka Rahkonen-