[gdal-dev] OSM Driver and World Planet file (pbf format)

Even Rouault even.rouault at mines-paris.org
Sun Jul 29 08:49:10 PDT 2012


On Saturday 28 July 2012 13:39:48, Jukka Rahkonen wrote:
> Even Rouault <even.rouault <at> mines-paris.org> writes:
> > I've committed in r24707 a change that is mainly a custom indexing
> > mechanism for nodes (can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to
> > improve performance (by about a factor of 2 on a 1 GB PBF
> > on my PC)
> 
> I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit
> binaries containing that change. Conversion of the Finnish OSM data with
> ogr2ogr and the default osmconf.ini into Spatialite format took about 5
> minutes and it was a minute or two faster than it used to be. Conversion
> of German data took 17 hours and it was about as slow as before.

Yes, the performance improvement isn't so obvious when I/O is the limiting 
factor.

However, while the conversion of germany.osm.pbf seemed very slow on your PC,
it still took ~9 hours on mine, which seemed too slow given that converting
the full planet-latest.osm.pbf (17 GB) into "null" (a debug output driver,
not compiled by default, that doesn't write anything) took ~30 h (which,
judging from http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't
particularly bad).

After investigation, most of the slowdown turned out to be due to building the
spatial index of the output SpatiaLite DB. When the spatial index is created
at DB initialization and updated at each feature insertion, performance
clearly suffers. For example, when adding -lco SPATIAL_INDEX=NO to the
command line, the conversion of germany only takes 2 hours. Adding the
spatial index manually at the end with
ogrinfo the.db -sql "SELECT CreateSpatialIndex('points', 'GEOMETRY')"
(and the same for lines, polygons, multilines, multipolygons,
other_relations) takes ~22 minutes, so overall this is about 4 times faster
(2 h + 22 min instead of ~9 h).
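
For builds older than r24715, the manual workaround above can be sketched as
two shell functions. This is only a sketch: the function names and the OGRINFO
override variable are hypothetical, and the table names passed to it should be
checked against the output of "ogrinfo the.db", since the exact OSM layer names
may differ from the shorthand used in this message.

```shell
#!/bin/sh
# Sketch of the pre-r24715 workaround: convert without a spatial index,
# then build the SpatiaLite spatial indexes once all data is loaded.
# Assumption: OGRINFO is a hypothetical override for the ogrinfo binary;
# it defaults to "ogrinfo" found in PATH.

# Step 1: convert with spatial index creation disabled.
convert_without_index() {
  # $1 = input .osm.pbf, $2 = output SpatiaLite database
  ogr2ogr -f SQLite -dsco SPATIALITE=YES -lco SPATIAL_INDEX=NO "$2" "$1"
}

# Step 2: create the spatial index of each table in one pass at the end.
create_spatial_indexes() {
  # $1 = database, remaining arguments = table names
  db=$1
  shift
  for table in "$@"; do
    "${OGRINFO:-ogrinfo}" "$db" -sql \
      "SELECT CreateSpatialIndex('$table', 'GEOMETRY')" || return 1
  done
}
```

Usage: "convert_without_index germany.osm.pbf the.db" followed by
"create_spatial_indexes the.db points lines polygons multilines multipolygons other_relations".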

In r24715, I've implemented deferred spatial index creation, and indeed the
whole process now takes ~2h20.

> I guess it may be the output to spatialite format that gets so slow when
> database size gets bigger. CPU usage was only a couple of percent during
> the last 10 hours and process took only 100-200 MB of memory.
> What other output format could you recommend for testing?

I don't think the output format would change performance that much. What takes
time is disk seeking to fetch nodes when building way geometries, or to fetch
ways when building multi-geometries, so having RAID disks might help. Writing
the output data can certainly reduce the efficiency of OS I/O caching, but
unless an output format is particularly verbose compared to others, that
should have little influence.

What can speed things up is having lots of RAM and specifying a huge value for
OSM_MAX_TMPFILE_SIZE, typically 4 times the size of the PBF. However, if the
temporary file(s) don't fit entirely within that size, this will not bring
any advantage.
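
The rule of thumb above can be sketched in shell as follows. This assumes
OSM_MAX_TMPFILE_SIZE is expressed in megabytes and that the machine really has
that much free RAM; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: size OSM_MAX_TMPFILE_SIZE at ~4 times the PBF size.
# Assumption: the option value is in megabytes (MB).

# Print 4 x the size of file $1, in MB (rounded down).
tmpfile_size_mb() {
  [ -r "$1" ] || return 1
  bytes=$(wc -c < "$1" | tr -d ' ')
  echo $(( bytes * 4 / 1048576 ))
}

# Run the conversion with the computed value passed as a config option.
convert_with_big_tmpfile() {
  # $1 = input .osm.pbf, $2 = output SpatiaLite database
  size=$(tmpfile_size_mb "$1") || return 1
  ogr2ogr --config OSM_MAX_TMPFILE_SIZE "$size" \
    -f SQLite -dsco SPATIALITE=YES "$2" "$1"
}
```

For the 17 GB planet file this asks for about four times 17 GB, i.e. close to
70 GB of RAM, which shows why the setting only pays off on large-memory machines.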


> 
> -Jukka Rahkonen-
> 

