[gdal-dev] New OGR driver to read OpenStreetMap .osm / .pbf files
Jukka.Rahkonen at mmmtike.fi
Thu Jul 19 02:25:42 PDT 2012
Even Rouault wrote:
>> Windows 7, 64-bit, SATA disk and 2x3 GHz is converting Finland.osm.pbf in
>> 8 minutes for me. But execution time does not scale linearly at all.
>> Germany.osm.pbf is about 10 times larger in file size, but ogr2ogr had to work
>> on it for about 8 hours, thus roughly 70 times longer. Obviously I would save 6
>> hours by splitting the German pbf file into 10 smaller ones, running ogr2ogr
>> 10 times and combining the results with ogrtindex for use with MapServer.
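For reference, the conversion and tile-index step I have in mind look roughly like this (file and layer names are illustrative, not the exact ones from my test):

```shell
# Convert an OpenStreetMap PBF extract to a spatialite database with ogr2ogr.
ogr2ogr -f SQLite -dsco SPATIALITE=YES germany.sqlite germany.osm.pbf

# After splitting into parts (germany_01.osm.pbf ... germany_10.osm.pbf),
# convert each part separately and build a tile index over the outputs
# for use with MapServer.
for f in germany_*.osm.pbf; do
  ogr2ogr -f "ESRI Shapefile" "${f%.osm.pbf}" "$f" lines
done
ogrtindex tileindex.shp germany_*/lines.shp
```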
> I suppose there is a big discontinuity at some point. While the temporary
> database can fit into RAM, and then in the I/O cache of the operating system,
> the performance must be reasonably good. But when it grows beyond that, you'll
> get disk access for almost every way, and this will be sluggish.
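If the cliff really is the temporary database falling out of RAM, one knob worth trying (going by the OSM driver documentation; I have not verified the effect myself) is the OSM_MAX_TMPFILE_SIZE configuration option:

```shell
# Let the OSM driver's temporary node database stay in memory up to
# 2000 MB before spilling to disk (the value is in MB; the default is
# much smaller). Passed as a GDAL configuration option via --config.
ogr2ogr -f SQLite -dsco SPATIALITE=YES \
    --config OSM_MAX_TMPFILE_SIZE 2000 \
    germany.sqlite germany.osm.pbf
```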
The test with Germany.osm.pbf was even more sluggish than I thought. The process
had not completed after all; in the end it took 15 hours to run. Resolving the
relations must have been heavy for ogr2ogr.
> Do you have comparisons of the performance with osm2pgsql on the same PC and
> with the same data? I'd be curious whether that slow-down effect is found with
> every tool, or if it is something specific to the way sqlite is used, or if
> other tools do more clever things when indexing or retrieving nodes.
I fear that my comparisons would not give very much information. Osm2pgsql is not
at all optimised for this kind of task. I would need to run it in slim mode, and
then osm2pgsql does a whole lot of extra work preparing the database to accept
updates from diff files. I suppose osm2pgsql could be much faster if it had some
kind of "slim, but with no diff support" mode. A great part of the job is also
performed by PostgreSQL, and the database parameters seem to have a big
influence.
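For what it's worth, the kind of osm2pgsql invocation I would be timing is essentially this (database name and cache size are from my setup, not universal defaults):

```shell
# Import in slim mode: node locations go into PostgreSQL tables instead
# of RAM. -C sets the in-memory node cache in MB. --slim also creates the
# intermediate tables needed for later diff updates, which is exactly the
# extra work discussed above.
osm2pgsql --slim -C 1024 -d osm finland.osm.pbf
```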
From my own experience I can say that osm2pgsql gets very slow and unreliable
on weak computers. A Linux box with 700 MB of memory cannot import the
finland.osm extract at all, and on my Windows laptop with 2 GB it takes two or
three hours. Ogr2ogr on the same machine did the conversion in 40 minutes.
Let's hope that some OSM developer gets interested in this task.
> I suppose splitting could help, but I don't see an obvious reason why an
> intelligent splitting would be faster. I mention "intelligent" because you can
> do a rough splitting based on bounding box for example, but this will lead to
> ways that have unresolved nodes.
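An "intelligent" split in this sense is roughly what osmosis does with its completeWays option: a sketch, assuming osmosis is installed and using an arbitrary example bounding box:

```shell
# Bounding-box split with osmosis. completeWays=yes keeps all nodes of
# ways that cross the box boundary, so the resulting part contains no
# ways with unresolved node references.
osmosis --read-pbf file=germany.osm.pbf \
    --bounding-box left=6 right=10 top=52 bottom=47 completeWays=yes \
    --write-pbf file=germany_part.osm.pbf
```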
I had forgotten how much resolving there is in the OSM format. Compared to that,
it is ridiculously simple to read already-resolved OSM features from a WFS
service.