[gdal-dev] OSM Driver and World Planet file (pbf format)

Rahkonen Jukka Jukka.Rahkonen at mmmtike.fi
Wed Aug 1 14:27:06 PDT 2012


Even Rouault wrote:

>
>> I made a rough parallelizing test by making 4 copies of finland.osm.pbf and
>> running ogr2ogr in four separate windows.  This way the total CPU load of the
>> 8 cores was staying around 50%.
>> Result: All four conversions were ready after 3 minutes (45 seconds per
>> conversion) while a single conversion takes 2 minutes.

> In my opinion, "45 seconds per conversion" isn't really a good summary: I'd say
> that your computer could handle 4 conversions in parallel in 3 minutes. But the
> fact of running conversions in parallel didn't make them *individually* faster
> (that would be nonsense) than running a single one. We probably agree, that's
> just the way of presenting the info that is a bit strange.

Ok, let's use other units. Some suggestions:
- data processing rate as MB/sec or MB/minute (input file size in pbf format)
- node conversion rate nodes/sec
- way or feature conversion rate as count/sec

None of them is a perfect speed unit. Nodes/sec feels the most exact, but the
practical speed depends on the nature of the data, especially on the number of
relations and how complicated they are. Megabytes of pbf data per minute could
be a rather good measure too. In my single process vs. four parallel processes
example the conversion rates were 60 MB/minute vs. 160 MB/minute,
respectively. By looking at the file sizes at http://download.geofabrik.de/osm/europe/
one can quickly estimate that converting 300 MB of data from Spain should
take about 5 minutes. With parallel runs Finland, Sweden and Norway would
also be ready at the same time at no extra cost.
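
The arithmetic behind those figures can be sketched roughly as below. This is only a back-of-the-envelope illustration: the 120 MB size for finland.osm.pbf is an assumption inferred from 60 MB/minute over a 2 minute run, not a measured file size, and the function names are made up for this example.

```python
# Back-of-the-envelope throughput figures from the timings quoted above.
# Assumption: finland.osm.pbf is about 120 MB (inferred from 60 MB/minute
# over a 2 minute single run; not a measured file size).

def rate_mb_per_min(total_mb, minutes):
    """Aggregate throughput: total input size divided by wall-clock time."""
    return total_mb / minutes

def estimate_minutes(size_mb, rate):
    """Estimated wall-clock time for an input of size_mb at a given rate."""
    return size_mb / rate

FINLAND_MB = 120.0  # assumed, see above

single = rate_mb_per_min(FINLAND_MB, 2.0)        # one process, 2 minutes
parallel = rate_mb_per_min(4 * FINLAND_MB, 3.0)  # four processes, 3 minutes

print(f"single:   {single:.0f} MB/minute")
print(f"parallel: {parallel:.0f} MB/minute")
print(f"Spain (300 MB), single process: {estimate_minutes(300.0, single):.0f} minutes")
```

Note that the parallel figure is an aggregate rate over all four processes; each individual conversion still takes the full 3 minutes.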

......
>> It may be difficult to feed rendering chain by having a bunch of source
>> databases but it looks strongly that by splitting Germany into four distinct
>> OSM source files it would be possible to import the whole country in 15
>> minutes with a good laptop.

> I still maintain that splitting a file is a non-trivial task. I strongly believe
> that to do so, you must import the whole country and do spatial requests
> afterwards. So, if the data producer doesn't do it for you, there's no point in
> doing it at your end. However if you get it split, then it might indeed be
> beneficial to operate on smaller extracts. (With a risk of some duplicated
> and/or truncated and/or missing objects at the border of the tiles)

I agree. Splitting OSM data files on the client side was my ancient idea from more
than a week ago. It does not make sense nowadays. Data should come already split
from the data producer. It would need some thinking about how to split the
data so that there would be no trouble at the data set seams.  This GSoC
project seems to aim at something similar:
http://wiki.openstreetmap.org/wiki/Google_Summer_of_Code/2012/Data_Tile_Service
-Jukka-

