[gdal-dev] OSM Driver and World Planet file (pbf format)

Wed Aug 1 08:36:11 PDT 2012

>
> I made a rough parallelizing test by making 4 copies of finland.osm.pbf and
> running ogr2ogr in four separate windows.  This way the total CPU load of the
> 8 cores was staying around 50%.
> Result: All four conversions were ready after 3 minutes (45 seconds per
> conversion) while a single conversion takes 2 minutes.

In my opinion, "45 seconds per conversion" isn't really a good summary: I'd say
that your computer could handle 4 conversions in parallel in 3 minutes. But
running the conversions in parallel didn't make them *individually* faster
than running a single one (that would be nonsense). We probably agree; it's
just the way of presenting the figures that is a bit strange.
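To make the distinction concrete, here is a small sketch of the arithmetic using only the figures quoted above (2 minutes for one conversion, 3 minutes for four in parallel). The variable names are mine; the point is that "45 seconds" is an aggregate throughput figure, not the time any single conversion took.

```python
# Figures taken from the quoted test.
single_run_s = 2 * 60      # one conversion running alone
parallel_jobs = 4          # four copies converted at the same time
parallel_wall_s = 3 * 60   # all four were ready after 3 minutes

# Running the four conversions one after the other.
serial_total_s = parallel_jobs * single_run_s            # 480 s = 8 minutes

# Aggregate throughput: wall time divided by the number of jobs.
throughput_s_per_job = parallel_wall_s / parallel_jobs   # 45.0 s

# Each individual job still needed up to the full 3 minutes,
# i.e. longer than the 2 minutes of a lone run.
worst_case_latency_s = parallel_wall_s                   # 180 s

# Overall gain of the parallel batch over the serial batch.
speedup = serial_total_s / parallel_wall_s               # about 2.67

print(serial_total_s, throughput_s_per_job, worst_case_latency_s, round(speedup, 2))
```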

> Conclusion: 4 parallel conversions in 3 minutes vs. within 8 minutes if
> performed as serial runs is much faster. 50% CPU load may tell that the speed
> of SATA disk is the limiting factor now.  Test with SSD drive should give
> more information about this.

Yes, at some point the disk is the limiting factor, whatever the number of CPUs
you have.
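One way to see where the disk takes over is to time batches of 1, 2, 4, ... concurrent conversions and watch how the wall time grows. The following is only a sketch: the SQLite output format and the output file naming are assumptions of mine (the test above does not say which output was used), and the `runner` parameter is a hypothetical hook so the timing logic can be tried without GDAL installed.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_commands(sources, out_format="SQLite", out_ext=".sqlite"):
    """Return one ogr2ogr argument list per input file.

    The output format and the "<input><ext>" naming are assumptions.
    """
    return [["ogr2ogr", "-f", out_format, src + out_ext, src] for src in sources]


def run_parallel(commands, runner=subprocess.run):
    """Start all commands at once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        # Threads are enough here: each one only waits for an external process.
        list(pool.map(runner, commands))
    return time.perf_counter() - start
```

Calling `run_parallel(build_commands(["copy1.osm.pbf", "copy2.osm.pbf"]))` with your own file names would time two simultaneous conversions. If the wall time keeps climbing as jobs are added while the CPU load stays well below 100%, the disk is the likely bottleneck.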

> Somehow it feels like the laptop has only 4 real processors/cores
> even the resource manager is showing eight.

I haven't followed the current CPU state of the art, but perhaps it is
a quad-core with hyper-threading? The hyper-threaded virtual cores wouldn't be
as efficient as physical cores.
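If you want to check this rather than guess, a short script can compare logical and physical core counts. This is a sketch that assumes the third-party psutil package may or may not be installed; the standard library alone only reports logical processors.

```python
import os

# Logical processors, i.e. what a task/resource manager typically shows.
logical = os.cpu_count()

try:
    import psutil  # third-party and optional; not part of the standard library
    # Physical cores only; psutil may return None if it cannot determine them.
    physical = psutil.cpu_count(logical=False)
except ImportError:
    physical = None

print("logical processors:", logical)
print("physical cores:", physical if physical is not None else "unknown")
```

A machine reporting 8 logical processors and 4 physical cores would match the quad-core with hyper-threading guess.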

>
> I believe that by parallelizing the conversion program it is hard to take the
> juice as effectively from all the cores.

Yes, if you parallelize I/O operations, there's a risk that it actually makes
things slower. Only the CPU-intensive operations should be parallelized to
limit that risk. But when reading OSM data, there isn't that much computation
involved. Way resolving is rather dumb work and mostly about I/O after all. Only
the resolving of multipolygons might involve CPU-intensive operations to compute
the spatial relations between rings, but that's a tiny fraction of the data in an
OSM file, and even if it is slow, it is perhaps 10 or 20% of the total
conversion time.
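To put a number on that: if only the multipolygon step can be spread over several cores, the possible gain is bounded by its share of the total time. This is essentially Amdahl's law, which I am bringing in here for illustration. The 10% and 20% fractions are the rough guesses from above, not measurements, and the 4 workers are an assumed value.

```python
def max_speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only parallel_fraction of the
    runtime is spread over the given number of workers."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


for fraction in (0.10, 0.20):
    with_four = max_speedup(fraction, 4)
    upper_bound = 1.0 / (1.0 - fraction)  # limit with unlimited workers
    print(f"{fraction:.0%} parallel: x{with_four:.2f} with 4 workers, "
          f"x{upper_bound:.2f} at best")
# 10% parallel: x1.08 with 4 workers, x1.11 at best
# 20% parallel: x1.18 with 4 workers, x1.25 at best
```

So even a perfectly parallel multipolygon step would cut the conversion time by 20% at most under these assumptions.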

>
> It may be difficult to feed rendering chain by having a bunch of source
> databases but it looks strongly that by splitting Germany into four distinct
> OSM source files it would be possible to import the whole country in 15
> minutes with a good laptop.

I still maintain that splitting a file is a non-trivial task. I strongly believe
that to do so, you must import the whole country and run spatial queries
afterwards. So, if the data producer doesn't do it for you, there's no point in
doing it at your end. However, if you get it already split, then it might indeed be
beneficial to operate on smaller extracts (with a risk of some duplicated
and/or truncated and/or missing objects at the borders of the tiles).
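The border problem can be illustrated with plain bounding boxes. The sketch below cuts an extent into four tiles and lists which tiles a feature's bounding box touches; all coordinates and function names are hypothetical and have nothing to do with how any particular extract producer works.

```python
def split_bbox(xmin, ymin, xmax, ymax):
    """Cut an extent into four equal tiles (SW, SE, NW, NE)."""
    xmid = (xmin + xmax) / 2
    ymid = (ymin + ymax) / 2
    return [
        (xmin, ymin, xmid, ymid),
        (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax),
        (xmid, ymid, xmax, ymax),
    ]


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def tiles_for(feature_bbox, tiles):
    """Indices of every tile the feature's bounding box falls into."""
    return [i for i, tile in enumerate(tiles) if intersects(feature_bbox, tile)]


tiles = split_bbox(0, 0, 10, 10)
print(tiles_for((1, 1, 2, 2), tiles))  # [0]: entirely inside one tile
print(tiles_for((4, 1, 6, 2), tiles))  # [0, 1]: straddles the border at x = 5
```

The second feature has to be either duplicated in both tiles, clipped at the border, or dropped from one of them, which is exactly the kind of artifact to expect when working from pre-split extracts.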