[gdal-dev] Parallelization slows down single gdal_calc process in python
Kor de Jong
kor at jemig.eu
Thu Mar 3 02:34:02 PST 2016
Dear Lorenzo,
On 03/03/2016 12:44 AM, Lorenzo Bottaccioli wrote:
> If I run the code without parallelization it takes around 650 s to
> complete the calculation, and each iteration of the for loop executes
> in ~10 s. If I run with parallelization it takes ~900 s to complete
> the process, and each iteration of the for loop takes ~30 s.
>
> How is that? How can I fix this?
If I am not mistaken, you are splitting your workload into 8 concurrent
processes. Does your machine have at least 8 cores, and can you observe
that the processes do indeed run in parallel? If so, the overhead may
come from the processes contending for access to the input and output
files. Although your computations may run in parallel, the I/O still
happens sequentially, and less efficiently than during your
non-concurrent run, because multiple processes now frequently compete
for access to the same files at the same time, blocking each other.
You should probably instrument your code to figure out where the time
is spent.
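As a rough sketch of what I mean by instrumenting: time the read,
compute, and write phases of each worker separately, so you can see
which phase balloons when the workers run concurrently. Everything
below is hypothetical scaffolding (timed_worker and the sleep calls
stand in for your actual gdal_calc reads, band arithmetic, and writes):

```python
import time
from multiprocessing import Pool

def timed_worker(band):
    """Stand-in for one gdal_calc step; replace the sleeps with your
    real read / compute / write calls to see where the time goes."""
    timings = {}

    t0 = time.perf_counter()
    time.sleep(0.01)  # placeholder: open and read the input raster here
    timings["read"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    time.sleep(0.01)  # placeholder: do the actual band arithmetic here
    timings["compute"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    time.sleep(0.01)  # placeholder: write the output raster here
    timings["write"] = time.perf_counter() - t0

    return band, timings

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        for band, timings in pool.map(timed_worker, range(8)):
            phases = ", ".join(f"{k}={v:.2f}s" for k, v in timings.items())
            print(f"band {band}: {phases} (total {sum(timings.values()):.2f}s)")
```

If the compute phase stays at ~10 s per worker but the read/write
phases grow, the disk is the bottleneck rather than the CPU.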
If I/O is indeed the bottleneck, then you might get better results by
distributing the files over multiple disks (on multiple controllers).
This removes some synchronization points, allowing multiple processes
to continue running in parallel even while doing I/O. As a general
rule, you don't want large-scale I/O to happen concurrently on
non-parallel hardware.
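One simple way to do that distribution, assuming you have several disks
mounted separately, is to spread the output files round-robin over the
mount points so the concurrent writers do not all queue on one spindle.
The mount paths below are purely hypothetical; substitute whatever is
actually available on your machine:

```python
import os

# Hypothetical mount points, each on a separate disk/controller.
MOUNT_POINTS = ["/mnt/disk0", "/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]

def output_path_for(band, filename):
    """Assign each band's output file to a disk in round-robin order."""
    mount = MOUNT_POINTS[band % len(MOUNT_POINTS)]
    return os.path.join(mount, filename)
```

Each worker would then write to output_path_for(band, ...) instead of a
shared directory, so at most len(MOUNT_POINTS) writers share one disk.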
Best regards,
Kor