[gdal-dev] Parallelization slows down single gdal_calc process in python

Kor de Jong kor at jemig.eu
Thu Mar 3 02:34:02 PST 2016


Dear Lorenzo,

On 03/03/2016 12:44 AM, Lorenzo Bottaccioli wrote:
> If I run the code without parallelization it takes around 650s to
> complete the calculation. Each iteration of the for loop is executed in
> ~10s. If I run with parallelization it takes ~900s to complete the
> process, and each iteration of the for loop takes ~30s.
>
> How is that? How can I fix this?

If I am not mistaken, you are splitting your work into 8 concurrent 
processes. Does your machine have at least 8 cores, and can you 
observe that the processes do indeed run in parallel? If so, the overhead 
may come from the processes trying to access the input and output 
files at the same time. Although your computations may run in parallel, 
the I/O still happens sequentially, and less efficiently than during your 
non-concurrent run, because multiple processes are now frequently 
fighting for access to the same files, blocking each other.
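
For context, I am guessing your parallel run looks roughly like the sketch 
below (the file names, calc expression, and pool size are placeholders, 
since I have not seen your code). Note that all eight workers read from 
and write to the same disk, which is where the contention would come from:

    import multiprocessing
    import subprocess

    def run_calc(paths):
        # Each worker shells out to gdal_calc.py; all workers share one
        # disk, so their reads and writes compete with each other.
        in_path, out_path = paths
        subprocess.run(
            ["gdal_calc.py", "-A", in_path,
             "--outfile", out_path,
             "--calc", "A*0.5"],   # placeholder expression
            check=True)

    if __name__ == "__main__":
        jobs = [("in_%d.tif" % i, "out_%d.tif" % i) for i in range(8)]
        with multiprocessing.Pool(processes=8) as pool:
            pool.map(run_calc, jobs)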

You should probably instrument your code to figure out where the time is 
spent.
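
A minimal way to do that is to time the read, compute, and write phases 
separately inside each worker; if the read and write times grow once you 
add processes, you have found your bottleneck. The snippet below uses the 
GDAL Python bindings directly, with placeholder file names and a dummy 
calculation standing in for whatever gdal_calc does for you:

    import time
    from osgeo import gdal

    def timed_worker(in_path, out_path):
        # Read phase.
        t0 = time.time()
        src = gdal.Open(in_path)
        data = src.GetRasterBand(1).ReadAsArray()
        t1 = time.time()

        # Compute phase (placeholder calculation).
        result = data * 0.5
        t2 = time.time()

        # Write phase.
        driver = gdal.GetDriverByName("GTiff")
        dst = driver.Create(out_path, src.RasterXSize, src.RasterYSize,
                            1, gdal.GDT_Float32)
        dst.SetGeoTransform(src.GetGeoTransform())
        dst.SetProjection(src.GetProjection())
        dst.GetRasterBand(1).WriteArray(result)
        dst.FlushCache()
        t3 = time.time()

        print("read %.1fs  compute %.1fs  write %.1fs"
              % (t1 - t0, t2 - t1, t3 - t2))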

If I/O is indeed the bottleneck, you might get better results 
by distributing the processes over multiple disks (multiple 
controllers). This removes some synchronization points, allowing 
multiple processes to keep running in parallel even while doing I/O.
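
A simple way to do that, assuming two separate disks mounted at, say, 
/data1 and /data2 (hypothetical paths), is to spread the output files over 
the mount points in round-robin fashion so two writers rarely hit the same 
controller:

    import os

    # One entry per physical disk/controller; hypothetical mount points.
    disks = ["/data1", "/data2"]

    def output_path(job_index, filename):
        # Round-robin the outputs over the disks so concurrent writers
        # are less likely to block each other.
        return os.path.join(disks[job_index % len(disks)], filename)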

As a general rule, you don't want large-scale I/O to happen concurrently 
on non-parallel hardware.

Best regards,
Kor


