[gdal-dev] Experiments with multiprocessing

Ari Jolma ari.jolma at gmail.com
Wed Mar 8 05:37:57 PST 2017


I've tried multiprocessing a bit; here's my log on that.

My test case was computing the min and max of a 35989 x 61978 integer 
raster (Finland in 20 m x 20 m cells). The data is a LZW compressed 
GTiff with 128 x 128 blocks. The file size is ~200 MB.

I used block-based access with Perl and PDL (Perl Data Language). Each 
block is read into a PDL object, and the min and max of the block are 
then computed by PDL.
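
For illustration only, here is a minimal sketch of the same block-wise
scan in Python instead of Perl/PDL. The raster is a plain list of rows
standing in for real block reads, and the function names are made up
for this example; nothing here is the code that was actually run. The
block count helper shows how many 128 x 128 blocks a raster of this
size has, assuming edge blocks are simply clipped.

```python
import math


def block_windows(width, height, bw=128, bh=128):
    """Yield (xoff, yoff, xsize, ysize) per block, clipping edge blocks."""
    for yoff in range(0, height, bh):
        ysize = min(bh, height - yoff)
        for xoff in range(0, width, bw):
            xsize = min(bw, width - xoff)
            yield xoff, yoff, xsize, ysize


def block_count(width, height, bw=128, bh=128):
    """Number of blocks needed to cover a width x height raster."""
    return math.ceil(width / bw) * math.ceil(height / bh)


def raster_min_max(raster, bw=128, bh=128):
    """Min and max of a raster given as a list of rows, one block at a time.

    Slicing the in-memory list is a stand-in for reading a block
    from the file.
    """
    height, width = len(raster), len(raster[0])
    lo = hi = None
    for xoff, yoff, xsize, ysize in block_windows(width, height, bw, bh):
        cells = [v
                 for row in raster[yoff:yoff + ysize]
                 for v in row[xoff:xoff + xsize]]
        bmin, bmax = min(cells), max(cells)
        lo = bmin if lo is None else min(lo, bmin)
        hi = bmax if hi is None else max(hi, bmax)
    return lo, hi


# Block count for the raster size quoted above.
print(block_count(35989, 61978))
```

With real data the slicing would be replaced by a block read from the
dataset; the reduction over per-block results stays the same.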

I used MCE first. MCE is the "Many-Core Engine for Perl" (a module 
available on CPAN). It can use threads, but since my Perl is not compiled 
with thread support (the usual case), it spawns child processes as workers.

The first experiment went fine: the computing time went from 214 secs 
with one worker to 125 secs with 5 workers (I have 4 CPUs). However, 
each worker processed one block at a time (opening the file anew each 
time), which I thought was not optimal because of the overhead of 
spawning and opening. I then changed the setup so that the blocks were 
arranged into as many batches as I had workers, so that each worker 
would work only once. I could not get that setup to work: I got 
low-level errors from PDL.
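
The batching step can be sketched roughly as below, again in Python
and with a made-up function name. The post does not say how blocks
were assigned to batches; contiguous, near-equal chunks are an
assumption made here for the example.

```python
def make_batches(items, n_workers):
    """Split items into n_workers contiguous batches.

    Batch sizes differ by at most one; the first (len % n_workers)
    batches get the extra item.
    """
    base, extra = divmod(len(items), n_workers)
    batches, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


# Ten block indices shared among four workers.
print(make_batches(list(range(10)), 4))
```

Contiguous batches keep each worker on neighbouring blocks of the
file; a round-robin split (items[i::n_workers]) would be an equally
simple alternative.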

The second experiment was to take the second setup from the first 
experiment (each worker works only once, with a batch of blocks assigned 
to it) and use the plain fork() from the Perl core. Passing input to the 
spawned children is easy (they inherit the parent's data), but for 
output I used files. This time there were no errors from PDL or 
elsewhere and everything worked fine. The computing time went from 
62 secs with one worker to 36 secs with 4 workers.
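
A rough Python equivalent of this fork-per-batch pattern might look
like the sketch below. It is Unix-only, the names are invented for the
example, and blocks are plain lists of integers rather than data read
from a file; in a real run each child would presumably open the
dataset itself. Results come back through one small JSON file per
child, which is one possible way of doing "output via files".

```python
import json
import os
import tempfile


def forked_min_max(batches):
    """Fork one child per batch; each child writes [min, max] to a file.

    batches: non-empty lists of blocks, each block a non-empty list
    of ints (an assumption of this sketch).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        pids = []
        for i, batch in enumerate(batches):
            pid = os.fork()
            if pid == 0:
                # Child: has inherited `batch` from the parent.
                status = 1
                try:
                    lo = min(min(block) for block in batch)
                    hi = max(max(block) for block in batch)
                    path = os.path.join(tmpdir, f"{i}.json")
                    with open(path, "w") as fh:
                        json.dump([lo, hi], fh)
                    status = 0
                finally:
                    # Exit without running the parent's cleanup code.
                    os._exit(status)
            pids.append(pid)

        # Parent: wait for all children, then merge their results.
        for pid in pids:
            _, wait_status = os.waitpid(pid, 0)
            if wait_status != 0:
                raise RuntimeError(f"worker {pid} failed")

        results = []
        for i in range(len(batches)):
            with open(os.path.join(tmpdir, f"{i}.json")) as fh:
                results.append(json.load(fh))
    return min(r[0] for r in results), max(r[1] for r in results)


if __name__ == "__main__":
    demo = [[[5, 3, 9], [1, 7]], [[4, 4]], [[-2, 10]], [[0]]]
    print(forked_min_max(demo))
```

The child leaves through os._exit() so that it never runs the
parent's temporary-directory cleanup or flushes inherited buffers a
second time.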

It seems that using plain fork is quite easy and useful. I'd expect that 
similar results can be obtained with Python and its equivalent of Perl's 
fork(). I'm using Linux. Windows is a somewhat different story, since at 
least for Perl the fork() on Windows is an emulated version of the Unix 
fork, and that may cause issues.

The MCE module seems to be highly praised, but it did not work well for me.

Ari



