[gdal-dev] Python programs on multi-core machines

Jan Hartmann j.l.h.hartmann at uva.nl
Mon Aug 13 04:03:12 PDT 2012


I'm working on multi-core VMs in a cloud environment that access their 
data on a central data server via NFS. Parallelizing jobs for different 
map sheets gives huge speedups for C programs like gdaladdo, but 
there seems to be a problem with Python-based programs like rgb2pct.py. 
Consider the following:

(
     rgb2pct.py file1.tif file1_256.tif
     gdaladdo file1_256.tif 2 4 8 16
)&
(
     rgb2pct.py file2.tif file2_256.tif
     gdaladdo file2_256.tif 2 4 8 16
)&
.. etc, for all available cores
wait

When running this on a 16-core VM I first see 16 Python processes, each 
with a CPU load of around 20% per processor, and then 16 gdaladdo 
processes with CPU loads around 95%. When I replace the TIFF input files 
for rgb2pct.py with equivalent JPEG files, the loads for the 16 rgb2pct.py 
processes increase to about 80% and the overall computing time more than 
halves.

So my impression is that one Python I/O process blocks all the others. I 
have read about Python's GIL (Global Interpreter Lock, 
http://docs.python.org/faq/library#can-t-we-get-rid-of-the-global-interpreter-lock) 
and the multiprocessing module, but I don't see an easy way to 
apply it to my setup. Does anyone have a simple solution to 
this problem?
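For what it's worth, here is a minimal sketch of how the shell loop above could be driven from a single Python script with the multiprocessing module, one worker process per core. The file names are hypothetical, and the worker only builds the two command lines rather than executing them, so the sketch runs without GDAL installed; in real use each command would be handed to subprocess.check_call.

```python
import multiprocessing


def commands_for(sheet):
    """Build the two commands run per map sheet, as in the shell snippet."""
    out = sheet.replace(".tif", "_256.tif")
    return [["rgb2pct.py", sheet, out],
            ["gdaladdo", out, "2", "4", "8", "16"]]


def process_sheet(sheet):
    # Real use: execute each command with subprocess.check_call(cmd).
    # Here we just return the commands so the sketch stays runnable.
    return commands_for(sheet)


if __name__ == "__main__":
    sheets = ["file1.tif", "file2.tif"]  # one entry per map sheet
    # Pool() defaults to one worker process per available core; each worker
    # is a separate interpreter with its own GIL.
    with multiprocessing.Pool() as pool:
        results = pool.map(process_sheet, sheets)
    print(results)
```

Since every worker is a separate OS process, the GIL should not serialize work across sheets any more than the backgrounded shell subshells do, which makes me suspect the bottleneck is elsewhere (e.g. NFS I/O).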

Jan
