[gdal-dev] Python programs on multi-core machines
Jan Hartmann
j.l.h.hartmann at uva.nl
Mon Aug 13 04:03:12 PDT 2012
I'm working on multi-core VMs in a Cloud environment that access their
data on a central dataserver via NFS. Parallelizing jobs for different
map sheets gives huge accelerations for C programs like gdaladdo, but
there seems to be a problem with Python-based programs like rgb2pct.py.
Consider the following:
(
rgb2pct.py file1.tif file1_256.tif
gdaladdo file1_256.tif 2 4 8 16
)&
(
rgb2pct.py file2.tif file2_256.tif
gdaladdo file2_256.tif 2 4 8 16
)&
.. etc, for all available cores
wait
When running this on a 16-core VM, I first see 16 Python processes, each
with a CPU load of around 20% per processor, and then 16 gdaladdo
processes with CPU loads around 95%. When I replace the TIFF input files
for rgb2pct.py with equivalent JPEG files, the loads for the 16 rgb2pct.py
processes increase to about 80% and the overall computing time more than
halves.
So my impression is that one Python I/O process blocks all the others. I
have read a bit about Python's GIL (Global Interpreter Lock,
http://docs.python.org/faq/library#can-t-we-get-rid-of-the-global-interpreter-lock)
and the multiprocessing module, but I don't see an easy way to apply
this to my setup. Does anyone have a simple solution for this problem?
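For what it's worth, one direction I've been considering: since each
worker could run rgb2pct.py and gdaladdo as its own child process, the
GIL of any single interpreter should never serialize work across sheets.
A minimal sketch, assuming the file-naming pattern from the shell example
above (file1.tif, file2.tif, ... are placeholders for the real map sheets):

```python
# Sketch: fan per-sheet work out to separate OS processes with
# multiprocessing.Pool. Each worker launches rgb2pct.py and gdaladdo as
# child processes, so no single Python interpreter's GIL is shared
# between sheets. File names follow the shell example and are assumed.
import subprocess
from multiprocessing import Pool

def commands_for(sheet):
    """Build the two-step pipeline (color quantize, then overviews)
    for one map sheet, e.g. 'file1.tif'."""
    pct = sheet.replace(".tif", "_256.tif")
    return [
        ["rgb2pct.py", sheet, pct],
        ["gdaladdo", pct, "2", "4", "8", "16"],
    ]

def process_sheet(sheet):
    # Run both steps sequentially for this sheet; each step is its
    # own child process, launched from this worker process.
    for cmd in commands_for(sheet):
        subprocess.run(cmd, check=True)
    return sheet

if __name__ == "__main__":
    sheets = ["file1.tif", "file2.tif"]  # one entry per map sheet
    with Pool() as pool:  # defaults to one worker per core
        pool.map(process_sheet, sheets)
```

This doesn't change what each rgb2pct.py process does internally, so if
the bottleneck really is NFS I/O inside GDAL's TIFF reads, the loads may
look the same; it just replaces the shell fan-out with a Python one.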
Jan