[gdal-dev] Python bindings: more multithreading friendly
Even Rouault
even.rouault at spatialys.com
Tue Sep 13 08:26:56 PDT 2016
Hi,
In the recent discussion about the VRT Python new capability, I realized that
the Python global interpreter lock wasn't released in the osgeo.gdal bindings
when going from Python to GDAL native code, thus making threaded Python code
inefficient.
I've committed a change to improve that. Basically I've enabled releasing the
GIL in all methods of the osgeo.gdal package, a few of the osgeo.ogr
package (ogr.Open() mostly) and none of osgeo.osr or osgeo.gdalconst. There
are some subtleties though, as we have a few callback mechanisms (error handler
and progress function) that can call back into Python code, so the GIL has to
be reacquired in the wrapping code.
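To illustrate the callback side, here is a minimal sketch of a progress function that may be invoked from several worker threads. It assumes the usual Python progress callback signature (completion ratio, message, user data) with a non-zero return value meaning "continue"; `make_progress_recorder` is a hypothetical helper name, not part of the bindings. The callback guards its own shared state with a lock rather than relying on call ordering.

```python
import threading

def make_progress_recorder():
    """Return (callback, state). Hypothetical helper, not a GDAL API."""
    lock = threading.Lock()
    seen = {}  # thread id -> last completion ratio reported to that thread

    def progress(complete, message, user_data):
        # Invoked from native code once the wrapper has reacquired the GIL.
        # Several threads may report progress, so protect the shared dict.
        with lock:
            seen[threading.get_ident()] = complete
        return 1  # non-zero: ask the caller to continue

    return progress, seen
```

The returned `progress` function could then be passed wherever a GDAL function accepts a progress callback.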
Enabling everything in osgeo.gdal is a bit brute force, since there are methods
that don't spend much time in native code, so releasing/reacquiring the GIL
might be overkill. I'd be interested in hearing whether people see a noticeable
performance degradation in single-threaded use cases. If so, we might
want to add a few exceptions here and there.
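For anyone wanting to check the single-threaded overhead, a rough sketch of a timing helper follows. `per_call_seconds` is a hypothetical name; it only uses the standard `timeit` module and reports the best per-call time over a few repeats.

```python
import timeit

def per_call_seconds(fn, number=100000, repeat=5):
    """Best observed time per call of fn(), in seconds (hypothetical helper)."""
    # timeit.repeat returns one total duration per repeat; the minimum is the
    # least disturbed by other activity on the machine.
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    return min(timings) / number
```

A cheap call such as `lambda: ds.GetRasterBand(1)` on an already opened dataset could be timed this way with bindings built before and after the change, to see whether the difference is measurable.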
But now you can do stuff like the following and it will nicely use your cores:
{{{
from osgeo import gdal
import threading

def worker():
    ds = gdal.Open('myraster.tif')
    ds.GetRasterBand(1).Checksum()
    ds.GetRasterBand(1).ReadAsArray()
    return

threads = []
for i in range(4):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
for i in range(4):
    threads[i].join()
}}}
One downside of this change is that if you used Python multithreading
before without respecting the constraints of the C++ GDAL API (i.e. not using
the same GDAL object from several threads), your code happened to work thanks
to the serialization caused by the GIL. Now the following will crash (as the
equivalent C/C++ code would):
{{{
from osgeo import gdal
import threading

ds = gdal.Open('byte.tif')

def worker():
    while True:
        ds.GetRasterBand(1).Checksum()
        ds.FlushCache()
    return

for i in range(4):
    t = threading.Thread(target=worker)
    t.start()
}}}
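If a dataset object really must be shared between threads, one possible workaround is to serialize access yourself with a lock. The sketch below is an assumption on my side rather than something the bindings provide: `run_serialized` and `checksum_in_threads` are hypothetical helper names. Since the lock is held across the native calls, the threads no longer run in parallel on that dataset, so opening one dataset per thread remains the better option when possible.

```python
import threading

def run_serialized(ds, lock, iterations, results):
    # Every use of the shared dataset happens with the lock held, so only one
    # thread at a time is inside GDAL for this object.
    for _ in range(iterations):
        with lock:
            results.append(ds.GetRasterBand(1).Checksum())
            ds.FlushCache()

def checksum_in_threads(ds, n_threads=4, iterations=10):
    """Run the checksum loop from several threads on one shared dataset."""
    lock = threading.Lock()
    results = []
    threads = [
        threading.Thread(target=run_serialized,
                         args=(ds, lock, iterations, results))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the bindings installed, this could be called as `checksum_in_threads(gdal.Open('byte.tif'))`.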
So I'm not so sure the pros really outweigh the cons. I'd suggest leaving
the new behaviour in trunk and seeing later, once it has been tested by
more hands, whether this was an appropriate move or not. I've opened ticket
https://trac.osgeo.org/gdal/ticket/6649 to track this change.
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com