[gdal-dev] VRT derived band pixel functions written in Python

Even Rouault even.rouault at spatialys.com
Tue Sep 13 02:56:17 PDT 2016


On Tuesday 13 September 2016 11:07:39, Rutger wrote:
> I overlooked the fact that it still moves through Python, is that the
> 'only' hurdle preventing parallel IO?

I'm not sure I understand your question. But if you have several sources, you 
could potentially read them in parallel from the Python code by using Python 
threads and the GDAL Python API. Looking at the SWIG-generated code, though, 
it doesn't seem that SWIG releases the GIL automatically before calling 
native code. Hmm... So that should probably be added manually, at least around 
GDALRasterIO() calls, otherwise you'll get zero performance improvement.

> Since gdalwarp for example has the
> -multi flag, it seems as if GDAL is capable of it, or is that a
> specific/specialized implementation?

Parallelized I/O doesn't mean much by itself without more context. You may 
want to parallelize reading of different regions of the same dataset, or 
parallelize reading of different datasets. Because GDAL objects are not 
thread-safe, the first case (reading of different regions of the same dataset) 
can be handled like the second one, by opening several datasets for the same 
filename.
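
As a rough sketch of the "several datasets for the same filename" idea, the 
snippet below gives each worker thread its own dataset handle and reads one 
window per task. The names read_windows_in_parallel, open_with_gdal and 
read_window_with_gdal are made up for illustration; gdal.Open() and 
Dataset.ReadAsArray() are the only GDAL calls used. Given the GIL remark 
above, the speed-up with the stock bindings may well be nil.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def open_with_gdal(filename):
    # Imported lazily so the helper below can be tried without GDAL installed.
    from osgeo import gdal
    return gdal.Open(filename)


def read_window_with_gdal(dataset, window):
    xoff, yoff, xsize, ysize = window
    return dataset.ReadAsArray(xoff, yoff, xsize, ysize)


def read_windows_in_parallel(filename, windows,
                             open_dataset=open_with_gdal,
                             read_window=read_window_with_gdal,
                             max_workers=4):
    """Read several windows of one file, one dataset handle per thread."""
    local = threading.local()

    def task(window):
        # GDAL objects are not thread-safe, so a handle is never shared:
        # each worker thread opens the file once and keeps its own handle.
        if not hasattr(local, "dataset"):
            local.dataset = open_dataset(filename)
        return read_window(local.dataset, window)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns results in the order of the input windows.
        return list(pool.map(task, windows))
```

A call would look like read_windows_in_parallel("in.tif", [(0, 0, 256, 256), 
(256, 0, 256, 256)]), where "in.tif" is a placeholder filename.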

Regarding gdalwarp -multi, here's how that works. When you warp a dataset, 
a list of all chunks (windows) to be processed is generated. 
gdalwarp -multi then does the following:

I/O thread                      Computation thread
Read data for chunk 1
Read data for chunk 2           Do calculations for chunk 1
Write output of chunk 1         Do calculations for chunk 2
Read data for chunk 3
Write output of chunk 2         Do calculations for chunk 3
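
To make the overlap in that schedule concrete, here is a small Python sketch 
of the same pattern: one thread does all reads and writes while a second one 
does the calculations. This is only an illustration, not GDAL's actual 
implementation; run_pipeline and its read/compute/write callbacks are 
hypothetical names, and there is no error handling.

```python
import queue
import threading


def run_pipeline(chunks, read, compute, write):
    """Overlap I/O with computation: one I/O thread, one computation thread."""
    to_compute = queue.Queue(maxsize=1)  # bounds how far reading runs ahead
    to_write = queue.Queue()             # finished results awaiting output
    done = object()                      # sentinel marking the end of work

    def computation_thread():
        while True:
            item = to_compute.get()
            if item is done:
                to_write.put(done)
                return
            chunk, data = item
            to_write.put((chunk, compute(data)))

    worker = threading.Thread(target=computation_thread)
    worker.start()

    def write_finished(block):
        # Write queued results; stop on an empty queue or on the sentinel.
        while True:
            try:
                item = to_write.get(block=block)
            except queue.Empty:
                return
            if item is done:
                return
            write(*item)

    # The calling thread plays the role of the I/O thread.
    for chunk in chunks:
        to_compute.put((chunk, read(chunk)))  # read the next chunk
        write_finished(block=False)           # write whatever is ready
    to_compute.put(done)
    write_finished(block=True)                # drain the remaining results
    worker.join()
```

Since there is a single computation thread and the queues are FIFO, results 
are written in the same order as the chunks were read.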


> 
> Numba has several options which might eliminate using Python during
> execution. There are c-callbacks:
> http://numba.pydata.org/numba-doc/dev/user/cfunc.html

You can also use @jit(nopython=True, nogil=True) and your Python method will 
end up being pure native code (provided that you don't use features that are 
too high-level, otherwise the JIT compilation will fail with an exception).

And for code that is not inlined in the VRT, you can also add cache=True so 
that the compiled code can be reused.
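
A minimal sketch of those decorator options, with a made-up scalar kernel 
(normalized_difference is hypothetical; a real pixel function would loop over 
the input and output arrays instead). The try/except is only there so the 
snippet still runs, as ordinary Python, where Numba is not installed.

```python
try:
    from numba import jit
except ImportError:
    # Fallback for this sketch only: a do-nothing decorator, so the function
    # below stays plain interpreted Python when Numba is missing.
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


# nopython=True: compilation raises an error rather than using the slower
#                object mode.
# nogil=True:    the compiled code releases the GIL while it runs.
# cache=True is left out here because caching needs the function to live in
# a source file on disk; add it for code kept in a module.
@jit(nopython=True, nogil=True)
def normalized_difference(a, b):
    """Hypothetical per-pixel kernel: (a - b) / (a + b), 0.0 if a + b == 0."""
    total = a + b
    if total == 0.0:
        return 0.0
    return (a - b) / total
```

The first call with a given argument type triggers the compilation; later 
calls with the same types reuse the native code.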

With all that, the cost of the Python layer becomes negligible (except loading 
the Python environment the first time, if not already loaded, but for a 
computation that will be longer than a few seconds, that's not really a big 
deal).

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

