[gdal-dev] VRT derived band pixel functions written in Python
Even Rouault
even.rouault at spatialys.com
Tue Sep 13 02:56:17 PDT 2016
On Tuesday 13 September 2016 11:07:39, Rutger wrote:
> I overlooked the fact that it still moves through Python, is that the
> 'only' hurdle preventing parallel IO?
I'm not sure I understand your question. But if you have several sources, you
could potentially parallelize their reading from the Python code by using
Python threads and the GDAL Python API. However, looking at the SWIG-generated
code, it doesn't seem that SWIG releases the GIL automatically before calling
native code. Hum... So that should probably be added manually, at least around
the GDALRasterIO() calls; otherwise you'll get zero performance improvement.
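A minimal sketch of the several-sources case, assuming the GDAL Python bindings (`osgeo.gdal`). To keep it runnable without GDAL, the opener is passed in as a parameter; in real use it would be `gdal.Open`. The name `read_sources_in_parallel` is hypothetical. As said above, the threads only buy anything if the bindings release the GIL around the RasterIO calls.

```python
from concurrent.futures import ThreadPoolExecutor


def read_sources_in_parallel(filenames, open_dataset, max_workers=4):
    """Read band 1 of each file in its own worker thread.

    open_dataset is assumed to behave like osgeo.gdal.Open: it takes a
    filename and returns a dataset whose GetRasterBand(1).ReadAsArray()
    returns the pixel values.
    """

    def read_one(filename):
        # Each call opens a private handle: GDAL datasets must not be
        # shared between threads.
        ds = open_dataset(filename)
        data = ds.GetRasterBand(1).ReadAsArray()
        ds = None  # with the GDAL bindings, dropping the last reference closes the file
        return data

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns results in the same order as filenames.
        return list(pool.map(read_one, filenames))
```

Typical use would be `arrays = read_sources_in_parallel(["a.tif", "b.tif"], gdal.Open)`.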
> Since gdalwarp for example has the
> -multi flag, it seems as if GDAL is capable of it, or is that a
> specific/specialized implementation?
Parallelized I/O doesn't mean much by itself without more context. You may
want to parallelize reading of different regions of the same dataset, or
reading of different datasets. Because GDAL objects are not thread-safe, the
first case (reading different regions of the same dataset) can be reduced to
the second one by opening several datasets on the same filename, one per
thread.
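A sketch of that reduction, again assuming an opener that behaves like `osgeo.gdal.Open` and the band method `ReadAsArray(xoff, yoff, win_xsize, win_ysize)`. The helper names `row_strips` and `read_regions_in_parallel` are hypothetical. Each strip gets its own dataset object, so no handle is ever shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def row_strips(height, n):
    """Split rows [0, height) into at most n (yoff, ysize) strips."""
    if height <= 0 or n <= 0:
        return []
    step = -(-height // n)  # ceiling division
    return [(y, min(step, height - y)) for y in range(0, height, step)]


def read_regions_in_parallel(filename, open_dataset, n_threads=4):
    """Read band 1 of one file as horizontal strips, one dataset handle per strip.

    open_dataset is assumed to behave like osgeo.gdal.Open.
    """
    probe = open_dataset(filename)
    width, height = probe.RasterXSize, probe.RasterYSize
    probe = None  # close the probe handle

    def read_strip(window):
        yoff, ysize = window
        # Re-opening the same filename gives this task its own dataset object.
        ds = open_dataset(filename)
        return ds.GetRasterBand(1).ReadAsArray(0, yoff, width, ysize)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(read_strip, row_strips(height, n_threads)))
```

With NumPy, the strips returned in order can be stitched back together with `numpy.vstack`.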
Regarding gdalwarp -multi, here's how it works. When you warp a dataset, a
list of all the chunks (windows) to be processed is generated.
gdalwarp -multi then does the following:
I/O thread                     Computation thread
Read data for chunk 1
Read data for chunk 2          Do calculations for chunk 1
Write output of chunk 1        Do calculations for chunk 2
Read data for chunk 3
Write output of chunk 2        Do calculations for chunk 3
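The same two-thread schedule can be sketched in plain Python. This is not gdalwarp's code, only an illustration of the overlap shown in the table: the calling thread does all reads and writes, a worker does the calculations, and `read`, `compute` and `write` are hypothetical caller-supplied callables.

```python
import queue
import threading


def run_pipelined(n_chunks, read, compute, write):
    """Overlap I/O and computation with two threads.

    The calling thread plays the "I/O thread" (read + write) and a worker
    plays the "computation thread".
    """
    to_compute = queue.Queue(maxsize=1)  # bounded hand-offs keep memory use small
    to_write = queue.Queue(maxsize=1)

    def computation_thread():
        for _ in range(n_chunks):
            index, data = to_compute.get()
            to_write.put((index, compute(data)))

    worker = threading.Thread(target=computation_thread)
    worker.start()
    for i in range(n_chunks):
        # Reading chunk i overlaps with the worker computing chunk i - 1.
        to_compute.put((i, read(i)))
        if i > 0:
            index, result = to_write.get()  # output of chunk i - 1
            write(index, result)
    if n_chunks > 0:
        index, result = to_write.get()  # output of the last chunk
        write(index, result)
    worker.join()
```

The sketch has no error handling: if `compute` raises, the worker exits and the I/O loop blocks forever on `to_write.get()`.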
>
> Numba has several options which might eliminate using Python during
> execution. There are c-callbacks:
> http://numba.pydata.org/numba-doc/dev/user/cfunc.html
You can also use @jit(nopython=True, nogil=True), and your Python method will
end up being pure native code (provided that you don't use overly high-level
Python constructs; otherwise the JIT compilation will fail with an exception).
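A sketch of a Python pixel function whose inner loop is compiled by Numba, assuming the argument list of GDAL's Python pixel functions (`in_ar`, `out_ar`, `xoff`, `yoff`, `xsize`, `ysize`, `raster_xsize`, `raster_ysize`, `buf_radius`, `gt`, `**kwargs`). The function names and the normalized-difference formula are only illustrations. The list of source arrays is unpacked in a thin Python wrapper, and only plain arrays reach the `nopython` kernel. If Numba is not installed, the code falls back to a no-op decorator (correct, but slow).

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Fallback so the sketch still runs without Numba: a no-op decorator.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True, nogil=True)
def _normalized_difference_kernel(a, b, out):
    # Explicit loops over 2-D arrays: low-level code that nopython mode can compile.
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            total = a[i, j] + b[i, j]
            if total != 0.0:
                out[i, j] = (b[i, j] - a[i, j]) / total
            else:
                out[i, j] = 0.0


def normalized_difference(in_ar, out_ar, xoff, yoff, xsize, ysize,
                          raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # Thin Python wrapper with the pixel-function signature: it converts the
    # sources to float64 (avoiding integer overflow) and calls the compiled kernel.
    a = in_ar[0].astype(np.float64)
    b = in_ar[1].astype(np.float64)
    _normalized_difference_kernel(a, b, out_ar)
```

`nogil=True` lets the compiled kernel run without holding the GIL, which only matters if it gets called from several threads.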
And for code that is not inlined in the VRT, you can also add cache=True so
that the result of the JIT compilation is cached on disk and reused by later
runs.
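To make the inlined vs. non-inlined distinction concrete, here is a hypothetical helper that builds the XML of a derived band, assuming the element names of GDAL's VRT format for Python pixel functions (`VRTDerivedRasterBand`, `PixelFunctionType`, `PixelFunctionLanguage`, `PixelFunctionCode`). With `inline_code=None`, `PixelFunctionType` names a function in an importable module (here the hypothetical `my_pixel_funcs`). That module is where a kernel could be decorated with `@jit(nopython=True, nogil=True, cache=True)`. The output is only a band fragment; a complete VRT also needs the enclosing `VRTDataset` element.

```python
import xml.etree.ElementTree as ET


def derived_band_xml(source_files, pixel_function, inline_code=None,
                     data_type="Float32"):
    """Return the XML of a Python-backed VRTDerivedRasterBand (fragment only).

    pixel_function is "module.function" for code living in a module, or just
    the function name when inline_code is embedded in the VRT.
    """
    band = ET.Element("VRTRasterBand", dataType=data_type, band="1",
                      subClass="VRTDerivedRasterBand")
    ET.SubElement(band, "PixelFunctionType").text = pixel_function
    ET.SubElement(band, "PixelFunctionLanguage").text = "Python"
    if inline_code is not None:
        # Inlined code has no module file on disk, which is why cache=True
        # is only suggested for non-inlined code.
        ET.SubElement(band, "PixelFunctionCode").text = inline_code
    for path in source_files:
        source = ET.SubElement(band, "SimpleSource")
        ET.SubElement(source, "SourceFilename", relativeToVRT="0").text = path
        ET.SubElement(source, "SourceBand").text = "1"
    return ET.tostring(band, encoding="unicode")
```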
With all that, the cost of the Python layer becomes negligible (except for
loading the Python environment the first time, if it is not already loaded;
but for a computation that takes longer than a few seconds, that's not really
a big deal).
--
Spatialys - Geospatial professional services
http://www.spatialys.com