[gdal-dev] VRT derived band pixel functions written in Python

Even Rouault even.rouault at spatialys.com
Tue Sep 13 12:32:59 PDT 2016


On Tuesday 13 September 2016 21:22:20, James Ramm wrote:
> I think you can call SWIG with the -threads argument on the command line so
> it will always release the GIL. Could be an easy option if it works

That's mostly what I've done. See my other message:
https://lists.osgeo.org/pipermail/gdal-dev/2016-September/045155.html

> 
> On Tuesday, 13 September 2016, Even Rouault <even.rouault at spatialys.com>
> wrote:
> > On Tuesday 13 September 2016 11:07:39, Rutger wrote:
> > > I overlooked the fact that it still moves through Python, is that the
> > > 'only' hurdle preventing parallel IO?
> > 
> > I'm not sure I understand your question. But if you have several
> > sources, you could potentially parallelize reading them from the Python
> > code by using Python threads and the GDAL Python API. Looking at the
> > SWIG-generated code, though, it doesn't seem that SWIG releases the GIL
> > automatically before calling native code. Hmm... So that should probably
> > be added manually, at least around GDALRasterIO() calls, otherwise you'll
> > get zero performance improvement.
> > 
> > > Since gdalwarp for example has the
> > > -multi flag, it seems as if GDAL is capable of it, or is that a
> > > specific/specialized implementation?
> > 
> > Parallelized I/O doesn't mean much by itself without more context. You
> > may want to parallelize reading of different regions of the same
> > dataset, or parallelize reading of different datasets. Since GDAL
> > objects are not thread-safe, the first case (reading different regions
> > of the same dataset) can be reduced to the second one by opening
> > several datasets for the same filename, one per thread.
> > 
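To make the "several datasets for the same filename" idea concrete, here is a
minimal sketch of the pattern: each worker thread opens its own handle and
never shares it. The names read_windows_parallel, open_fn, read_fn and
close_fn are hypothetical, and the demo uses a plain binary file as a
stand-in so that it runs without GDAL installed.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def read_windows_parallel(filename, windows, open_fn, read_fn, close_fn,
                          max_workers=4):
    """Read every window, using one private handle per worker thread."""
    local = threading.local()
    handles = []  # every handle opened, so they can all be closed at the end

    def worker(window):
        # First task on this thread: open a handle no other thread will see.
        if not hasattr(local, "handle"):
            local.handle = open_fn(filename)
            handles.append(local.handle)
        return read_fn(local.handle, window)

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in the same order as `windows`.
            return list(pool.map(worker, windows))
    finally:
        for handle in handles:
            close_fn(handle)


# Stand-in demo (assumption: a "window" is just (offset, length) in a file).
def open_binary(path):
    return open(path, "rb")


def read_bytes(handle, window):
    offset, length = window
    handle.seek(offset)
    return handle.read(length)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(100)))

try:
    results = read_windows_parallel(
        tmp.name, [(0, 4), (50, 3), (96, 4)],
        open_fn=open_binary, read_fn=read_bytes,
        close_fn=lambda handle: handle.close())
finally:
    os.remove(tmp.name)
```

With GDAL, open_fn would presumably be gdal.Open and read_fn something like
ds.GetRasterBand(1).ReadAsArray(xoff, yoff, xsize, ysize); that variant is
not exercised here, and it only brings a speed-up if the bindings release the
GIL around the read.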
> > Regarding gdalwarp -multi, here's how that works. When you warp a
> > dataset, a list of all chunks (windows) to be processed is generated.
> > gdalwarp -multi then does the following:
> > 
> > Thread I/O                              Thread computation
> > Read data for chunk 1
> > Read data for chunk 2                   Do calculations for chunk 1
> > Write output of chunk 1                 Do calculations for chunk 2
> > Read data for chunk 3
> > Write output of chunk 2                 Do calculations for chunk 3
> > 
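The schedule above can be sketched in plain Python with one I/O thread and
one computation thread connected by queues. This is only an illustration of
the overlap, not GDAL's actual implementation; run_pipeline, read_chunk,
compute and write_chunk are hypothetical names, and error handling is left
out (an exception in compute would leave the I/O thread waiting).

```python
import queue
import threading


def run_pipeline(chunks, read_chunk, compute, write_chunk):
    """Overlap I/O (calling thread) with computation (one worker thread)."""
    to_compute = queue.Queue(maxsize=1)  # read data waiting for computation
    to_write = queue.Queue()             # results waiting to be written

    def compute_worker():
        while True:
            item = to_compute.get()
            if item is None:             # sentinel: no more chunks
                return
            chunk, data = item
            to_write.put((chunk, compute(data)))

    worker = threading.Thread(target=compute_worker)
    worker.start()

    pending = 0                          # chunks read but not yet written
    for chunk in chunks:
        # Read the next chunk while the worker computes the previous one.
        to_compute.put((chunk, read_chunk(chunk)))
        pending += 1
        # Write whatever the worker has finished in the meantime.
        while not to_write.empty():
            done, result = to_write.get()
            write_chunk(done, result)
            pending -= 1

    to_compute.put(None)
    while pending:                       # flush the remaining results
        done, result = to_write.get()
        write_chunk(done, result)
        pending -= 1
    worker.join()


# Toy usage: a "chunk" is an integer n, its data is range(n), the result a sum.
written = []
run_pipeline([1, 2, 3, 4],
             read_chunk=lambda n: list(range(n)),
             compute=sum,
             write_chunk=lambda n, total: written.append((n, total)))
```

All reads and writes stay on a single thread, so no dataset object would be
touched from two threads at once; only the computation runs elsewhere.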
> > > Numba has several options which might eliminate using Python during
> > > execution. There are c-callbacks:
> > > http://numba.pydata.org/numba-doc/dev/user/cfunc.html
> > 
> > You can also use @jit(nopython=True, nogil=True) and your Python method
> > will end up being pure native code (provided that you don't use
> > constructs that are too high-level, otherwise the jit'ification will
> > fail with an exception).
> > 
> > And for code that is not inlined in the VRT, you can also add cache=True
> > so that the jit'ification can be reused.
> > 
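As a rough illustration of the decorator usage, here is a small kernel
written with explicit loops and scalar arithmetic only, which is the kind of
code nopython mode accepts. The function name and its arguments are made up
for the example, and the fallback decorator exists only so the sketch still
runs as plain Python when Numba is not installed. In a real pixel function
the buffers would be NumPy arrays; array.array is used here to avoid that
dependency.

```python
from array import array

try:
    from numba import jit  # optional dependency
except ImportError:
    def jit(*args, **kwargs):
        """Stand-in decorator: leaves the function as plain Python."""
        def decorate(func):
            return func
        return decorate


# nopython=True: compile to native code or fail, no fallback to the
# interpreter. nogil=True: release the GIL while the compiled code runs.
# cache=True could be added when the function lives in a module file.
@jit(nopython=True, nogil=True)
def scale_and_offset(src, dst, scale, offset):
    """Write src[i] * scale + offset into dst[i] for every element."""
    for i in range(len(src)):
        dst[i] = src[i] * scale + offset


src = array("d", [1.0, 2.0, 3.0])
dst = array("d", [0.0, 0.0, 0.0])
scale_and_offset(src, dst, 2.0, 1.0)
```

The first call pays the compilation cost; later calls with the same argument
types reuse the compiled version.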
> > With all that, the cost of the Python layer becomes negligible (except
> > for loading the Python environment the first time, if not already
> > loaded, but for a computation that takes longer than a few seconds,
> > that's not really a big deal).
> > 
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
