[gdal-dev] VRT derived band pixel functions written in Python
Even Rouault
even.rouault at spatialys.com
Tue Sep 13 01:26:09 PDT 2016
On Tuesday, September 13, 2016 09:02:09, Rutger wrote:
> Hi Even,
>
> This is really amazing. I'm becoming more and more of a fan of Numba for
> number crunching, so this certainly makes my day. As soon as I can find a
> Win x64 dev version on a Conda channel I'll give it a try.
>
> A use case that comes to mind, and which I run into regularly, is when I
> want to do some simple aggregation before using something like gdalwarp.
> For example, when you have a file containing 24 hourly temperature values
> and you are only interested in the daily mean. Currently I either
> aggregate before warping and write the intermediates to disk, or aggregate
> after warping, which is computationally inefficient. Neither is optimal.
>
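For the aggregation itself, note that once the 24 hourly bands are declared as
sources of the derived band, the in_ar argument of the pixel function is a
list of numpy arrays, one per source, so the daily mean is essentially a
one-liner. A minimal sketch (the function name is of course up to you):

    import numpy as np

    def daily_mean(in_ar, out_ar, xoff, yoff, xsize, ysize,
                   raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
        # in_ar holds one numpy array per source, here the 24 hourly bands;
        # average along the source axis and write into out_ar in place
        out_ar[:] = np.mean(in_ar, axis=0).astype(out_ar.dtype)
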
> A few questions: can you access a file's metadata from within a pixel
> function? This would perhaps allow, for example, interpolating atmospheric
> data to the overpass time of a satellite image.
This is indeed a question I've asked myself: does the prototype of the pixel
function contain enough information? Initially I had not included the
geotransform and the whole raster dimensions, for example, but found they were
necessary to get correct behaviour for hillshading at different zoom scales.
Afterwards I also added xoff and yoff, to be able to do Mandelbrot generation
(I guess there might be some more valid use cases ;-)). And then the
user-provided dictionary, to allow parametrizing the algorithm instead of
hardcoding constants in it, so that you can off-load it into a general
library. And I stopped at that point.
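So to be concrete, the prototype currently looks like this (annotated; the
function name is arbitrary):

    def my_func(in_ar, out_ar, xoff, yoff, xsize, ysize,
                raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
        # in_ar: list of numpy arrays, one per source
        # out_ar: numpy array to write the result into, in place
        # xoff, yoff: offsets of the requested window in the full raster
        # xsize, ysize: dimensions of the requested window
        # raster_xsize, raster_ysize: whole raster dimensions
        # buf_radius: extra pixels fetched around the window (hillshading)
        # gt: the geotransform
        # kwargs: the user-provided dictionary of <PixelFunctionArguments>
        ...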
For your use case, there are currently two possibilities, using the
<PixelFunctionArguments> capability:
- either you fetch the metadata items you need at VRT generation time and pass
them as arguments,
- or you pass only the names of the sources as arguments, and use the GDAL
Python bindings themselves inside the pixel function to get access to
everything needed (a sketch follows below). But I'm just thinking it might be
a bit inefficient to do that for each RasterIO() request.
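To illustrate that second option, a rough sketch (the 'source1' argument name
and the metadata item are made up for the example):

    from osgeo import gdal

    def interp_to_overpass(in_ar, out_ar, xoff, yoff, xsize, ysize,
                           raster_xsize, raster_ysize, buf_radius, gt,
                           **kwargs):
        # kwargs holds the <PixelFunctionArguments>; note that this
        # gdal.Open() is re-done at each RasterIO() request, hence the
        # potential inefficiency mentioned above
        ds = gdal.Open(kwargs['source1'])
        overpass_time = ds.GetMetadataItem('ACQUISITION_DATETIME')
        # ... use overpass_time to weight/interpolate the in_ar arrays ...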
<loud_thinking>
Perhaps a compromise would be to allow specifying, in addition to simple
functions, a class name, where the constructor would receive the values that
don't change from one call to another, and a calc() method would receive the
ones that change at each RasterIO() request:
    class MyProcessingClass:
        def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
                     source_filenames, **kwargs):
            # save the above parameters that you may need in calc(), and do
            # one-time things like gdal.Open()'ing sources to get metadata
            ...

        def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
            # do your computation
            ...
We could possibly pass the sources as datasets themselves, since they are
actually already opened, but that would make the availability of the
osgeo.gdal bindings a requirement (well, as a fallback we could pass the
filenames if the import fails).
</loud_thinking>
>
> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'?
I've only scratched the surface of Numba (I didn't know it two days ago). I
guess this might work (it might only be interesting if
VRTDerivedRasterBand::IRasterIO() is called with a big enough region, which
depends on the pixel access pattern of the upper layers). The VRT driver just
calls a Python function that takes numpy arrays and a few extra arguments.
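A pattern that should work with Numba (untested): keep the pixel function
itself as plain glue, and JIT only an inner kernel that receives the numpy
arrays, e.g.:

    import numba

    @numba.jit(nopython=True)
    def _kernel(a, out):
        for j in range(out.shape[0]):
            for i in range(out.shape[1]):
                out[j, i] = 2.0 * a[j, i]  # whatever per-pixel computation

    def my_func(in_ar, out_ar, xoff, yoff, xsize, ysize,
                raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
        _kernel(in_ar[0], out_ar)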
> And would that give parallel
> processing for both IO and calculations?
Only calculations. I/O is done before going into Python and after returning
from it.
Actually... if you specify zero sources in the VRT, then it is up to you to do
the read operations the way you like in Python, so you could possibly
parallelize them there (see below).
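A rough sketch of that zero-source variant (the 'sources' argument name is
made up, and whether threads actually help will depend on how much of the
read path releases the GIL):

    from concurrent.futures import ThreadPoolExecutor
    import numpy as np
    from osgeo import gdal

    def my_func(in_ar, out_ar, xoff, yoff, xsize, ysize,
                raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
        # with zero sources, in_ar is empty: read the window ourselves
        def read(fn):
            ds = gdal.Open(fn)
            return ds.GetRasterBand(1).ReadAsArray(xoff, yoff, xsize, ysize)
        filenames = kwargs['sources'].split(',')
        with ThreadPoolExecutor() as pool:
            arrays = list(pool.map(read, filenames))
        out_ar[:] = np.mean(arrays, axis=0).astype(out_ar.dtype)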
> You should give the folks at Continuum a heads up, I'm sure they'd
> appreciate seeing Numba used like this.
I have no connection with them, but yes their project rocks.
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com