[gdal-dev] VRT derived band pixel functions written in Python

Even Rouault even.rouault at spatialys.com
Tue Sep 13 01:26:09 PDT 2016

On Tuesday, 13 September 2016 09:02:09, Rutger wrote:
> Hi Even,
> This is really amazing. I'm becoming more and more a fan of Numba for
> number crunching, so this certainly makes my day. As soon as I can find a
> Win x64 dev version on a Conda channel I'll give it a try.
> A use case that comes to mind, and which I run into regularly, is when I
> want to do some simple aggregation before using something like gdalwarp.
> For example, when you have a file containing 24 hourly temperature values
> and you are only interested in the daily mean. Currently I either
> aggregate before warping and write the intermediates to disk, or aggregate
> after warping, which is computationally inefficient. Neither is optimal.
> A few questions: can you access a file's metadata from within a pixel
> function? This would perhaps allow, for example, interpolating atmospheric
> data to the overpass time of a satellite image.

This is indeed a question I've asked myself: does the prototype of the pixel 
function contain enough information? Initially I had not included the 
geotransform and the full raster dimensions, for example, but found they were 
necessary for correct behaviour when hillshading at different zoom scales. 
Afterwards I also added xoff and yoff, to be able to do Mandelbrot generation 
(I guess there might be some more valid use cases ;-)). Then came the 
user-provided dictionary, for parametrizing the algorithm instead of 
hardcoding constants in it, so that you can off-load it into a general lib. 
And I stopped at that point.
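Put together, the prototype that came out of those iterations can be sketched 
roughly like this (a sketch based on the parameters listed above; the function 
name and the identity-copy body are just for illustration):

```python
import numpy as np

def identity(in_ar, out_ar, xoff, yoff, xsize, ysize,
             raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # in_ar: list of NumPy arrays, one per source
    # out_ar: NumPy array to write the result into
    # xoff, yoff: offsets of the requested window in the full raster
    # xsize, ysize: size of the requested window
    # raster_xsize, raster_ysize: full raster dimensions
    # buf_radius: extra pixels around the requested window
    # gt: the geotransform (6 coefficients)
    # kwargs: the user-provided dictionary mentioned above
    out_ar[:] = in_ar[0]
```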

For your use case, there are currently two possibilities, using the 
<PixelFunctionArguments> capability:
- either you fetch the metadata items you need at VRT generation time and put 
them in as pixel function arguments,
- or you pass only the names of the sources as arguments, and use the GDAL 
Python bindings inside the pixel function to get access to everything needed. 
But I suspect it might be a bit inefficient to do that for each RasterIO() 
request.
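For the first option, the VRT side could look roughly like this (a sketch: 
my_lib.interp_to_overpass and the overpass_time value are made-up placeholders 
that would be baked in when the VRT is generated):

```xml
<VRTRasterBand dataType="Float32" band="1" subClass="VRTDerivedRasterBand">
  <PixelFunctionType>my_lib.interp_to_overpass</PixelFunctionType>
  <PixelFunctionLanguage>Python</PixelFunctionLanguage>
  <!-- metadata item read at VRT generation time, frozen as an argument -->
  <PixelFunctionArguments overpass_time="2016-09-13T10:26:09Z"/>
  <SimpleSource>
    <SourceFilename relativeToVRT="1">hourly_temperature.tif</SourceFilename>
    <SourceBand>1</SourceBand>
  </SimpleSource>
</VRTRasterBand>
```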


Perhaps a compromise would be to allow specifying, in addition to simple 
functions, a class name, where the constructor would receive the values that 
don't change from one call to the next, and a calc() method would receive the 
ones that change at each RasterIO() request:

class MyProcessingClass:
	def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
	             source_filenames, **kwargs):
		# save the above parameters that you may need in calc(), and do
		# one-time things like gdal.Open()'ing sources to get metadata
		...

	def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
		# do your computation
		...
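For instance, for the hourly-to-daily-mean use case above, such a class might 
look like this (a hypothetical sketch of the proposed interface, which does 
not exist yet; the gdal.Open() step is only mentioned in a comment so the 
sketch stays self-contained):

```python
import numpy as np

class DailyMean:
    def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
                 source_filenames, **kwargs):
        # one-time setup: keep whatever calc() will need
        self.gt = gt
        # with the real bindings you could also do one-time metadata reads
        # here, e.g. gdal.Open(fn).GetMetadata() for fn in source_filenames
        self.nsources = len(source_filenames)

    def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
        # average the hourly sources into a daily mean
        out_ar[:] = np.mean(np.stack(in_ar), axis=0)
```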

We could possibly pass the sources as datasets themselves, since they are 
actually already opened, but that would make the availability of the 
osgeo.gdal bindings a requirement (well, as a fallback we could pass the 
filenames if the import fails).


> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'. 

I've only scratched the surface of Numba (I didn't know it two days ago). I 
guess this might work (it might only be interesting if 
VRTDerivedRasterBand::IRasterIO() is called with a big enough region, which 
depends on the pixel access pattern of the upper layers). The VRT driver just 
calls a Python function that takes NumPy arrays and a few extra args.
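One way to try it without assuming anything about how Numba interacts with the 
VRT machinery: keep the pixel function itself a plain Python callable and JIT 
only the inner kernel, with a fallback so the sketch also runs where Numba is 
not installed (the function and kernel names here are made up):

```python
import numpy as np

try:
    from numba import njit          # optional: JIT-compile the kernel
except ImportError:
    def njit(*args, **kwargs):      # fallback: no-op decorator
        def wrap(f):
            return f
        return wrap

@njit(cache=True)
def _mean2_kernel(a, b):
    return 0.5 * (a + b)

def mean2(in_ar, out_ar, xoff, yoff, xsize, ysize,
          raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # the VRT driver only ever sees a plain Python function; whether the
    # kernel underneath is JIT-compiled is invisible to it
    out_ar[:] = _mean2_kernel(in_ar[0], in_ar[1])
```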

> And would that give parallel
> processing for both IO and calculations?

Only calculations. I/O is done before going to Python and after returning 
from it.
Actually... if you specify zero sources in the VRT, then it is up to you to do 
the read operations the way you like in Python, so you could possibly 
parallelize them there.
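A sketch of that idea (read_chunk() is a stand-in for real 
gdal.Open()/ReadAsArray() calls, so the example runs without GDAL; with real 
datasets, threads only help to the extent the underlying reads release the 
GIL):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def read_chunk(filename, band):
    # stand-in for gdal.Open(filename).GetRasterBand(band).ReadAsArray(...);
    # fabricate data so the sketch is self-contained
    return np.full((4, 4), float(band))

def zero_source_func(in_ar, out_ar, xoff, yoff, xsize, ysize,
                     raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # with zero sources in the VRT, in_ar is empty and all reads happen
    # here, where they can be issued in parallel
    filenames = kwargs.get("filenames", "a.tif,b.tif").split(",")
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(lambda fb: read_chunk(*fb),
                               [(fn, i + 1) for i, fn in enumerate(filenames)]))
    out_ar[:] = np.mean(np.stack(chunks), axis=0)
```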

> You should give the folks at Continuum a heads up, I'm sure they'd
> appreciate seeing Numba used like this.

I have no connection with them, but yes their project rocks.


Spatialys - Geospatial professional services
