[gdal-dev] VRT derived band pixel functions written in Python
Even Rouault
even.rouault at spatialys.com
Mon Sep 12 05:31:43 PDT 2016
Hi,
I wanted to mention a new (and I think pretty cool ) feature I've added in
trunk: the possibility to define pixel functions in Python in a VRT derived
band.
Let's start with a simple example to multiply a raster by 1.5 :
<VRTDataset rasterXSize="20" rasterYSize="20">
<SRS>EPSG:26711</SRS>
<GeoTransform>440720,60,0,3751320,0,-60</GeoTransform>
<VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
<PixelFunctionType>multiply</PixelFunctionType>
<PixelFunctionLanguage>Python</PixelFunctionLanguage>
<PixelFunctionArguments factor="1.5"/>
<PixelFunctionCode><![CDATA[
import numpy as np
def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, buf_radius, gt, **kwargs):
factor = float(kwargs['factor'])
out_ar[:] = np.round_(np.clip(in_ar[0] * factor,0,255))
]]>
</PixelFunctionCode>
<SimpleSource>
<SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
</SimpleSource>
</VRTRasterBand>
</VRTDataset>
or to add 2 (or more) rasters:
<VRTDataset rasterXSize="20" rasterYSize="20">
<SRS>EPSG:26711</SRS>
<GeoTransform>440720,60,0,3751320,0,-60</GeoTransform>
<VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
<PixelFunctionType>add</PixelFunctionType>
<PixelFunctionLanguage>Python</PixelFunctionLanguage>
<PixelFunctionCode><![CDATA[
import numpy as np
def add(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, buf_radius, gt, **kwargs):
np.round_(np.clip(np.sum(in_ar, axis = 0, dtype = 'uint16'),0,255),
out = out_ar)
]]>
</PixelFunctionCode>
<SimpleSource>
<SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
</SimpleSource>
<SimpleSource>
<SourceFilename relativeToVRT="1">byte2.tif</SourceFilename>
</SimpleSource>
</VRTRasterBand>
</VRTDataset>
You can put any python module code inside PixelFunctionCode, with at least one
function with the following arguments :
- in_ar: list of input numpy arrays (one numpy array for each source)
- out_ar: output numpy array to fill (initialized at the right dimensions and
with the VRTRasterBand.dataType)
- xoff: pixel offset to the top left corner of the accessed region of the band
- yoff: line offset to the top left corner of the accessed region of the band
- xsize: width of the region of the accessed region of the band
- ysize: height of the region of the accessed region of the band
- raster_xsize: total with of the raster band
- raster_ysize: total height of the raster band
- buf_radius: radius of the buffer (in pixels) added to the left, right, top
and bottom of in_ar / out_ar
- gt: geotransform
- kwargs: dictionnary with user arguments defined in PixelFunctionArguments
For basic operations, you just need to care about in_ar and out_ar.
With all that, you can do interesting stuff like hillshading (code ported from
gdaldem):
<VRTDataset rasterXSize="121" rasterYSize="121">
<SRS>EPSG:4326</SRS>
<GeoTransform>-80.004166666666663,0.008333333333333,0,
44.004166666666663,0,-0.008333333333333</GeoTransform>
<VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
<ColorInterp>Gray</ColorInterp>
<SimpleSource>
<SourceFilename relativeToVRT="1">n43.dt0</SourceFilename>
</SimpleSource>
<PixelFunctionLanguage>Python</PixelFunctionLanguage>
<PixelFunctionType>hillshade</PixelFunctionType>
<PixelFunctionArguments scale="111120" z_factor="30" />
<PixelFunctionCode>
<![CDATA[
# Licence: X/MIT
# Copyright 2016, Even Rouault
import math
def hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, radius, gt, z, scale):
ovr_scale_x = float(out_ar.shape[1] - 2 * radius) / xsize
ovr_scale_y = float(out_ar.shape[0] - 2 * radius) / ysize
ewres = gt[1] / ovr_scale_x
nsres = gt[5] / ovr_scale_y
inv_nsres = 1.0 / nsres
inv_ewres = 1.0 / ewres
az = 315
alt = 45
degreesToRadians = math.pi / 180
sin_alt = math.sin(alt * degreesToRadians)
azRadians = az * degreesToRadians
z_scale_factor = z / (8 * scale)
cos_alt_mul_z_scale_factor = \
math.cos(alt * degreesToRadians) * z_scale_factor
cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \
254 * math.cos(azRadians) * cos_alt_mul_z_scale_factor
sin_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \
254 * math.sin(azRadians) * cos_alt_mul_z_scale_factor
square_z_scale_factor = z_scale_factor * z_scale_factor
sin_alt_mul_254 = 254.0 * sin_alt
for j in range(radius, out_ar.shape[0]-radius):
win_line = in_ar[0][j-radius:j+radius+1,:]
for i in range(radius, out_ar.shape[1]-radius):
win = win_line[:,i-radius:i+radius+1].tolist()
x = inv_ewres * ((win[0][0] + win[1][0] + win[1][0] + win[2][0])-\
(win[0][2] + win[1][2] + win[1][2] + win[2][2]))
y = inv_nsres * ((win[2][0] + win[2][1] + win[2][1] + win[2][2])-\
(win[0][0] + win[0][1] + win[0][1] + win[0][2]))
xx_plus_yy = x * x + y * y
cang_mul_254 = (sin_alt_mul_254 - \
(y * cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 - \
x * sin_az_mul_cos_alt_mul_z_scale_factor_mul_254)) / \
math.sqrt(1 + square_z_scale_factor * xx_plus_yy)
if cang_mul_254 < 0:
out_ar[j,i] = 1
else:
out_ar[j,i] = 1 + round(cang_mul_254)
def hillshade(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, radius, gt, **kwargs):
z = float(kwargs['z_factor'])
scale= float(kwargs['scale'])
hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, radius, gt, z, scale)
]]>
</PixelFunctionCode>
<BufferRadius>1</BufferRadius>
<SourceTransferType>Int16</SourceTransferType>
</VRTRasterBand>
</VRTDataset>
You can completely offload the python code itself into a proper my_lib.py file
and just specify
<PixelFunctionType>my_lib.hillshade</PixelFunctionType>
Technically, the interfacing with Python is done at run-time by dynamically
loading Python symbols with dlopen()/GetProcAddress(), when they are already
available in the process. For example if libgdal is loaded from a Python
interpreter, or from a binary like QGIS which has already loaded the Python
lib. Otherwise a few shared objects ("libpython2.7.so", "python27.dll",
"libpython3.4m.so", etc.) are tried, unless the PYTHONSO config option is
specified to point to a precise filename. The advantage of this approach is that
the same GDAL library binary is compatible of all Python 2.X (tested: 2.6,
2.7) and 3.X (tested: 3.1, 3.4) versions, and any numpy version that comes in
the Python environment used (at compilation time you don't need any
python/numpy development package). The numpy dependency is not a critical one:
one could imagine a fallback mode where Python arrays would be used instead,
but this has likely little interest.
Successfully tested on Linux, MacOSX, FreeBSD and Windows. Some extra tweaking
of the predefined set of shared object names - that are probed when no already
loaded Python environment is found - might be needed.
The implementation should be thread-safe regarding use of the Python Global
Interpreter Lock (GIL).
I've also tested with the numba Just-In-Time compiler
(http://numba.pydata.org/) and it provides major performance improvements for
highly computational code (example given below).
With numba enabled, I found that my above Python hillshading on a 10Kx10K
float32 raster was even faster than gdaldem (the reason is that the Python
version is a bit simplified as it doesn't take into account input nodata
values), whereas the non-jit'ed one is 100x slower. When removing the nodata
flag, gdaldem is only twice faster as the jit'ed python code. So this is a good
sign that such approach isn't only a toy or just for prototyping.
Speaking of JIT, there's no provision (yet?) for interfacing with PyPy. Would
require a new backend as PyPy C API has nothing to do with the CPython one.
There are obvious security concerns in allowing Python code to be run when
getting the content of a vrt file. The GDAL_VRT_ENABLE_PYTHON config option =
IF_SAFE / NO / YES can be set to control the behaviour. The default is IF_SAFE
(can be change at compilation time by defining
-DGDAL_VRT_ENABLE_PYTHON_DEFAULT="NO" e.g. And Python code execution can be
completely disabled with -DGDAL_VRT_DISABLE_PYTHON). Safe must be understood
as: the code will not read, write, delete... files, spawn external code, do
network activity, etc. Said otherwise, the code will only do "maths". But
infinite looping is something definitely possible in the safe mode. The
heuristics of the IF_SAFE mode is rather basic and I'd be grateful if people
could point ways of breaking it. If any of the following strings - "import"
(unless it is "import numpy" / "from numpy import ...", "import math" / "from
math import ..." or "from numba import jit"), "eval", "compile", "open",
"load", "file", "input", "save", "memmap", "DataSource", "genfromtxt",
"getattr", "ctypeslib", "testing", "dump", "fromregex" - is found anywhere in
the code, then the code is considered unsafe (there are interestingly a lot of
methods in numpy to do file I/O. Hopefully I've captured them with the previous
filters). Another 'unsafe' pattern is when the pixel function references an
external module like my above my_lib.hillshade example (who knows if there
will not be some day a shutil.reformat_your_hard_drive function with the right
prototype...)
This new capability isn't yet documented in the VRT doc, although this message
will be a start.
I'm interested in feedback you may have.
And to conclude with a fun example: a raster with a Mandelbrot fractal. Just a
grey-level version. Let to the reader as an exercice: add a color table. To be
opened for example in QGIS and enjoy the almost infinite zoom feeling. Make
sure to disable contrast enhancement. It uses numba when available, and when
this is the case, it's really fast when paning/zooming.
<VRTDataset rasterXSize="100000000" rasterYSize="100000000">
<VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
<PixelFunctionLanguage>Python</PixelFunctionLanguage>
<PixelFunctionType>mandelbrot</PixelFunctionType>
<PixelFunctionCode>
<![CDATA[
try:
from numba import jit
#print('Using numba')
g_max_iterations = 100
except:
class jit(object):
def __init__(self, nopython = True, nogil = True):
pass
def __call__(self, f):
return f
#print('Using non-JIT version')
g_max_iterations = 25
# Use a wrapper since the VRT code cannot access the jit decorated function
def mandelbrot(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, r, gt, **kwargs):
mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize,
g_max_iterations)
@jit(nopython=True, nogil=True)
def mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, max_iterations):
ovr_factor_y = float(out_ar.shape[0]) / ysize
ovr_factor_x = float(out_ar.shape[1]) / xsize
for j in range( out_ar.shape[0]):
y0 = 2.0 * (yoff + j / ovr_factor_y) / raster_ysize - 1
for i in range(out_ar.shape[1]):
x0 = 3.5 * (xoff + i / ovr_factor_x) / raster_xsize - 2.5
x = 0.0
y = 0.0
x2 = 0.0
y2 = 0.0
iteration = 0
while x2 + y2 < 4 and iteration < max_iterations:
y = 2*x*y + y0
x = x2 - y2 + x0
x2 = x * x
y2 = y * y
iteration += 1
out_ar[j][i] = iteration * 255 / max_iterations
]]>
</PixelFunctionCode>
<Metadata>
<MDI key="STATISTICS_MAXIMUM">255</MDI>
<MDI key="STATISTICS_MEAN">127</MDI>
<MDI key="STATISTICS_MINIMUM">0</MDI>
<MDI key="STATISTICS_STDDEV">127</MDI>
</Metadata>
<ColorInterp>Gray</ColorInterp>
<Histograms>
<HistItem>
<HistMin>-0.5</HistMin>
<HistMax>255.5</HistMax>
<BucketCount>256</BucketCount>
<IncludeOutOfRange>0</IncludeOutOfRange>
<Approximate>1</Approximate>
<HistCounts>0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0</HistCounts>
</HistItem>
</Histograms>
</VRTRasterBand>
</VRTDataset>
Statistics have been added just to make QGIS open the file a bit quicker.
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com
More information about the gdal-dev
mailing list