[gdal-dev] Improving usage of GDAL in Mapnik

Thu Oct 30 17:01:45 PDT 2014

Le jeudi 30 octobre 2014 20:55:11, Robert Coup a écrit :
> Hi team,
> 
> Fixing a bug in the Mapnik GDAL code (
> https://github.com/mapnik/mapnik/tree/master/plugins/input/gdal) where the
> return value of RasterIO() isn't checked (so eg. if a component TIFF in a
> VRT is unavailable it just trucks right along rendering blank images
> without error) and it appears to me like the usage of GDAL could be
> optimised a bit in
> https://github.com/mapnik/mapnik/blob/master/plugins/input/gdal/gdal_featur
> eset.cpp
> 
> 
> I'm curious whether any of the following will actually make much
> difference, or whether GDALs caching nullifies most of it already? Or
> whether there are potentially other obvious performance optimisations?
> 
> * Favouring dataset.RasterIO() over band.RasterIO() to prevent up to 4x
> raster reads

Rob,

If the size of the image (or your requested window) is below the GDAL block 
cache maximum size (default is 40 MB), then most drivers should cache the data 
for other bands when reading one band (particularly if decompression is 
involved), if the storage on disk is pixel interleaved (i.e R1,G1,B1,R2,G2,B2, 
etc...), which is generally the case for GeoTIFF. So reading in one call or 4 
calls will probably not make a lot of difference.
But if you know that you have to read the 4 bands, there's no reason not to 
use dataset.RasterIO() in all situations. GDAL, either in its generic 
implementation or in the driver specific one, will look at the declared 
interleaving and will/should select the most appropriate reading strategy, ie. 
potentially operating block by block if the data is pixel interleaved, or band 
by band otherwise.

> * Use of GetBlockSize() to read on block boundaries, then cropping data
> later.

I've not looked closely at the code you point to, but that depends on what you 
do. If you have allocated a big buffer to contain the whole requested window, 
then just issue a single RasterIO() and let GDAL do its job to filling it the 
best way.
If you just allocate small chunks of memory and process chunk by chunk, then 
it is indeed better to call RasterIO() on the block boundaries (or multiple of 
block boundaries).

> * Use GetMaskBand()/GetMaskFlags() to simplify nodata/alpha handling
> (probably won't impact performance but might get rid of a bunch of code)

Yes that can be convenient to avoid having to deal both with nodata and alpha 
band in a unified way. Plus some formats can really have a mask band that is 
neither of the two. For example you can create a JPEG-compressed YCbCr TIFF 
and use an internal mask band to encode transparency (although this is 
admitedly quite an exotic TIFF formulation because most readers will ignore 
the transparency mask)

Even

> 
> Cheers,
> 
> Rob :)

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com