[gdal-dev] GDAL raster processing library?

jramm jamessramm at gmail.com
Tue Jul 12 05:40:27 PDT 2016


I wonder if there is a use case/interest in a small library within GDAL to
facilitate generic raster processing. 

My idea would be to have a user-extensible framework to run processing
functions on whole rasters, with a growing library of common-such functions
within GDAL. 

Something along the lines of this:

/*******************************************************************/
typedef CPLErr (*GDALRasterProcessFn)(double *padfInArray, double
*padfOutArray, 
                int nWindowXSize, int nWindowYSize, void *pData, double
*pdfNoDataValue);
/**
* \brief Definition of a raster processing function. 
*
* A GDALRasterProcessFn accepts an array of data as input, applies custom
logic and writes the output to padfOutArray.
* Such a function can be passed to GDALRunRasterProcess to apply custom
processing to a GDALDataset in chunks and create 
* a new GDALDataset. 
*
* @param padfInArray The input array of data. 
*
* @param padfOutArray The output array of data. On first call (via
GDALRunRasterProcess) this will be an empty, initialised array,
*    which should be populated with the result of calculations on
padfInArray. In subsequent calls it will contain the result of the 
*    previous window. 
*
* @param nWindowXSize the actual x size (width) of the read window.
*
* @param nWindowYSize the actual y size (height) of the read window. The
length of padfInArray == padfOutArray == nWindowXSize * nWindowYSize
*
* @param pData Process-specific data. This data is passed straight through
to the GDALRasterProcessFn and may contain e.g user defined parameters.
*     The GDALRasterProcessFn definition would define the structure/type of
such data.
*
* @param pdfNoDataValue The no data value of the dataset
*/

CPLErr GDALRunRasterProcess(GDALRasterProcessFn processFn, GDALDataset
*poSrcDataset, 
                        GDALDataset *poDstDataset, void *pData, int
*pnWindowXSize, 
                        int *pnWindowYSize, int *pnPixelBuffer);
/**
* \brief Apply a raster processing function to each sub-window of a raster.
* 
* The input raster dataset is read in chunks of nWindowXSize * nWindowYSize
and each chunk is passed to the processing
* function. The output array from the function is written to the destination
dataset. 
* An optional 'pixel buffer' can be specified to allow overlaps between
successive windows. This is useful for 
* some algorithms, e.g. blob extraction, watershed/stream flow analysis,
convolution etc.
* Process specific data can be passed (e.g. configuration parameters). This
data is simply passed straight through to the processing
* function on each call. 
*
* @param processFn A GDALRasterProcessFn to apply to each sub window of the
raster. 
* 
* @param poSrcDataset The source raster dataset from which pixel values are
read
*
* @param poDstDataset The destination raster dataset to which pixel values
are written. Must support RasterIO in write mode.
*
* @param pData Process-specific data. This is passed straight through to the
GDALRasterProcessFn on each call. 
*
* @param pnWindowXSize The desired width of each read window. If NULL it
defaults to the 'natural' block size of the raster
*
* @param pnWindowYSize The desired height of each read window. If NULL it
defaults to the 'natural' block size.
*
* @param pnPixelBuffer A pixel buffer to apply to the read window. The read
window is expanded by pnPixelBuffer pixels in all directions such that 
*    each window overlaps by pnPixelBuffer pixels. Ignored if NULL or 0
*
* @return a CPLErr enum indicating whether the function succesfully ran. 
*/

                        
CPLErr GDALRunRasterProcesses(GDALRasterProcessFn *paProcessFn, int
nProcesses, GDALDataset *poSrcDataset, 
                        GDALDataset *poDstDataset, void **paData, int
*pnWindowXSize, 
                        int *pnWindowYSize, int *pnPixelBuffer);
                        
/**
* \brief Apply multiple raster processing functions to each sub-window of a
raster
*
* For each window, the functions defined by the paProcessFn array are called
in turn, with the array output of the previous function forming the input
* to the next function. This allows processing 'toolchains' to be built
without having to create intermediate datasets, which can be less efficient
in time and space. 
*
*
* @param paProcessFn An array of GDALRasterProcessFn to apply to each sub
window of the raster
*
* @param nProcesses The size of paProcessFn
*
* @param poSrcDataset The source raster dataset from which pixel values are
read
*
* @param poDstDataset The destination raster dataset to which pixel values
are written. Must support RasterIO in write mode.
*
* @param paData an array of process-specific data objects of size
nProcesses. Each data object will be passed to the corresponding
GDALRasterProcessFn
*
* @param pnWindowXSize The desired width of each read window. If NULL it
defaults to the 'natural' block size of the raster
*
* @param pnWindowYSize The desired height of each read window. If NULL it
defaults to the 'natural' block size.
*
* @param pnPixelBuffer A pixel buffer to apply to the read window. The read
window is expanded by pnPixelBuffer pixels in all directions such that 
*    each window overlaps by pnPixelBuffer pixels. If NULL, it is ignored. 
*
* @return a CPLErr enum indicating whether the function succesfully ran. 
*/
/*******************************************************************/

So GDALRunRasterProcess would take care of I/O and looping through the
dataset in chunks, which can be configure by the user. Each chunk is
submitted to the processing function, which does 'stuff' and populates the
output array. GDALRunRasterProcess would write it to the output dataset
object. 

An example of a processing function could be something like this:

/*******************************************************************/
typedef struct ReclassArgs {
    int nReclassArgs;
    double *padfMinBound;
    double *padfMaxBound;
    double *padfBinValue;
    
} ReclassArgs;


CPLErr Reclass(double *padfInArray, double *padfOutArray, 
               int nWindowXSize, int nWindowYSize, void *pData,
               double *pdfNoDataValue)
{
    ReclassArgs *poRargs = static_cast<ReclassArgs *>(pData);
    double pixel;
    for (int i = 0; i < nWindowXSize * nWindowYSize - 1; ++i){
        pixel = padfInArray[i];
        for (int k = 0; k < poRargs->nReclassArgs; ++k){
                if ((pixel > poRargs->padfMinBound[k]) && (pixel <=
poRargs->padfMaxBound[k])){
                    padfOutArray[i] = poRargs->padfBinValue[k];
                    break;
                }
        }
    }    
    return CE_None;
}
/*******************************************************************/

To use it you would:

/*******************************************************************/
// Open a source dataset
GDALDataset *poSrc = GDALOpen(...)

// Create some output dataset with same dimensions etc..
GDALDataset *poDst = ....

// Setup the reclass args
ReclassArgs *poRargs = ...

// Call the process
CPLErr retval = GDALRunRasterProcess(Reclass, poSrc, poDst, (void *)poRargs,
NULL, NULL, NULL);
/*******************************************************************/

GDAL could feature its own library of common functions...reclass, truncate,
blob extraction etc etc and a user of GDAL could write their own and have
GDAL process it. 

It would be useful to expose enough to python via SWIG so that a
GDALRasterProcessFn could be defined in python - this would then allow rapid
development in python, but push the expensive work of looping over dataset
chunks to C++.
I dont know enough about SWIG yet to see whether this would be feasible.

However, even in C++, it allows the user to only think about their algorithm
and leave the boilerplate to GDAL. 

I imagine all this being implemented in a 'sub library', something like
gdpl.h (geospatial/gdal dataset processing library), but sitting in the GDAL
source tree (then I can happily expose all the useful GDAL goodness without
being worried about users having to include gdal headers etc, and avoid any
work in having to abstract it away)

This is a first stab at defining something like this and is probably not yet
comprehensive enough.

Any thoughts?




--
View this message in context: http://osgeo-org.1560.x6.nabble.com/GDAL-raster-processing-library-tp5275948.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.


More information about the gdal-dev mailing list