[gdal-dev] Map algebra

Even Rouault even.rouault at spatialys.com
Sat Apr 2 06:08:28 PDT 2016


Le samedi 02 avril 2016 14:48:15, Ari Jolma a écrit :
> 02.04.2016, 15:15, Even Rouault kirjoitti:
> > A few thoughts and questions :
> > - I guess what would be cool from the perspective of a C++ user is to be
> > really to write the above pseudo code with C++ operator overload when
> > appropriate !  A += B, A.min(B), min_A= A.min(), etc...
> 
> Operator overloading is of course something to have. It is great for
> interactive use too.
> 
> I would most probably implement that in the Perl bindings.
> 
> > - There might be some numpy reinvention here... Or at least, it could
> > perhaps serve as an inspiration. I guess Python users would in priority
> > use numpy for the most basic operations. Of course the GIS specific
> > methods would still be of value.
> 
> I know little of numpy and how it is used by the Python bindings.

In the Python bindings, you have the methods to read from a GDAL raster and 
expose it as a numpy array, and the reverse, from a numpy array to writing a 
GDAL raster. People then use numpy own methods to do any computations they 
want.

> Does
> it load the whole raster band into memory or can it make use of the
> block read/write? If yes, then that is probably preferable for
> Pythonists at least when it comes to standard methods.

By default, if you do band.ReadAsArray() it will return the whole raster as a 
numpy array, but you can also do windowed read with band.ReadAsArray(x, y, 
width, height) (same for write).

Linux users aware of https://trac.osgeo.org/gdal/wiki/rfc45_virtualmem could 
use band.GetVirtualMemArray() on an arbitrarily large band that will return a 
huge "virtual" numpy array (actual data is loaded on first access due to 
catching memory faults). Not sure if anybody actually uses that.

Pythonists disliking GDAL's own bindings would probably use RasterIO block 
iterators :
https://github.com/mapbox/rasterio/blob/master/docs/windowed-rw.rst

> 
> > - Saturation arithmetics could be interesting. numpy by default do
> > wrapped arithmetics, you have to cast to a larger type and clip down
> > afterwards, whereas OpenCV does it by default :
> > http://opencvpython.blogspot.fr/2012/06/difference-between-matrix-arithme
> > tic- in.html
> > - Are the methods handling 2 bands able to deal with different block
> > sizes ?
> 
> Yes, the band size needs to be the same

Not sure to understand your answer: my question was rather about block sizes.

> currently but the caching and
> cell index functions allow working with two bands at the same time even
> when the cell neighborhood is needed. I'm not sure about the optimality
> of the code however, the cache is cleaned of not-needed blocks at each
> block step.
> 
> > - And with bands of different data types ?
> 
> Yes. It creates pretty huge switch tables however. They are mostly
> hidden in macros but are a problem. I don't see how I can get the
> datatype into the templates from the bands and I don't like the idea of
> leaving that to the users.

Perhaps we don't need to instanciate all template possibilities: just the 
<same_type,same_type> ones, and in case of different types promote to double ?

> 
> > - Why the need to reinvent a hashset mechanism ? Isn't std::set good
> > enough ? - Same for gma_array vs std::vector ?
> 
> Simply ignorance of C++ I think. I do see that for example C++11 and
> boost have lots of useful things but did not (yet) go that way. Earlier
> C++ standards seem to have many tools too. I learnt C++ >20 years ago...
> I believe I also need some small classes like "mapper", which
> reclassifies data from old values to new values and are not readily
> available.

std::set, std::map, std::vector are available in C++98 and already used in 
GDAL. We should use them when practical (admitedly I developed a CPLHashSet a 
long time ago, but in retrospect that didn't make sense)

> 
> > - various methods take a void* argument, whereas it seems that could be
> > rewritten as datatype* for better type safety
> 
> Noted. The problem is often that the callback to the method specific
> function needs a prototype, which can be used for many methods. Surely
> minimization of void pointers is a goal. Generic objects and
> introspection might be a partial solution.
> 
> > - there seems to be a block cache mechanism. What is it for ? How does
> > that work ?
> 
> In the block loop (looping through all blocks in a band) at each step
> the cache is checked for block(s) that are needed (if focal distance is
> 
>  > 0 then more than one block may be needed) and which are not needed.
> 
> Then blocks are read and written/discarded.

Isn't there some duplication with GDAL own's block cache ? Perhaps I'm just 
confusion by the proximity of the naming. I didn't study your code closely.

> 
> > - there is some overlap/connection with the pixel functions of VRT ( see
> > "Using Derived Bands" of http://gdal.org/gdal_vrttut.html).
> 
> ok
> 
> Thanks for the comments!
> 
> Ari
> 
> >> The C code is of course not this nice but it is not very far from this.
> >> All the basic four method types I mention below are implemented and
> >> adding new methods is not very difficult. The code makes heavy use of
> >> templates and macros in an attempt for versatility (GDAL rasters can
> >> have many datatypes) and easiness to write methods in a few lines of
> >> code. The whole set of code is now ~2500 lines but it creates rather
> >> large executables due to templates. The code is C++ but the API is C
> >> like, I've so far done no changes to existing GDAL code because of this.
> >> 
> >> As written below, the code works on two levels of x,y loops: the blocks
> >> within a band, which is generic and cells within a block, which is
> >> separate for each method. The methods work on line by line (no recursion
> >> etc.) and thus for example the fill_depressions is not optimal but
> >> instead very simple. Also, very large rasters can be processed.
> >> 
> >> I think this may be somewhat similar thing in relation to GDAL as the
> >> network support, so it could be a thing to add to the library if wished.
> >> 
> >> I've added RFC 62 "raster algebra" to spark discussion.
> >> 
> >> Ari
> >> 
> >> 20.03.2016, 10:21, Ari Jolma kirjoitti:
> >>> Folks,
> >>> 
> >>> I played around a bit with the map algebra idea and came up with
> >>> something that could work nicely as a plugin. The initial code is at
> >>> https://github.com/ajolma/gdal/tree/trunk/gdal/map_algebra
> >>> 
> >>> The idea is based on processing raster bands block by block and
> >>> calling a given function on each block. Using macros and templates the
> >>> code should be rather nice and manageable even when supporting
> >>> multiple data types for raster cells.
> >>> 
> >>> Further, the idea is to support three kinds of map algebra methods,
> >>> mostly working on a raster band in-place.
> >>> 
> >>> 1) Simple ones, which do not take any arguments beyond the raster band
> >>> itself, examples are functions like log, sin, rand.
> >>> 
> >>> 2) Those which take one non-spatial argument. The argument can be a
> >>> scalar like a number, or more complex like a reclassifier.
> >>> 
> >>> 3) Those which operate on two rasters. Examples are summation of
> >>> rasters and computing flow directions on a DEM. The latter is a bit
> >>> more complex since it is a focal method and requires a 3 x 3 matrix of
> >>> blocks from the other operand raster.
> >>> 
> >>> Maybe a fourth kind of methods are those which compute something from
> >>> a raster.
> >>> 
> >>> This would lead to four functions in the C API and raster band methods
> >>> in the bindings.
> >>> 
> >>> Comments are welcome, the initial code base contains a small demo
> >>> program. Eventually, if this works, I'll make a RFC from this.
> >>> 
> >>> Ari
> >> 
> >> _______________________________________________
> >> gdal-dev mailing list
> >> gdal-dev at lists.osgeo.org
> >> http://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com


More information about the gdal-dev mailing list