[gdal-dev] Map algebra

Ari Jolma ari.jolma at gmail.com
Thu Apr 7 11:56:01 PDT 2016


07.04.2016, 18:27, alex kirjoitti:
> Hi Ari,
>
> Sorry for responding to your message late. I do appreciate your initiative to facilitate map algebra in GDAL.

No worries, the initiative and RFC will be active for some time still 
and decisions on it are still in the future.

>
> What exactly do you mean by Map Algebra? I understand it to cover the following:
>
> 1. Local operations: pixel-by-pixel operations on raster bands.
> 2. Focal operations: operations that calculate an indicator for each pixel based on the values in surrounding pixels
> 3. Zonal operations: operations that calculate indicators for groups (zones) of pixels.
> 4. Global operations: operations that calculate indicators over all pixels.

This is the traditional understanding and I share it. I've come to map 
algebra from hydrological analysis of digital elevation models, spatial 
simulation with rasters, analysis of land cover data, etc. I've dealt 
less with multi-channel RS data but I understand that analysis of such 
data, for example classification, is an important application domain.

>
> 1. Local operations: pixel-by-pixel operations on raster bands.
>
> To support this well, I think you should allow the user to specify a function and a combination of raster bands and constants that feed into the function, e.g:
>
> my_average = apply(average_function, band1, band2, band3);
>
> but the function should also be able to handle a mix of raster bands and constants, e.g.:
>
> my_average = apply(average_function, band1, band2, 7.25);
>
> Also, I think it would make sense to have overloads for all or most of the operators, so you can write:
>
> my_average = (band1 + band2 + band3) / 3.0;
>
> And, I would like to combine these:
>
> my_average = (2* apply(average_function, band1, band2) + band3) / 3.0;
>
> Is that the kind of functionality you are trying to offer?

That is one aspect, yes. However, so far I've considered only operations 
on one or two bands at time. Operations on more than two can of course 
be dissolved into a series of operations with max two at a time.

I agree that supporting overloading is important and it is more than 
just syntactic sugar since it can delegate some tasks to the compiler, 
which may create temporaries as needed etc.

>
> If you want to support this, it seems to be best practice to use expression templates (https://en.wikipedia.org/wiki/Expression_templates), because that would allow to create complex compositions of operations without having to create temporary datasets (e.g. for the sum of band1 and band2 in the example above).

Yes, I agree that avoiding temporaries is a worthy goal, and what I 
understand of expression templates, motivation for them has been that.

However, expression templates seem to be very C++ centric and do their 
work in compile time. The C API and high level language bindings are 
very important in GDAL. Therefore expression templates might not be the 
best solution in this case. However, perhaps the idea can be exploited 
somehow.

>
> The key thing that you need in order to implement this is an iterator over the raster bands. Eric Niebler writes at length about the benefits of using Ranges as a means of composing functions (https://github.com/ericniebler/range-v3 ) .You seem to be applying an iterator that iterates over blocks, rather than over pixels. How can that work when my_average, band1, band2 and band3 have different block sizes? Wouldn't your life be a lot easier if you iterated over pixels in the same order for all bands?.

I agree that it takes a bit of haggling to be able to work with more 
than one band at the same time when one has to maintain a separate cache 
for each band and each band has different block sizes. The situation is 
even worse for focal methods since one focal region can be on four 
blocks of one band in the worst case. It is however, in my opinion, an 
efficient approach when it is not feasible to load all raster data into 
memory. The approach is also strongly supported by the basic design of 
GDAL and the fundamental algorithm is presented in GDAL documentation (I 
gave the link in earlier email).

The solution I already have seems to work well for the one and two band 
case even with 3x3 focal region and as far as I can see it could be 
extended to cases involving more bands. The macros and functions I've 
written make cell (pixel) access possible with only a few lines of code 
but it could be probably improved. That would probably be desirable if 
more than two bands are present.

>
> If you are interested, I have a solution, albeit in C++11 (https://github.com/ahhz/map_algebra ).

I remember your email and I've taken a look at your code. I have nothing 
against using modern tools but I think GDAL is a bit conservative on 
them and portability is a big issue. However, I believe the map algebra 
implementation I'm writing will be a add-on or plugin for GDAL, and not 
something that will be tightly integrated into it.

>   The following is a program using it:
>
> #include <blink/map_algebra/map_algebra.h>
>
> namespace ma = blink::map_algebra;
>
> int my_function(int w, int x, int y, int z)
> {
>    return w * x + y * z;
> }
>
> int main()
> {
>    auto a = ma::open_read_only<int>("input_1.tif");
>    auto b = ma::open_read_only<int>("input_2.tif");
>    
>    // Example 1: Using operators
>    auto w = ma::create_from_model<double>("output_1.tif", a);
>    w = a + 3 * b;
>
>    // Example 2: Using assigning operators
>    auto x = ma::create_from_model<double>("output_2.tif", a);  x = 1;
>    x *= a;
>    x += 3 * b;
>    
>    // Example 3: Map algebra using cell-by-cell functions
>    auto y = ma::create_from_model<double>("output_3.tif", a);
>    y = ma::apply(my_function, 1, a, 3, b);
>
>    // Example 4: Combination
>    auto z = ma::create_from_model<double>("output_4.tif", a);
>    z = b + 3 * ma::apply(my_function, 1, a, 2, b);
>
>    return 0;
> }
>
> 2. Focal operations: operations that calculate an indicator for each pixel based on the values in surrounding pixels
>
> I think the key challenge here is to allow users to specify the function to be applied on the surrounding pixels. In my opinion a Map Algebra library should support all kinds of moving window analysis, and not just a number of built-in types of analysis. This can be implemented as yet another range that allows iteration over the pixels in the same order as the raster bands, but instead of return pixel values, it returns neighbourhood / window summaries.

The hydrological algorithm would benefit a lot from random access to 
extended neighborhood cells but so far I've written only code that uses 
linear, block by block and pixel by pixel within blocks, access.



>
> I have a C++11 library for this as well, but it is not yet working smoothly with the map algebra library (but it will!)
> https://github.com/ahhz/moving_window
> http://www.sciencedirect.com/science/article/pii/S0303243415300337
>
> 3. Zonal operations: operations that calculate indicators for groups (zones) of pixels.
> This seems more trivial than the other two components of map algebra and I haven't given it much attention, but I think should be supported.

One complication, which I just today worked on, seems to stem from one 
requirement that I have. I want the API be free of templates and thus 
there is a need for generic classes for, e.g., numbers (GDAL supports 
bands where data is various length integers or floats). Thus, for 
example a simple zonal method, which returns which zones share a 
boundary returns a rather complex data structure.

>
> 4. Global operations
> I don't think these need to be explicitly supported, because there is no genericity to exploit.

I find that I often need methods, which return for example the histogram 
of values in a band.

>
> In summary:
>
> What I am really trying to say is that developing map algebra functionality on top of GDAL or as part of GDAL seems very worthwhile and feasible. However, I would aim for functionality that is more generically supporting map algebra, rather than implementing specific map algebra tools. The foundation of such functionality in my opinion must be a GDALRange class that facilitates iteration over pixels in a uniform order. If I were you, I would first push a RFC to create a GDALRange, and only once that is done and dusted, consider the map algebra on top of it.

The RFC procedure of GDAL is open to everybody and the deciding body is 
the GDAL PSC, to which I do not belong to. As I wrote, the API I'm 
suggesting is essentially an add-on and thus leaves, even if adopted, 
room for other solutions.

Best regards,

Ari

>
> Kind regards, Alex
>
> (sorry for the self-promotion)
>
>> -----Original Message-----
>> From: gdal-dev [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Ari
>> Jolma
>> Sent: 01 April 2016 09:39
>> To: gdal-dev at lists.osgeo.org
>> Subject: Re: [gdal-dev] Map algebra
>>
>> I've got this into a rather nice working shape. The code needs to be
>> reorganized (it is now, horror, all in header files) but that's another
>> thing.
>>
>> I've so far built this around a basic DEM catchment analysis workflow.
>> It includes many kinds of steps so it is a pretty good benchmark. In
>> pseudo code the workflow is something like
>>
>> dem = open_from_file
>> dem->fill_depressions
>> fd = dem->compute_flow_directions
>> fd->route_flat_cells
>> ua = fd->compute_upstream_area
>> c = new_raster // compute the catchment into this
>> c->set_border_cells // we're interested in the outlet cells
>> c *= ua
>> c *= c > 10000 // we're interested in the big catchments
>> outlets = c->get_cells // outlets is an array of cells, and a cell is an
>> object containing (x,y,value)
>> c = 0
>> c->mark_catchment(fd, outlets[0], 1) // assuming the outlet cell is now
>> in outlets[0]
>> ua2 = new_raster // for visualization log of upstream area is nice
>> ua2 += ua
>> ua2->log
>>
>> The C code is of course not this nice but it is not very far from this.
>> All the basic four method types I mention below are implemented and
>> adding new methods is not very difficult. The code makes heavy use of
>> templates and macros in an attempt for versatility (GDAL rasters can
>> have many datatypes) and easiness to write methods in a few lines of
>> code. The whole set of code is now ~2500 lines but it creates rather
>> large executables due to templates. The code is C++ but the API is C
>> like, I've so far done no changes to existing GDAL code because of this.
>>
>> As written below, the code works on two levels of x,y loops: the blocks
>> within a band, which is generic and cells within a block, which is
>> separate for each method. The methods work on line by line (no recursion
>> etc.) and thus for example the fill_depressions is not optimal but
>> instead very simple. Also, very large rasters can be processed.
>>
>> I think this may be somewhat similar thing in relation to GDAL as the
>> network support, so it could be a thing to add to the library if wished.
>>
>> I've added RFC 62 "raster algebra" to spark discussion.
>>
>> Ari
>>
>>
>> 20.03.2016, 10:21, Ari Jolma kirjoitti:
>>> Folks,
>>>
>>> I played around a bit with the map algebra idea and came up with
>>> something that could work nicely as a plugin. The initial code is at
>>> https://github.com/ajolma/gdal/tree/trunk/gdal/map_algebra
>>>
>>> The idea is based on processing raster bands block by block and
>>> calling a given function on each block. Using macros and templates the
>>> code should be rather nice and manageable even when supporting
>>> multiple data types for raster cells.
>>>
>>> Further, the idea is to support three kinds of map algebra methods,
>>> mostly working on a raster band in-place.
>>>
>>> 1) Simple ones, which do not take any arguments beyond the raster band
>>> itself, examples are functions like log, sin, rand.
>>>
>>> 2) Those which take one non-spatial argument. The argument can be a
>>> scalar like a number, or more complex like a reclassifier.
>>>
>>> 3) Those which operate on two rasters. Examples are summation of
>>> rasters and computing flow directions on a DEM. The latter is a bit
>>> more complex since it is a focal method and requires a 3 x 3 matrix of
>>> blocks from the other operand raster.
>>>
>>> Maybe a fourth kind of methods are those which compute something from
>>> a raster.
>>>
>>> This would lead to four functions in the C API and raster band methods
>>> in the bindings.
>>>
>>> Comments are welcome, the initial code base contains a small demo
>>> program. Eventually, if this works, I'll make a RFC from this.
>>>
>>> Ari
>>>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev



More information about the gdal-dev mailing list