[gdal-dev] Map algebra
Ari Jolma
ari.jolma at gmail.com
Thu Apr 7 23:04:37 PDT 2016
08.04.2016, 02:21, alex kirjoitti:
>> That is one aspect, yes. However, so far I've considered only operations on one
>> or two bands at time. Operations on more than two can of course be dissolved
>> into a series of operations with max two at a time.
> That will often be possible, but not always. For instance how would you dissolve the following function? (without resorting to complete ugliness like c * a + !c * b)
>
> int conditional(bool c, int a, int b) { return c ? a : b; }
As an assignment that would be
d = c ? a : b;
If all variables are spatial then that's four bands. To save space we'll
do in-place modifications to bands. If we do not want to change any of
the bands a, b and c, we'll begin by making d a copy of b:
d = b;
and then use a conditional
d = a if c;
In the case where c is in fact a simple function of a (like a > 50) this
is already possible in the current API. I admit that this kind of
operation is often needed, and it should be possible.
One problem with functions of multiple bands is the curse of
dimensionality related to templates. The datatype of a band is one of 7
values. If we want to support all, then for three bands it is 7x7x7
variations. Already now with a small number of methods for two bands the
object file of just that source file is ~3.5 MB. However, I think that
in conditional functions like above the c could be limited to byte datatype.
>
> Also, would you expect the user to break down all his/her functions into two-argument or fewer functions?
In fact, more or less yes. If one can devise a mathematical function she
needs, then she probably can also think of a way to do it with a series
of a simple operations. I any mathematical operation involving n
variables can be broken down into a series of operations involving max
two variables, then I'd be happy with that set of operations. It is in
fact an interesting theoretical question what would be the simplest such
set of operations.
>
>> I agree that supporting overloading is important and it is more than just
>> syntactic sugar since it can delegate some tasks to the compiler, which may
>> create temporaries as needed etc.
> To be fair, in my project it is just syntactic sugar: "a + b" is just a short way to write "apply(std::plus<>{}, a, b) ";
I've written a map algebra overload code in Perl with underlying code in
C. It was possible to write a set of simple methods, which the Perl
compiler would use to compute any algebraic expression. Some of the
simple methods would create a result, which Perl would use as an input
to other methods and then discard it. That would happen invisible to the
user.
>
>>> If you want to support this, it seems to be best practice to use expression
>> templates (https://en.wikipedia.org/wiki/Expression_templates), because that
>> would allow to create complex compositions of operations without having to
>> create temporary datasets (e.g. for the sum of band1 and band2 in the example
>> above).
>>
>> Yes, I agree that avoiding temporaries is a worthy goal, and what I
>> understand of expression templates, motivation for them has been that.
>>
>> However, expression templates seem to be very C++ centric and do their
>> work in compile time. The C API and high level language bindings are
>> very important in GDAL. Therefore expression templates might not be the
>> best solution in this case. However, perhaps the idea can be exploited
>> somehow.
> I would think that you can get the principle to work in object-oriented fashion as well, where the template based function "apply" looks like this
>
> auto apply(Function&& function, Args&&... args)
>
> Couldn't the object oriented version for two arguments be something like this:
>
> GDALRange* apply(GDALValue*( *function)(GDALValue*, GDALValue*), GDALArgument * arg1 GDALArgument * arg2)
>
> Assuming that all ranges derive from GDALRange and all values from GDALValue, and both of these derive from GDALArgument. You can now sort out the function composition in runtime instead of compile-time.
>
> I know which approach I would prefer, but I am not convinced it is not possible.
>
> [snip]
>> You
>> seem to be applying an iterator that iterates over blocks, rather than over pixels.
>> How can that work when my_average, band1, band2 and band3 have different
>> block sizes? Wouldn't your life be a lot easier if you iterated over pixels in the
>> same order for all bands?.
>>
>> I agree that it takes a bit of haggling to be able to work with more
>> than one band at the same time when one has to maintain a separate cache
>> for each band and each band has different block sizes. The situation is
>> even worse for focal methods since one focal region can be on four
>> blocks of one band in the worst case.
> The worst-case can be a lot worse, consider a focal window with a radius of 500 pixels.
>
>> It is however, in my opinion, an
>> efficient approach when it is not feasible to load all raster data into
>> memory.
> Iterating pixel-by-pixel does not imply having to load all raster data into memory. It does imply loading multiple blocks into memory, i.e. a whole row of blocks.
>
>> The approach is also strongly supported by the basic design of
>> GDAL and the fundamental algorithm is presented in GDAL documentation (I
>> gave the link in earlier email).
> I didn't see that email and just trawled through the whole thread, can you give it again?
http://www.gdal.org/classGDALRasterBand.html#ad80cecc562fd48f5783677de625360ac
it is a simple algorithm but demonstrates the two level approach.
>
>> The solution I already have seems to work well for the one and two band
>> case even with 3x3 focal region and as far as I can see it could be
>> extended to cases involving more bands.
I'll add the disclaimer about the curse of dimensionality I mention
above for which a solution should be found first. It could be a simple
one like copy all blocks to same datatype first.
>> The macros and functions I've
>> written make cell (pixel) access possible with only a few lines of code
>> but it could be probably improved. That would probably be desirable if
>> more than two bands are present.
>>
>>> If you are interested, I have a solution, albeit in C++11
>> (https://github.com/ahhz/map_algebra ).
>>
>> I remember your email and I've taken a look at your code. I have nothing
>> against using modern tools but I think GDAL is a bit conservative on
>> them and portability is a big issue. However, I believe the map algebra
>> implementation I'm writing will be a add-on or plugin for GDAL, and not
>> something that will be tightly integrated into it.
>>
> [snip]
>>> 2. Focal operations: operations that calculate an indicator for each pixel based
>> on the values in surrounding pixels
>>> I think the key challenge here is to allow users to specify the function to be
>> applied on the surrounding pixels. In my opinion a Map Algebra library should
>> support all kinds of moving window analysis, and not just a number of built-in
>> types of analysis. This can be implemented as yet another range that allows
>> iteration over the pixels in the same order as the raster bands, but instead of
>> return pixel values, it returns neighbourhood / window summaries.
>>
>> The hydrological algorithm would benefit a lot from random access to
>> extended neighborhood cells but so far I've written only code that uses
>> linear, block by block and pixel by pixel within blocks, access.
>>
>>
> I think I need to read up on your fundamental algorithm to understand this. Is linear access line-by-line? That would ease my worries because line-by-line is independent of block size, right?
It really depends on the block size, which is something GDAL core and
drivers handle.
>
> [snip]
>>> 3. Zonal operations: operations that calculate indicators for groups (zones) of
>> pixels.
>>> This seems more trivial than the other two components of map algebra and I
>> haven't given it much attention, but I think should be supported.
>>
>> One complication, which I just today worked on, seems to stem from one
>> requirement that I have. I want the API be free of templates and thus
>> there is a need for generic classes for, e.g., numbers (GDAL supports
>> bands where data is various length integers or floats). Thus, for
>> example a simple zonal method, which returns which zones share a
>> boundary returns a rather complex data structure.
> I understand zonal methods to be operating on pixels within zones and not on the relationship between zones. What kind of methods do you have in mind?
I admit that it is perhaps not a "pure" zonal method. For example, think
about the question: "What are the neighboring countries of each country?".
>
>>> 4. Global operations
>>> I don't think these need to be explicitly supported, because there is no
>> genericity to exploit.
>>
>> I find that I often need methods, which return for example the histogram
>> of values in a band.
>>
> Me too. But I don't feel it urgently needs special library support.
In my mind it goes logically into this API, but that's just a matter of
taste.
>
>>> In summary:
>>>
>>> What I am really trying to say is that developing map algebra functionality on
>> top of GDAL or as part of GDAL seems very worthwhile and feasible. However, I
>> would aim for functionality that is more generically supporting map algebra,
>> rather than implementing specific map algebra tools. The foundation of such
>> functionality in my opinion must be a GDALRange class that facilitates iteration
>> over pixels in a uniform order. If I were you, I would first push a RFC to create a
>> GDALRange, and only once that is done and dusted, consider the map algebra on
>> top of it.
>>
>> The RFC procedure of GDAL is open to everybody and the deciding body is
>> the GDAL PSC, to which I do not belong to. As I wrote, the API I'm
>> suggesting is essentially an add-on and thus leaves, even if adopted,
>> room for other solutions.
> I thought the idea of discussing on this mailing-list was to look for consensus or at least strengthen ideas.
Sure.
Ari
>
> Kind regards, Alex
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
More information about the gdal-dev
mailing list