[postgis-devel] [raster] Architecture wiki document contribution

Fri Jul 1 15:13:11 PDT 2011

Note: Some of the cross talk is because I'm speaking architecture and
you're speaking implementation. Unfortunately, some of the benefits of
an alternate implementation (or problems with the existing one) will
not be visible until the task is viewed from a more "generalist"
standpoint. Conversely, problems with the architecture sometimes don't
pop up until an implementation detail exposes an achilles heel. Thus
far, I wrote an implementation based off of improvements to the loop
in your mapalgebra SQL code made possible in C. The architecture is a
generalization of that, along with an introduction to the abstractions
upon which the design relies (which has no ties to PostGIS raster or
my own implementation, it's strictly a conceptual design). The next
step is to apply insights made possible by the architecture to rev. 2
of the implementation. Yes, it's iterative. Yes, that means that the
architecture is ahead of the implementation at this point. Yes, you
have to be comfortable with the entire gamut of abstraction in order
to follow along. No, I'm not the best communicator.

But when we're done iterating, we have pre-generated a conceptual
description of what we've implemented. A "developer's guide" to
parallel the "reference manual" (API documentation).

I want you to understand that the presence of this tool, conceptually
or in an implementation, says nothing about how it is used. We can use
it only for MapAlgebra and implement spatial operations by writing
expressions for MapAlgebra to evaluate. Or we can use the tool for
MapAlgebra as well as for spatial operations which are hardcoded and
compiled to a shared library. It's not "architecture vs. Pierre's
plan". Architecture puts the plan in context, divorcing it from the
specific implementation currently in vogue, and allows you to do the
same thing in an entirely different environment.

On Fri, Jul 1, 2011 at 4:41 PM, Pierre Racine
<Pierre.Racine at sbf.ulaval.ca> wrote:
> Could you describe how your Result Grid Iteration Engine is different from MapAlgebra(rast1, rast2)?
>
> I understand it as a mask generation engine but MapAlgebra(rast1, rast2) and MapAlgebra(rast) are also mask generation engines... since a mask is a raster and a raster can easily be viewed as a mask... (nodata = 0 and withdata = 1)

It's way more flexible than MapAlgebra (SQL), and is the core around
which MapAlgebra (C) can be built. :)

Start by breaking MapAlgebra into two pieces: the first part
calculates the size of the resultant raster, and the second iterates
over the result raster. The grid iteration engine is the second part.

Now add extension points. In C I used callbacks implemented by
function pointers
(http://en.wikipedia.org/wiki/Callback_%28computer_programming%29).
One extension point (spatial_op) calculates whether the raster cell is
"inside" or "outside" the result. The other extension point (value_op)
calculates a value for the cell, given data from the two rasters and
the result of spatial_op.

Spatial ops currently implemented: intersection, difference, union,
symdifference.
Value ops currently implemented: generate mask value; copy pixel

Generate mask value simply returns the result of spatial_op. "Copy
pixel" will return all bands from raster1 unless raster1 is not
present; in which case it will return all bands from raster2; it
returns the "nodata" value if spatial_op returned false. By copying
all bands, you can deal with color images. The requirement for "copy
pixel" is that raster1, raster2, and the result all have the same
number of bands of the same pixtype in the same order. This is a
general requirement of any operation which works on "raster values"
instead of "single-band-values", though. Generate mask value imposes
no such restriction, as it outputs a single band and ignores the input
values completely.

The spatial_op "extension point" replaces the hardcoded sequence of
if/then in the MapAlgebra (SQL) loop. The main loop is leaner, meaner
and easier to maintain.

You can combine any spatial_op with any value_op. To make a C version
of MapAlgebra, you implement a value_op which can apply an expression
to two bands, and return a single band.

Note: to "implement" an extension point, you just write a C function
with a signature that matches the typedef for that extension point.
Consider it a "plugin".

For rev. 2, I have in mind to reorganize the extension points as
indicated in the architecture document. At that point, this single
function should accept any combination of geometry, geometry-value, or
raster types and return a raster. The input types simply need to be
compatible with the provided value_op. If you've implemented a
MapAlgebra value_op, and constrain the inputs to two Spatial+Value
types (raster and geometry-value), you need do nothing else other than
call this engine with the correct arguments.

In short, and in terms of the planned implementation where all spatial
operations are backed by a MapAlgebra call, I've given spatial
operations access to the important and relevant parts of the
MapAlgebra internals, at the same time giving them increased access to
the raster data (e.g., all bands).

> I think your document need to define properly concepts like:
>
> -information types
> -data types
> -mask
>

It does. (or tries to).

http://trac.osgeo.org/postgis/wiki/WKTRaster/SeamlessArchitecture#Types-of-information
http://trac.osgeo.org/postgis/wiki/WKTRaster/SeamlessArchitecture#Representation

I hammered this out pretty quickly, so it may need to be reorganized,
reworded or redone, but that's why it's on a wiki. It should accrete
improvements over time.

> And you also need to list the advantages of developing a modular approach using mask have over the planned approach ST_Mapalgebra(rast), ST_MapAlgebra(rast1, rast2). Up to now I don't see any. Sorry. What can we do with this architecture that we cannot do with a good set of Mapalgebra and set operators like ST_Intersection()?

First off, the architecture document is not the implementation.  The
architecture document is a set of concepts which will hopefully be
useful when trying to define what constitutes "seamless" vector/raster
interaction. Comprehensive and abstract is the name of the game here.
Because all of the options which could possibly apply to a two-input
operation have been fleshed out once, and because these "possible
combinations" are the same for every two-input operation, you can look
at any specific operation and determine "degrees" of seamlessness by
the number of combinations which have not been covered. It may even be
that a single operation goes by different names when it accepts
different arguments or returns a different type. Think of it as an
abstract specification. It can be implemented over PostGIS raster, but
it could also be implemented over Oracle Spatial Georaster. There is
nothing that ties it to a particular environment.

Second, because of the above, this architecture document is not in
competition with MapAlgebra or set operators. It can't replace them.
It just gives a way to think about all the possible options for
arguments and return values. Even the tools it outlines are just
concepts (like the notion of a "coverage"). The hope is that such
tools are useful in the implementation of things like MapAlgebra, but
also that they have a much wider applicability.

Third, if you're talking about the C implementation on ticket 1058
competing with the planned approach...well currently the
implementation is just a C framework within which the implementation
of Map Algebra should be pretty trivial. (implement the value_op). If
you choose to port Map Algebra to C, you can either start from
scratch, or you can just implement that value_op extension point.
Besides, you can go ahead with the plan even if MapAlgebra is
implemented in C instead of SQL.

Fourth, MapAlgebra is strictly a "band math" tool. It can't hack an
RGB image, even for simple ops like Union. The iteration engine (both
in concept and in implementation) doesn't care whether the value_op is
"band math" or "raster math".

Fifth, the rev. 2 concept for the raster iteration engine (in the
document) is comprehensive. All combinations of Spatial+Value
information types should be handled. MapAlgebra(raster, raster) ->
raster has 1/4 the ability to handle inputs with different data types
as the rev. 2 iteration engine.

Sixth, reuse, encapsulation, and maintainability: this abstract design
has good separation of concerns while preserving your
one-pass-thru-the-raster performance. In C, not SQL.

>
>> "Operations which accept two Spatial+Value inputs and return a Spatial+Value
>> output may accept or return any mix of geometry-value or raster data types.
>> They may not accept or return geometries. Map Algebra is an example. "
>
> Could they accept geomval?

Yes. geomval is the Postgis raster implementation of
CV_GeometryValuePair, which is listed in the table as being in the
"Spatial+Value" column and the "Vector" row. We could add "examples"
which translate the ISO 19123 types to postgis raster types. See the
3x3 table in the section:

http://trac.osgeo.org/postgis/wiki/WKTRaster/SeamlessArchitecture#Expected-Results-and-Information-types

For that matter, the "geometry" type can be used wherever you see
GM_Object (from ISO 19107) or CV_DomainObject (from ISO 19123).
"geomval" can be used wherever CV_GeometryValuePair (ISO 19123)
occurs.  and "raster" can be used in place of CV_Grid (ISO 19123) or
any of its children. For our purposes here, CV_Grid and all of its
children are the same type.

> I think you need to come up with some example function prototypes (or just signatures) so we better understand your proposition. You introduce many concepts and we don't see the functional end of them.

The table in Example 1 has three columns: Input 1, Input 2 and Return
Each row in the table is a signature. Since the example is for
intersection, the signatures would be (using postgis raster data
types):

intersection(geometry, geometry) -> geometry
intersection(geometry, raster) -> 	geometry
intersection(raster, geometry) -> 	geometry
intersection(raster, raster) -> 	geometry
intersection(geometry, geometry) -> 	raster
intersection(geometry, raster) ->	raster
intersection(raster, geometry) -> 	raster
intersection(raster, raster) -> 	raster

The name of the function is not important. The point of the example is
that when you choose limit the information type to spatial-only, there
are eight possible function signatures (defined in terms of data
types).

The reason to avoid getting locked into datatypes and to use
information types instead is: the above eight signatures, given in
terms of data types, does not make explicit the fact that all of the
rasters are treated as masks. This is what "spatial only" means. If
you wish to do something with values, you wander into Example two,
where you decide to support more combinations of information types.
Supporting an additional information type has a broader impact than
merely doing something with the values of the rasters. It also means
adding the geomval data type into the mix. It changes what you expect
to get back from the function.

> They can be exported to external systems
>> which may require this simple representation because they are not so
>> geospatially keen as Postgis Raster (e.g., ImageMagick or the GIMP;
>> transparency comes to mind). If things which return spatial+value information
>> could initialize their basic raster properties (position of UL corner, cell size,
>> extent, etc.) from a mask, the result would be inherently aligned. Using the same
>> mask more than once would give a set of aligned products.
>
> In you view is a mask a special object or merely a 1BB raster?

A mask is a Spatial-Only raster (or one which we declare will be
interpreted as a spatial-only raster). It can be represented by:

A 1BB raster with true/false values.
A non-1BB raster without a nodata value, where (0==false; !0==true)
A non-1BB raster with a nodata value, where (nodata==false; !nodata==true)

It is not a special type. It is an existing type with special
conditions (e.g. the values convey spatial information).
Representations #1 and #3 cannot be misused, however, defining
functions which accept representation #2 may result in someone
mistakenly passing a value-containing raster when they meant to pass
spatial information. A certain amount of care must be exercised here.

If the mask needs to be exported, the correct representation for the
external application needs to be selected by the user at runtime.

In essence, when we declare that an operation/function accepts a mask
somewhere in the argument list, we are stating that only the spatial
information will be used. The value information will be ignored.
Clearly, you can supply any old raster where a function specifies a
"mask". But the fact that it's declared to be a mask should tell you
that the function ignores the raster's values.You should be confident
that changing the values will not change the result (as long as it
doesn't change the spatial true/false evaluation listed above.) If you
have two rasters with the needed spatial information but different
types (8BUI and 32BFP), you can use the smaller one and use less
memory for the same result.

> ST_AsRaster(geometry, value) is planned to accept a value to be burned in the raster. This value can be different than 1. Rasterized geometries might be masks but they can also convey values. Only one. You can then use them in ST_MapAlgebra(rast1, rast2).

A valid use of ST_AsRaster(geometry, value) would be to create an
instance of the third representation above: a non-1BB raster with a
nodata value. This signature really needs TWO values (a true value and
a false value). The "nodata value" of the result must be set to the
"false" value, and all pixels of the result raster must be initialized
with a value: true_value is inside, false_value is outside.

Bryce