[postgis-devel] gen2 raster iterator tutorial

Sat Sep 17 10:16:57 PDT 2011

On Sat, Sep 17, 2011 at 12:01 PM, Pierre Racine
<Pierre.Racine at sbf.ulaval.ca> wrote:
>> I'm still writing code, so this isn't on a ticket yet.
>
> Normally we write a ticket first, then we discuss the strategic implementation and then we write code.

The "architecture" wiki page describes the approach at an abstract
level, with pictures, in a great amount of detail. At that point you
asked for code examples. This is it. The architecture document didn't
even start until "generation 2". Generation 1 is still on a ticket
(#1058).

For four months I've been dragging people into discussions on the list
regarding things I see as real or potential problems, with varying
outcomes. Some related to this, some not. Some inspired conversation
and some didn't.

> I still haven't seen a clear argument demonstrating the flaws of this direction and the advantages in term of simplicity/performance/flexibility/new functionality of a new approach and, still, this is far from being clear when reading this tutorial.

Well, the tutorial is meant to provide an example of usage, not a
justification for existence.

I don't know how many times I've said: this is mostly about porting
the existing SQL into C in the most maintainable way with the least
amount of redundant code. Specifically, it's about not concentrating
all possible variants of functionality inside the main loop of
MapAlgebra, then writing simple, special case user functions to select
that particular branch. This is not about changing user-visible method
signatures.

You can think of it as a transition from a bunch of "if" statements in
the middle of the loop to a set of function callbacks. At times, I've
called this providing for "extension points". It's the specifics of
implementation, not the direction, which this architecture addresses.
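
To make the "if statements versus callbacks" point concrete, here is a
minimal sketch in C. Every name in it (cell_op, combine_cells, op_sum,
op_max) is made up for illustration; none of it is taken from the
PostGIS source or from the gen2 code. It only shows the shape of the
idea: the loop contains no per-operation branches, and each operation
lives in its own function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: combines two cell values into one. */
typedef double (*cell_op)(double a, double b);

/* Two independent "plugins". Neither one knows the other exists. */
static double op_sum(double a, double b) { return a + b; }
static double op_max(double a, double b) { return a > b ? a : b; }

/*
 * The loop itself has no "if (mode == ...)" branches. The caller
 * selects the behavior by passing a callback.
 */
static void combine_cells(const double *r1, const double *r2,
                          double *out, size_t n, cell_op op)
{
    size_t i;

    assert(op != NULL);
    for (i = 0; i < n; i++)
        out[i] = op(r1[i], r2[i]);
}
```

Adding a third operation means writing a third function and passing it
in. Neither combine_cells nor the existing callbacks are touched.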

To reiterate points from earlier in the summer: the current
implementation can only accommodate new functionality by tacking yet
another piece on, endangering all existing functions through unplanned
interactions. "Extension points" let new functionality live in a
plugin which is separate from existing code and which cannot interact
with it. (E.g., to use the new functionality, you provide the new
plugin instead of the old one.) In
short, the current implementation is good for a prototype which can
demonstrate a small subset of the possible combinations on the tables
on the architecture page, but cannot grow much more than it already
has. Part of this has to do with the limited capabilities of pl/pgsql:
I think you may have maxed out that platform. Plus you don't have
access to GDAL from SQL.

And of course, at some point I stop providing justification for a
well-designed architecture adhering to standard practices. Everything
I've said so far can be summarized to a software engineer succinctly
as: The current implementation has poor encapsulation, poor separation
of concerns, and nonexistent extensibility. The same cannot be said
of the architecture I proposed, offered for discussion, then
implemented.

> What are the problems encountered in the current approach (vectorize raster when operating in vector mode and rasterize when operation in raster mode) justifying a new one?

False statement: you have no current implementation of "raster mode"
ST_Intersection. They all return geomvals. You only have
ST_Intersection and none of the other operations. Thus far, my
architecture only concerns "raster mode".

Complexity. The current implementation is scary long. And so far, it
supports no spatial relationship functions.

Fragility: Adding new things endangers old ones.

> What benefit would we have with the new one? What do we lose (if we lose anything)?
>
> What impact would this have on the SQL API? What new functionality does this bring?

1] Well, you'd gain a comprehensive set of "raster mode" operations
(ST_Intersection, ST_Union, ST_Difference, ST_SymDifference: all of
which return raster). There is nothing in the current implementation
for these to replace (or mask); hence there really is no "raster mode"
as of now.

2] If my idea about adding a geometry/geomval iterator to the
framework pans out ("future directions" on the tutorial), you gain a
solid base on which to finish writing functions which return geomvals
(ST_Union, ST_Difference, and ST_SymDifference). Again, you haven't
written these.

3] ST_Intersection returning geomval could be ported to C. The SQL
interface should stay the same.

4] The current gen2 implementation resamples on the fly (eliminating the
storage of the rasterized intermediate product), can handle arbitrary
raster alignments, and can reproject to different coordinate systems.
As far as I know, these features aren't even being discussed for the
functions which don't yet exist.

5] As the (EVALUATOR *) is an extension point, and is what provides
for sampling or resampling, we can write one for any sampling method
we choose: bilinear, bicubic, etc. Provided that GDAL exposes the
required functionality, we can provide a thin adapter which defers the
heavy lifting to GDAL.

6] One- and two-input MapAlgebra would be improved by porting them to C
using this framework. (Cleanly, I might add.) Just write an (EVALUATOR
*) which submits the expression to the SQL parser, like you have now.
It can immediately benefit from everything in #4. Due to the nature of
the framework, you also get (for free) MapAlgebra functions which can
take (geomval, raster) as arguments. This was the main improvement of
gen2 over gen1.
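
As a rough illustration of point 5, here is how a sampling method can
sit behind a function pointer so that nearest-neighbor and bilinear are
interchangeable. GRID and SAMPLER are hypothetical stand-ins that I
made up for this sketch: this is not the actual (EVALUATOR *)
definition, and it makes no GDAL calls. It assumes cell centers sit at
integer coordinates and clamps at the grid edges.

```c
#include <assert.h>

/* Hypothetical row-major grid of cell values. */
typedef struct grid {
    int width, height;
    const double *cells;
} GRID;

/* Hypothetical extension point: one function pointer per sampling method. */
typedef struct sampler SAMPLER;
struct sampler {
    double (*sample)(const SAMPLER *self, const GRID *g, double x, double y);
};

/* Floor to int without needing libm. */
static int ifloor(double v)
{
    int i = (int)v;
    return (v < (double)i) ? i - 1 : i;
}

/* Fetch a cell, clamping out-of-range indices to the nearest edge. */
static double grid_at(const GRID *g, int col, int row)
{
    if (col < 0) col = 0;
    if (col >= g->width) col = g->width - 1;
    if (row < 0) row = 0;
    if (row >= g->height) row = g->height - 1;
    return g->cells[row * g->width + col];
}

/* Nearest neighbor: round to the closest cell center. */
static double sample_nearest(const SAMPLER *self, const GRID *g,
                             double x, double y)
{
    (void)self;
    return grid_at(g, ifloor(x + 0.5), ifloor(y + 0.5));
}

/* Bilinear: weight the four surrounding cell centers. */
static double sample_bilinear(const SAMPLER *self, const GRID *g,
                              double x, double y)
{
    int c0 = ifloor(x), r0 = ifloor(y);
    double fx = x - c0, fy = y - r0;
    double top = grid_at(g, c0, r0) * (1.0 - fx) + grid_at(g, c0 + 1, r0) * fx;
    double bot = grid_at(g, c0, r0 + 1) * (1.0 - fx)
               + grid_at(g, c0 + 1, r0 + 1) * fx;

    (void)self;
    return top * (1.0 - fy) + bot * fy;
}

static const SAMPLER NEAREST  = { sample_nearest };
static const SAMPLER BILINEAR = { sample_bilinear };
```

A caller holding a (const SAMPLER *) never needs to know which method
it has. A GDAL-backed sampler would, in this sketch, be one more struct
whose function pointer forwards to GDAL.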

Mostly what you'd lose is a plan (not code) to base all of #1 on a
very complex two-raster MapAlgebra function which can't handle
geomvals, can't reproject, and can't tolerate different raster
alignments.

Why is this meeting such resistance?

> This is a basic 1) problem identification, 2) proposed solution, 3) pros and cons methodology. Up to now we've got only 2)...

False.