[postgis-devel] Raster philosophy in a vector world

Bryce L Nordgren bnordgren at gmail.com
Sat Jun 11 12:00:09 PDT 2011


Warning: long. very long.

I spent some time thinking about a statement of work for postgis raster
fv.02 and 03 this morning. (BTW: The "planning" page is out of sync with the
"Specifications" page, as fv.03 on the planning page is fv.04 on the specs
page.)  I want to make sure that proposed development on my part is
consistent with the overarching structure established by you all, so the
first thing I need to nail down is the big picture.

The thing which attracted me to postgis raster at first was the desire to
provide a seamless set of operations for both vector and raster
data. However, it seems that even my very first attempt to use the
tool has exposed a fundamental philosophical difference between vector and
raster which is difficult to handle uniformly. Basically, I wanted to
reconstruct an entire raster from the tiles into which it had been
segmented.

A simple case like this, where the tiles are regularly blocked and disjoint,
seems amenable to treatment with a raster version of the existing
ST_Collect() function (which is speedy because it doesn't try anything
crafty to eliminate overlaps.) The proposed implementations of ST_Union in
svn assume a more general case where there were overlaps, and hence a need
to handle the selection of a destination raster value from among the
candidates in the source rasters.
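
To make the simple case concrete, here is a minimal Python sketch of a "collect" that only places tiles and never resolves overlaps. It is not the PostGIS implementation: the function name `collect_tiles` and the tile layout (row offset, column offset, 2D list of values) are assumptions made up for illustration.

```python
def collect_tiles(tiles, nodata=None):
    """Assemble disjoint tiles into one grid.

    tiles: list of (row_offset, col_offset, values), where values is a
    2D list. Hypothetical layout; assumes the tiles do not overlap, so
    no choice between candidate values is ever made.
    """
    height = max(r + len(vals) for r, _, vals in tiles)
    width = max(c + len(vals[0]) for _, c, vals in tiles)
    out = [[nodata] * width for _ in range(height)]
    for r, c, vals in tiles:
        for i, row in enumerate(vals):
            # Plain copy: each destination cell is written exactly once.
            out[r + i][c:c + len(row)] = row
    return out


left = (0, 0, [[1, 2], [3, 4]])
right = (0, 2, [[5, 6], [7, 8]])
mosaic = collect_tiles([left, right])
```

If the tiles did overlap, this sketch would silently keep whichever tile was copied last, which is exactly the value-selection problem the more general ST_Union has to address.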

Right there is the difference between geometry and raster processing.
Geometries carry shape information with no implied value (i.e., they are
primitives). Rasters are closer to coverages or feature tables: each pixel
location (geometry) is inseparably paired with a set of numeric values in
the various bands. (CV_GeometryValuePair in ISO 19123 speak.)

Looking at the current prototypes, I think there may be some benefit to
identifying some concerns to separate before too much more work is done.

Firstly, I think we can and should separate purely geometric items from the
need to generate a value for each resultant pixel.

Secondly, we need to separate single-band-value operations from raster-value
operations. ENVI embodies this distinction as Band Math vs. Spectral Math.
For instance, setting the color of a pixel would be a raster-value (all
band) operation, whereas most of the MapAlgebra and statistics functions are
single-band-value operations.

Thirdly, I think we've entered territory where aggregate functions need to
be considered separately from the non-aggregate functions, at least in some
cases.

Intuitively, I want the geometric operations to be as similar to their
vector counterparts as possible/reasonable. They should be primitives upon
which more complex functions can be based, and not the other way around.

For the simple case, the geometric behavior of these new raster-returning
functions is well constrained. For more complex cases, geometric behavior is
somewhat less well defined. Whether the geometric behavior is simple or not
depends on the data itself (do the input grids have the same pixel size? are
they related by a simple translation? are they rotated with respect to each
other?)  Geometrically speaking, raster aggregate functions do not need
different treatment than their vector counterparts.
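
The three questions in parentheses can be sketched as a small classification function. This is only an illustration under assumed inputs: the dict layout (`origin`, `pixel`, `rotation`) and the name `grid_relationship` are hypothetical and do not correspond to any PostGIS structure or function.

```python
import math


def grid_relationship(a, b, tol=1e-9):
    """Classify how grid b relates to grid a.

    Each grid is a dict (hypothetical layout):
      origin   -> (x, y) of the upper-left corner
      pixel    -> (width, height) of one cell
      rotation -> rotation of the grid axes, in degrees
    """
    if abs(a["rotation"] - b["rotation"]) > tol:
        return "rotated"
    if any(abs(pa - pb) > tol for pa, pb in zip(a["pixel"], b["pixel"])):
        return "different pixel size"
    # Express the origin offset in grid a's own axes, in pixel units.
    theta = math.radians(a["rotation"])
    dx = b["origin"][0] - a["origin"][0]
    dy = b["origin"][1] - a["origin"][1]
    u = (dx * math.cos(theta) + dy * math.sin(theta)) / a["pixel"][0]
    v = (-dx * math.sin(theta) + dy * math.cos(theta)) / a["pixel"][1]
    if abs(u - round(u)) <= tol and abs(v - round(v)) <= tol:
        return "aligned"  # simple translation by whole pixels
    return "sub-pixel offset"


base = {"origin": (0.0, 0.0), "pixel": (10.0, 10.0), "rotation": 0.0}
shifted = {"origin": (20.0, -30.0), "pixel": (10.0, 10.0), "rotation": 0.0}
relation = grid_relationship(base, shifted)
```

Only the "aligned" outcome corresponds to the well-constrained simple case; every other outcome implies some form of resampling before cell values can be compared.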

Considered separately, the complexity of a selection of a value for each
destination raster cell depends on the spatial predicate and not the data.
If two rasters are participating: ST_Difference and ST_SymDifference have
only one possible value; ST_Union could possibly force a choice;
ST_Intersection is guaranteed to force a choice.  Alternatively, each of
these predicates could be equally simple if they just returned a mask. (And
returning a mask would be as close as possible to the semantics/behavior of
the geometric predicates.) Clearly, there is no ambiguity as to the
resultant value if a raster and a geometry are inputs to the predicate.
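
The counting argument above can be checked with a small Python sketch. It assumes two rasters already aligned on a shared grid and models each as a dict from (row, col) to a value; the function `candidates` is a made-up helper, not a PostGIS call.

```python
def candidates(a, b, op):
    """List the candidate source values for every cell in the result.

    a, b: dicts mapping (row, col) -> value on a shared grid
    (an assumed, simplified raster model). The keys of the returned
    dict are the mask; the lists show how many values compete.
    """
    cells = {
        "difference": a.keys() - b.keys(),
        "symdifference": a.keys() ^ b.keys(),
        "union": a.keys() | b.keys(),
        "intersection": a.keys() & b.keys(),
    }[op]
    return {cell: [r[cell] for r in (a, b) if cell in r] for cell in cells}


a = {(0, 0): 1, (0, 1): 2}
b = {(0, 1): 9, (0, 2): 8}
union_values = candidates(a, b, "union")
intersection_values = candidates(a, b, "intersection")
```

Difference and symmetric difference never yield more than one candidate per cell, union yields two only where the inputs overlap, and intersection yields two for every cell. Returning just the keys would be the mask-only variant.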

Aggregate functions which set the value in a raster result must be defined
and used extremely carefully. Functions which yield the same value
regardless of evaluation order can be provided safely (count, sum, stdev,
min, max, etc.) Other functions may be provided on an "at-the-user's risk"
basis. "First" and "Last" will evaluate differently depending on how the
query is evaluated by the server--assuming there are overlapping input
rasters. However, for the ST_Collect or ST_Union call intended to assemble a
larger raster out of many non-overlapping components, these are needed.
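
A short Python sketch of the evaluation-order concern, using the candidate values that compete for a single destination cell. The helper names are hypothetical, and integer values are used so that floating-point rounding does not muddy the comparison.

```python
from itertools import permutations

# Per-cell reducers; "first" and "last" depend on input order.
REDUCERS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
}


def aggregate_cell(values, how):
    """Reduce the candidate values for one cell to a single value."""
    return REDUCERS[how](values)


def is_order_independent(values, how):
    """True if every ordering of the inputs gives the same result."""
    results = {aggregate_cell(list(p), how) for p in permutations(values)}
    return len(results) == 1


overlapping = [3, 7, 5]   # three rasters cover this cell
single = [4]              # non-overlapping tiles: one candidate only
safe = is_order_independent(overlapping, "sum")
risky = is_order_independent(overlapping, "first")
```

With a single candidate per cell, as in the non-overlapping assembly case, "first" and "last" give the same answer under any ordering, which is why they remain usable there.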

Thank you for your patience as I grapple with the big picture in email. I'll
let this brew for a little while before solidifying any statements of work.
If you have observations, comments or corrections to my big picture,
please do speak up.

Bryce