[postgis-devel] Raster todo list

Bryce L Nordgren bnordgren at gmail.com
Fri Jun 17 11:10:37 PDT 2011


On Fri, Jun 17, 2011 at 8:56 AM, Pierre Racine
<Pierre.Racine at sbf.ulaval.ca>wrote:

In my two rasters prototype of MapAlgebra(raster, raster) there is also this
> separation between the spatial operation and the value computation part. The
> spatial operation determine the area of the FIRST, SECOND, INTERSECTION or
> UNION of both raster. This is a first step toward your mask with the
> difference that this one is strictly geometric, represent only the extent to
> be iterated over and hence do not require iterating over all the pixels.
>

You store a nodata if either raster contains a nodata. This essentially
forces the only choice of spatial area to be INTERSECTION, but you're
forcing this choice with the value computation and not the spatial
computation. I can see this leading to confusion when someone selects
"UNION" and receives a large, mostly empty raster which is really an
intersection. Actually, I agree with this choice, but it does mean that you
can eliminate "FIRST", "SECOND", and "UNION" from the options since they can
never have any affect other than to create a larger raster than needed to
store the result, padding the "INTERSECTION" with nodata values. It's
non-obvious effects like this which make me very leery of a giant all-in-one
function.

In addition, I would consider it vital that I be able to mask out areas
based on spatial information not present in either raster. For instance:
calculate the NDVI index over all pixels in a satellite scene which are over
land. I'd need to specify a mask to eliminate water areas, using information
not in the satellite scene.


> The second part of MapAlgebra(raster, raster) determine values for each
> pixel using the right expression. This is your mapalgebra part but more
> flexible because it deal with pixels values not just with a mask.
>

The mask is used only to determine where the computation should occur.
Values are always drawn from the raster.


> You said the one raster version of mapalgebra should accept a mask. This is
> fundamentally a two raster mapalgebra. The first raster is the one you want
> to perform calculation on and the second one is the mask. What if both
> raster are not aligned? There is also no guarantee that the mask will
> contain only true/false values. You are clearly building a more restrictive
> MapAlgebra(raster, raster). For me this is a waste of flexibility.
>

Guarantee by convention: 0 is false. everything else is true.
Guarantee by type: band is of type '1BB'

The first band of a mask-raster is the only one which is interpreted.
Another convention.

I agree that raster alignment is a problem. It's a spatial problem which I
think should be solved in ST_{Intersection,Union,Difference,SymDifference},
and not MapAlgebra. In any case, the current version of MapAlgebra requires
that the two rasters have identical pixel dimensions and be related by a
simple translation (not even rotation/skew). The spatial operations should
probably start with the same restrictions and become more general as we
learn how.

The ideal endpoint is that we learn how to produce a mask (and calculate
values) in an arbitrary, user specified projection from two rasters each in
different projections. I just don't think this should be the first problem
we tackle. I have ideas on this, but we clearly have more fundamental issues
here. :)


> First concern: you CAN NOT do 'rast1 * rast2' (nor 'rast1 + rast2') with a
> one raster mapalgebra accepting only a raster and a mask... You could do
> this only if all the pixels of rast2 would have the same value and then act
> as a constant.
>

I'm not sure I understand. You can't do rast1+rast2 with a single raster
version under any circumstances. A mask on the single raster version simply
allows you to apply the single-raster expression to a controlled area.

I think I see a source of confusion, tho. Let's agree that a "mask"
contributes spatial information and does not contribute values, whereas a
"raster" contributes values as well as spatial information. (i.e., a
two-raster, one-mask mapalgebra version would accept three arguments with a
data type of raster.)

First, if you really want to implement 'rast1 + rast2' then what you need is
> actually a three raster version of MapAlgebra(rast1, rast2, mask)... What if
> none of them is aligned? Did you speak about simplicity?
>

I don't think the two approaches differ in the complexity they are going to
encounter out in the wild. Regardless of which functions handle the spatial
alignment, they will have to start out simple and evolve to handle more
complex situations.


>
> In my prototype of mapalgebra the mask extent computation is geometric thus
> VERY fast (no iteration) and the value computation require only one pixel
> iteration over only the intersecting extent and this only once for the whole
> 'rast1 + rast2 + 120' expression. You can event optimize in many case when
> the expression resume to 'rast1' or 'rast2' by copying chunk of pixel values
> (this is the case for ST_Collect). This is what the optimized version is
> trying to do.
>

The two approaches do not differ in what needs to be done, only in where it
is done. Clearly your approach also has memory savings because the mask is
never stored as a raster. Sometimes this is beneficial, and sometimes it is
not.


> There is implicitly a mask in the combination of both rasters. You can
> determine this mask at the same time as you process pixel values. This is
> much more efficient than precomputing the mask and then computing values.
>

Perhaps I should have said "add the option of specifying a mask". Clearly
you can optimize the special case where the mask is never re-used and the
computation of the mask is simple (such as an intersection). However,
complex masks or masks from external sources (land/water mask) are also
common and necessary. Certainly the optimized version of an intersection can
be considered the default if no mask is specified. But if the same mask
needs to be used for other operations (ST_Histogram on one of the source
rasters, for instance), optimizing this one function comes at the cost of
slowing down the entire processing chain. Very Bad Mojo.

Flexibility is very difficult to optimize.

As to the attached sql which is  implements the various ST_{operation}
functions...I think they're going to need a more thorough examination than
what I have time for now. This one leaped out at me:

-- 5) ST_Difference making a mere geometric difference
> SELECT ST_MapAlgebra(rast1, rast2, 'CASE WHEN NOT rast2 IS NULL THEN NULL
> ELSE rast1 END', '32BF', 'FIRST', NULL, 'rast1', NULL)
> FROM (SELECT ST_TestRaster(0, 0, 1) rast1, ST_TestRaster(1, 1, 3) rast2)
> foo
>

Bear in mind that the expression is not evaluated unless BOTH rast1 and
rast2 contain a valid value. The spatial domain of this function should be
only that area of rast1 NOT covered by rast2, so the entire result should be
empty. Aside from that, the expression should just be "rast1", or some
function involving only rast1, because rast2 is guaranteed to not have data
over the result. In short, the second raster conveys ONLY spatial
information and never value information. I don't think you can implement
ST_Difference with the current two-raster mapalgebra at all. This would be a
one-raster mapalgebra accepting a mask.

Bryce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20110617/6a3a4573/attachment.html>


More information about the postgis-devel mailing list