[GRASS5] raster inaccuracy

Thu Nov 11 22:05:33 EST 2004

Roger Miller wrote:

> > The key point is that, in common with almost every other program 
> > which operates upon rasters, d.rast isn't reading (or displaying)
> >  the original raster, but the raster after it has been resampled according
> > to the current region settings.
> > 
> > If the current region has the same resolution as the raster, but 
> > isn't aligned to the same grid, the result is that the raster which 
> > is displayed on the monitor will be shifted by up to +/- half a cell.
> 
> I see what you mean.  It leaves me with a problem.  In the general case grass
> raster displays and grass raster queries are not accurate; they may be grossly
> inaccurate.  That is because the original data when displayed are resampled to
> a grid that may have little in common with the original data array.  In my
> examples the data were displayed in a grid that overlapped as little as 25% of
> the actual data cell.  The d.what.rast query returns the data *as displayed*
> not the original data, so it propogates the problem.

However, this isn't just an issue for d.rast. It applies to (more or
less) r.<everything>.

One way to look at it is: "What You See Is What GRASS Gets", i.e. what
is displayed on the monitor is what a raster (r.*) program will get if
it reads the raster while the current region settings are in effect
(if it doesn't, that *is* a bug).

> It appears to me that there are two (and only two) cases in which the raster
> display and the raster queries will always be accurate:
> 
> 1) The region resolution is set to the pixel size for the current display. 
> For instance, in my 2x2 example with the default 640x480 monitor size that
> means setting the row and column sizes to 22 units.
> 
> 2) The region boundaries are set to correspond exactly to boundaries between
> rows and columns in the original data and the region resolution is set so
> there is an integral number of region rows and columns for each row and column
> in the original data.

Correct.

> I don't understand what gets displayed (hence queried) in the case where the
> data cells are smaller than the pixel size.  That seems to be a largely
> different problem, but another problem with accuracy.

Actually, it's exactly the same problem. The sampling algorthim is
identical regardless of whether region's resolution is higher or lower
than that of the original raster.

Possibly the easiest way to look at it is this: a raster map defines a
function over the plane (the set R^2), i.e. it has infinite resolution
and infinite extent (although any non-null values are contained in a
finite subregion).

The value at any point on the plane is the value from the raster cell
which contains the point, i.e. the cell whose centre is closest to
that point. If the point lies outside of the raster, the value is
null.

However, a program can't read an infinite amount of data, so it
obtains a finite number of samples, forming a rectangular grid. The
program can choose the dimensions and spacing of the grid, although
most allow the user to choose this via the current region (i.e. the
WIND file).

If you partition the rectangle defined by the N/S/E/W values into
rows*cols "tiles", the sample points are at the centre of each tile,
i.e. at the coordinates:

	x = west  + (east  - west ) * (col + 1/2) / cols
	y = north - (north - south) * (row + 1/2) / rows

for col in [0..cols-1] and row in [0..rows-1] inclusive.

If the region's resolution (spacing) is finer (smaller distance) than
that of the raster, several adjacent samples may fall into the same
cell in the original raster. OTOH, if the region's resolution is
coarser, some cells in the original raster will be omitted from the
sample. If the resolutions are identical, each cell should be sampled
exactly once.

> The first solution is easy.  The simple solution is generally useful for the
> purpose of display and query but it would not work for most analyses of raster
> data -- through r.mapcalc, for instance.
> 
> The second case is more complicated, but I think it would work for all cases
> where the pixel size is much smaller than the resolution of the data.
> 
> It is at least a burden on users to require them to make sure those conditions
> are met, and probably really too much to expect from most users.  Certainly I
> don't like the prospect of manually checking and adjusting the region every
> time I need to query a raster map.
> 
> Is there some way to impose those limits on the region definitions in an
> automatic fashion?

"g.region rast=..." will trivially force the second case (the
"integral number" being 1). Although it might be useful to have a
scale= option for g.region to allow rescaling by an integral amount
("g.region res=..." suffices when the resolution is an integer, but
not in general).

Actually, it might also be useful to have rows= and cols= options, so
that you can set those directly rather than setting res=.

The first case is somewhat more complex, as the monitor's aspect ratio
won't generally match that of the raster, so you first have to
determine whether to "pad" the region horizontally or vertically, then
determine the amount of padding.

A third option would be to provide an option to d.rast to
automatically use the original raster's resolution, so that it is only
resampled once (in general, it gets resampled from the original grid
to the current region, and then again to the monitor's pixel grid).

However, this is risky, because what you're seeing isn't what r.* will
get. It's probably better to force the user to explicitly set the
region so that the display matches the "view" that r.* will receive.

-- 
Glynn Clements <glynn at gclements.plus.com>