[GRASS5] r.average and FP maps

Glynn Clements glynn.clements at virgin.net
Thu Aug 12 12:43:02 EDT 2004


Markus Neteler wrote:

> I have tried to average elevations for selected zones
> with r.average but wasn't quite successful.
> 
> base map: raster map of farms with category number per farm
> cover map: FP DEM
> 
> Example with Spearfish 5.7:
> 
> r.info -t elevation.10m
> datatype=DCELL
> 
> r.average -c base=fields cover=elevation.10m out=fields_elev_mean
> WARNING: r.stats: cats for elevation.10m are either missing or have no
>          explicit labels. Using nsteps=255

This message appears because r.stats is being passed the -C flag; I
don't know whether or not that's the right thing to do.

> r.stats:  100%
> percent complete:  100%
> 
> r.average is calling r.stats to calculate the result, but probably
> the module needs a significant change to work properly with
> FP data?
> 
> Any other idea?

The nature of r.stats is such that it doesn't make sense to use it on
FP maps; the behaviour would typically degenerate to "r.stats -1",
i.e. each FP value which actually occurred in the map would be output
with an associated count of 1 (or an associated area equal to the area
of a cell).

Programs such as r.statistics or r.average may as well use something
like:

	r.stats -1 | sort -n -k 1 | uniq -c | awk '{print $2, $3, $1}'

for FP maps, and scale the cell counts by the area of a cell.
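
Spelled out for the example maps above (an untested sketch; the
input= argument follows r.stats' usual interface):

	r.stats -1 input=fields,elevation.10m | \
		sort -n -k 1 | uniq -c | \
		awk '{print $2, $3, $1}'

This emits one "basecat coverval count" triple per line, i.e. roughly
the same shape of output that "r.stats -c" produces for integer maps.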

Note that this will be significantly slower than using r.stats. 
However, it will be more accurate, as no quantisation is occurring.

With the r.stats approach, increasing the quantisation granularity
would improve accuracy; OTOH it will also increase r.stats' memory
consumption. In the degenerate case, r.stats will end up with one
"bin" per cell, i.e. storing both maps in memory in a very inefficient
format (a 28-byte Node structure per cell in addition to the actual
cell values).
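
As a back-of-envelope figure (assuming, say, a 1000x1000 region):
10^6 cells times 28 bytes is ~28Mb of Node overhead alone, on top of
the 12 bytes per cell for the CELL/DCELL values themselves -- so
around 40Mb for a map pair which occupies only ~12Mb as flat arrays.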

The real solution would be for r.statistics and r.average to do the
collation themselves, with an accumulator structure for each category
in the base map. For each (baseval,coverval) pair, you would update
the accumulator corresponding to the base value.

E.g. for method=average (pseudo-code):

	int num_cats = base_max - base_min + 1;
	struct accum {
		DCELL sum;
		int count;
	} *accumulator = G_malloc(num_cats * sizeof(struct accum));
	int i;

	for (i = 0; i < num_cats; i++)
	{
		accumulator[i].sum = 0;
		accumulator[i].count = 0;
	}

	for (row = 0; row < cellhd.rows; row++)
	{
		/* read the next row of each map */
		G_get_c_raster_row(base_fd, base_buf, row);
		G_get_d_raster_row(cover_fd, cover_buf, row);

		for (col = 0; col < cellhd.cols; col++)
		{
			CELL baseval = base_buf[col];
			DCELL coverval = cover_buf[col];
			int idx;

			/* skip cells which are null in either map */
			if (G_is_c_null_value(&baseval) ||
			    G_is_d_null_value(&coverval))
				continue;

			idx = baseval - base_min;
			accumulator[idx].sum += coverval;
			accumulator[idx].count++;
		}
	}

	for (i = 0; i < num_cats; i++)
		/* empty categories get null rather than 0/0 */
		if (accumulator[i].count > 0)
			average[i] = accumulator[i].sum /
				     accumulator[i].count;
		else
			G_set_d_null_value(&average[i], 1);

This requires memory proportional to the range of base categories. I
suppose that there would be a problem if the categories were sparse,
but that would also be a problem when generating the reclass table.
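
As an aside, base_min and base_max above would presumably come from
the base map's range file; a minimal sketch, assuming the libgis
range calls and hypothetical base_name/base_mapset variables:

	struct Range range;
	CELL base_min, base_max;

	if (G_read_range(base_name, base_mapset, &range) < 0)
		G_fatal_error("unable to read range for [%s]", base_name);

	G_get_range_min_max(&range, &base_min, &base_max);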

The sum, standard deviation, variance etc. can be computed similarly
(see the sketch below). Computing the median is somewhat harder; you
may need to use multiple passes unless you're willing to store an
entire map in memory. The mode probably isn't meaningful for FP data.
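
For instance, variance needs only one extra field in the accumulator.
A sketch along the lines of the code above (sqrt() needs <math.h>;
the one-pass formula can lose precision when all values share a large
offset, e.g. elevations, so Welford's method may be preferable):

	struct accum {
		DCELL sum;
		DCELL sum2;	/* sum of squared cover values */
		int count;
	};

	/* per cell: */
	accumulator[idx].sum += coverval;
	accumulator[idx].sum2 += coverval * coverval;
	accumulator[idx].count++;

	/* per category, with count > 0: */
	DCELL mean = accumulator[i].sum / accumulator[i].count;
	DCELL variance = accumulator[i].sum2 / accumulator[i].count
		- mean * mean;
	DCELL stddev = sqrt(variance);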

-- 
Glynn Clements <glynn.clements at virgin.net>



