[GRASS5] r.average and FP maps
Markus Neteler
neteler at itc.it
Fri Aug 13 06:11:48 EDT 2004
On Thu, Aug 12, 2004 at 05:43:02PM +0100, Glynn Clements wrote:
>
> Markus Neteler wrote:
> > I have tried to average elevations for selected zones
> > with r.average but wasn't quite successful.
> The nature of r.stats is such that it doesn't make sense to use it on
> FP maps; the behaviour would typically degenerate to "r.stats -1",
> i.e. each FP value which actually occurred in the map would be output
> with an associated count of 1 (or an associated area equal to the area
> of a cell).
>
> Programs such as r.statistics or r.average may as well use something
> like:
>
> r.stats -1 | sort -n -k 1 | uniq -c | awk '{print $2, $3, $1}'
>
> for FP maps, and scale the cell counts by the area of a cell.
>
> Note that this will be significantly slower than using r.stats.
> However, it will be more accurate, as no quantisation is occurring.
That's fine. Time is not an issue as long as it works... at the moment
it is sort of broken.
> With the r.stats approach, increasing the quantisation granularity
> would improve accuracy; OTOH it will also increase r.stats' memory
> consumption. In the degenerate case, r.stats will end up with one
> "bin" per cell, i.e. storing both maps in memory in a very inefficient
> format (a 28-byte Node structure per cell in addition to the actual
> cell values).
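(For reference: the granularity could be raised with r.stats' nsteps=
option, if this version has it, or programmatically by writing a finer
quant table for the cover map before running r.stats. A minimal sketch
using the libgis quant API; cover_name is a placeholder here, and the
map's FP range is read via G_read_fp_range():

    struct Quant quant;
    struct FPRange range;
    DCELL min, max;

    /* read the map's floating-point range */
    G_read_fp_range(cover_name, G_mapset(), &range);
    G_get_fp_range_min_max(&range, &min, &max);

    /* quantise [min,max] onto 10000 integer bins */
    G_quant_init(&quant);
    G_quant_add_rule(&quant, min, max, 0, 9999);
    G_write_quant(cover_name, G_mapset(), &quant);

Modules reading the map through the CELL interface would then see the
finer quantisation.)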
Isn't this somewhat related to the new r.univar implementation?
The memory problem will be similar.
> The real solution would be for r.statistics and r.average to do the
> collation themselves, with an accumulator structure for each category
> in the base map. For each (baseval,coverval) pair, you would update
> the accumulator corresponding to the base value.
>
> E.g. for method=average (pseudo-code):
>
> 	int num_cats = base_max - base_min + 1;
> 	struct accum {
> 	    DCELL sum;
> 	    int count;
> 	} *accumulator = G_malloc(num_cats * sizeof(struct accum));
> 	int i;
> 
> 	/* one accumulator per base category */
> 	for (i = 0; i < num_cats; i++)
> 	{
> 	    accumulator[i].sum = 0;
> 	    accumulator[i].count = 0;
> 	}
> 
> 	for (row = 0; row < cellhd.rows; row++)
> 	{
> 	    /* read one row from each map (CELL base, DCELL cover) */
> 	    G_get_c_raster_row(base_fd, base_buf, row);
> 	    G_get_d_raster_row(cover_fd, cover_buf, row);
> 
> 	    for (col = 0; col < cellhd.cols; col++)
> 	    {
> 	        CELL baseval = base_buf[col];
> 	        DCELL coverval = cover_buf[col];
> 
> 	        /* skip cells which are null in either map */
> 	        if (G_is_c_null_value(&baseval) ||
> 	            G_is_d_null_value(&coverval))
> 	            continue;
> 
> 	        accumulator[baseval - base_min].sum += coverval;
> 	        accumulator[baseval - base_min].count++;
> 	    }
> 	}
> 
> 	for (i = 0; i < num_cats; i++)
> 	    if (accumulator[i].count > 0)
> 	        average[i] = accumulator[i].sum / accumulator[i].count;
> 	    else    /* no data for this category */
> 	        G_set_d_null_value(&average[i], 1);
>
> This requires memory proportional to the range of base categories. I
> suppose that there would be a problem if the categories were sparse,
> but that would also be a problem when generating the reclass table.
>
> The sum, standard deviation, variance etc can be computed similarly.
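(For instance, stddev should need only one extra field in the
accumulator above. A sketch: sum_sq is an addition to the struct from
the pseudo-code, and sqrt() is from <math.h>:

    struct accum {
        DCELL sum;
        DCELL sum_sq;   /* new: sum of squared cover values */
        int count;
    };

    /* in the collation loop, additionally: */
    accumulator[idx].sum_sq += coverval * coverval;

    /* afterwards, for each category i with count n > 0: */
    mean     = accumulator[i].sum / n;
    variance = accumulator[i].sum_sq / n - mean * mean;
    stddev   = sqrt(variance);

Note that this one-pass formula can lose precision when the mean is
large relative to the spread; a second pass over the data would be
numerically safer.)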
This would also close the report:
http://intevation.de/rt/webrt?serial_num=1848&display=History
> Computing the median is somewhat harder; you may need to use multiple
> passes unless you're willing to store an entire map in memory. The
> mode probably isn't meaningful for FP data.
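(A sketch of the store-everything variant, for reference: one growable
array of cover values per base category. The struct and names here are
made up for illustration; qsort() is from <stdlib.h>:

    struct values {
        DCELL *v;
        int n, alloc;
    } *vals = G_calloc(num_cats, sizeof(struct values));

    /* in the collation loop: append coverval to its category */
    struct values *a = &vals[baseval - base_min];

    if (a->n >= a->alloc)
    {
        a->alloc = a->alloc ? 2 * a->alloc : 16;
        a->v = G_realloc(a->v, a->alloc * sizeof(DCELL));
    }
    a->v[a->n++] = coverval;

    /* comparison function for qsort() */
    static int cmp_dcell(const void *x, const void *y)
    {
        DCELL a = *(const DCELL *) x, b = *(const DCELL *) y;
        return (a < b) ? -1 : (a > b);
    }

    /* afterwards: sort each array and take the middle element */
    for (i = 0; i < num_cats; i++)
    {
        int n = vals[i].n;

        if (n == 0)
            continue;
        qsort(vals[i].v, n, sizeof(DCELL), cmp_dcell);
        median[i] = (n % 2) ? vals[i].v[n / 2]
            : (vals[i].v[n / 2 - 1] + vals[i].v[n / 2]) / 2;
    }

That keeps roughly one map's worth of values in memory, as noted
above; the multi-pass alternative trades memory for extra reads of
the cover map.)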
At least a working average and stddev would be fine...
Markus