[GRASS-dev] r.resamp.aggregate, trimmed means

Mon Oct 2 14:10:39 EDT 2006

Hamish wrote:

> > > > Can we make a decision as to the official name of this module?
> > > 
> > > how about r.resamp.bin? (as in binning, not binary)
> > > 
> > > If r.resamp.aggregate is used, I'd vote for using the full word.
> > > I don't think "aggregate" abbreviates well.
> > >
> > > by the way, what is this module?
> >
> > It's a resampling module where the value of the output cell is an
> > aggregate (mean, median, mode, min, max etc) of the values from all of
> > the input cells whose centres lie within the bounds of the output
> > cell.
> 
> Hey, that is pretty neat. I can see it being useful for sub-sampling
> noisy imagery data with low signal:noise (but > 50%) using a median
> filter.
> 
> Do we have a way to do a trimmed mean? (eg mean of values between perc10
> and perc90; replace 10% with 5% or x%, or 2 Stdevs, etc..) I've always
> liked it as a nice compromise between the outlier-discarding ability of
> the median and the less arbitrary nature of the mean.
> I'd like to have this in r.univar and r.series* too... :)

r.resamp.stats currently supports:

	{c_ave,    "average",   "average (mean) value"},
	{c_median, "median",    "median value"},
	{c_mode,   "mode",      "most frequently occurring value"},
	{c_min,    "minimum",   "lowest value"},
	{c_max,    "maximum",   "highest value"},
	{c_quart1, "quart1",    "first quartile"},
	{c_quart3, "quart3",    "third quartile"},
	{c_perc90, "perc90",    "ninetieth percentile"},

This list is essentially all of the aggregates from lib/stats whose
result is "representative" of the inputs (i.e. a value within the
range covered by the inputs, so not variance, stddev etc) and which
don't depend upon the order of the inputs (which excludes e.g. the
linear regression aggregates).

If additional aggregates are added to lib/stats, it's straightforward
to extend r.resamp.stats and/or r.series to support the new
aggregates.

The design of lib/stats isn't practical for modules which compute
aggregates over a large number of samples (e.g. r.statistics), as the
aggregates require the entire set of values to be passed as a single
array.

The main deficiency of lib/stats at present is that it doesn't support
aggregates with parameters, e.g. you can't have a generic "percentile"
aggregate where the percentile is specified as a parameter.

For more complex aggregates (e.g. a trimmed mean), this is likely to
be a significant deficiency. As it stands, "trimmed mean between 10%
and 90%" would be one aggregate, "trimmed mean between +/- 2 sigma"
would be another, etc.
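
To make that concrete, here is a rough sketch of what fixed-parameter
trimmed means might look like. It assumes an aggregate receives a
result pointer, the whole array of values and a count (in line with
the single-array design described above); plain "double" stands in
for the cell type, and the names c_trim10, c_trim05 and trimmed_mean
are made up for illustration, not existing lib/stats functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: sort the values in place, drop
 * n * percent / 100 values (integer division) from each end,
 * and store the mean of the remainder in *result. */
static void trimmed_mean(double *result, double *values, int n, int percent)
{
    int k, i;
    double sum = 0.0;

    assert(n > 0 && percent >= 0 && percent < 50);
    qsort(values, n, sizeof(double), cmp_double);
    k = n * percent / 100;      /* values trimmed from each end */
    for (i = k; i < n - k; i++)
        sum += values[i];
    *result = sum / (n - 2 * k);
}

/* Without parameters, every trim level needs its own entry point. */
void c_trim10(double *result, double *values, int n)
{
    trimmed_mean(result, values, n, 10);
}

void c_trim05(double *result, double *values, int n)
{
    trimmed_mean(result, values, n, 5);
}
```

Each wrapper would then need its own row in the table of aggregates,
which is where the lack of parameters starts to hurt.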

It would be simple enough to add a "const char *parms" argument to the
aggregates, allowing an arbitrary set of parameters to be passed as a
string (similar to PROJ projection parameters).
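
A minimal sketch of the string approach, under several assumptions:
the parameter string is a space-separated list of "key=value" pairs,
the keys "lo" and "hi" are percentages, and the function names
(parm_int, c_trimmed) are invented here. None of this is existing
lib/stats code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Look up "key=<integer>" in a space-separated parameter string.
 * Returns the parsed value, or dflt if parms is NULL or the key
 * is absent. */
static int parm_int(const char *parms, const char *key, int dflt)
{
    size_t len = strlen(key);
    const char *p = parms;

    while (p && *p) {
        while (*p == ' ')
            p++;                        /* skip separators */
        if (strncmp(p, key, len) == 0 && p[len] == '=')
            return (int)strtol(p + len + 1, NULL, 10);
        p = strchr(p, ' ');             /* next token, or NULL at the end */
    }
    return dflt;
}

/* Comparison callback for qsort(): ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Hypothetical parameterised aggregate: mean of the sorted values
 * from position n*lo/100 up to (but excluding) position n*hi/100.
 * Sorts the values in place. */
void c_trimmed(double *result, double *values, int n, const char *parms)
{
    int lo = parm_int(parms, "lo", 0);
    int hi = parm_int(parms, "hi", 100);
    int first, last, i;
    double sum = 0.0;

    assert(n > 0 && 0 <= lo && lo < hi && hi <= 100);
    qsort(values, n, sizeof(double), cmp_double);
    first = n * lo / 100;               /* index of first value kept */
    last = n * hi / 100;                /* one past the last value kept */
    if (last <= first)
        last = first + 1;               /* always keep at least one value */
    for (i = first; i < last; i++)
        sum += values[i];
    *result = sum / (last - first);
}
```

Called with "lo=10 hi=90" this gives the 10%..90% trimmed mean; with a
NULL string it falls back to the plain mean. Note that all validation
ends up inside the aggregate itself.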

A more structured approach, where the aggregates declare their
parameters, would be preferable, as this would allow the library
and/or module to handle validation and parsing. But that needs to be
designed before it can be implemented, and requiring a design for
something usually equates to "kicking it into the long grass"
(grass/GRASS pun not intended, although probably appropriate).
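
Purely as a strawman for what "declaring parameters" could mean (this
is not a design, and every name below is hypothetical): each aggregate
would carry a table describing its parameters, so that a generic
routine can do the range checking on its behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one integer parameter of an aggregate. */
struct agg_param {
    const char *name;
    int min, max;               /* inclusive valid range */
    int dflt;                   /* value used when not supplied */
};

/* Hypothetical declaration of an aggregate and its parameters. */
struct agg_decl {
    const char *name;
    const struct agg_param *params;
    int nparams;
};

static const struct agg_param trimmed_params[] = {
    {"lo", 0, 99, 0},
    {"hi", 1, 100, 100},
};

static const struct agg_decl trimmed_decl = {
    "trimmed", trimmed_params, 2
};

/* Generic validation, written once for all aggregates: returns 1 if
 * every supplied value lies within its declared range, else 0.
 * values[i] corresponds to decl->params[i]. */
int agg_validate(const struct agg_decl *decl, const int *values)
{
    int i;

    assert(decl != NULL && values != NULL);
    for (i = 0; i < decl->nparams; i++) {
        const struct agg_param *p = &decl->params[i];

        if (values[i] < p->min || values[i] > p->max)
            return 0;
    }
    return 1;
}
```

A module could also walk the same table to generate its command-line
options, but cross-parameter constraints (lo < hi) show how quickly
this needs more thought.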

-- 
Glynn Clements <glynn at gclements.plus.com>