[GRASS-dev] r.regression.line fix proposal

Wed Dec 5 11:32:49 EST 2007

Dylan Beaudette wrote:

> > Would it be a big mess to implement r.regression.line a C module?
> >
> > Markus
> 
> If written as a C module, could it take advantage of any stats library 
> functions lying around ? 

The aggregate functions in lib/stats require that the entire sample is
held in memory. This makes them unsuitable for computing an aggregate
over a substantial proportion of a map's values.

For r.regression.line, r.univar, r.statistics, etc., you need to use an
incremental approach, updating a few running totals as each value is
read. For some aggregates (count, sum, mean), this is relatively
straightforward.
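
As a rough sketch of what "incremental" means for a linear regression
y = a + b*x: only a handful of running sums are kept, and each pair of
cell values is folded in as it is read. The struct and function names
below (reg_acc, reg_init, reg_add, reg_fit) are made up for
illustration and are not part of the GRASS libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator, not a GRASS API: running sums only,
 * so memory use does not depend on the number of cells. */
struct reg_acc {
    size_t n;
    double sum_x, sum_y, sum_xx, sum_xy;
};

void reg_init(struct reg_acc *acc)
{
    acc->n = 0;
    acc->sum_x = acc->sum_y = acc->sum_xx = acc->sum_xy = 0.0;
}

/* Fold one (x, y) pair into the running sums. */
void reg_add(struct reg_acc *acc, double x, double y)
{
    acc->n++;
    acc->sum_x += x;
    acc->sum_y += y;
    acc->sum_xx += x * x;
    acc->sum_xy += x * y;
}

/* Least-squares fit of y = a + b*x from the sums.
 * Returns 0 on success, -1 if the fit is undefined
 * (fewer than two points, or all x identical). */
int reg_fit(const struct reg_acc *acc, double *a, double *b)
{
    double n, denom;

    assert(acc != NULL && a != NULL && b != NULL);

    n = (double)acc->n;
    denom = n * acc->sum_xx - acc->sum_x * acc->sum_x;
    if (acc->n < 2 || denom == 0.0)
        return -1;

    *b = (n * acc->sum_xy - acc->sum_x * acc->sum_y) / denom;
    *a = (acc->sum_y - *b * acc->sum_x) / n;
    return 0;
}
```

A caller would invoke reg_add() once per non-null cell pair while
reading the two maps row by row, then reg_fit() at the end.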

For variance and deviation, there's the issue of a one-pass or
two-pass algorithm. A two-pass approach (calculating the mean on the
first pass, then the squared deviations from it on the second) is more
accurate, but it has to read the data twice, which rules out reading
data from a pipe.
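
To make the trade-off concrete, here is a minimal sketch of both
variants for the sample variance. The function names are hypothetical,
and an in-memory array stands in for the map purely to keep the
example short; with a real raster, each loop would be one read of the
map. The one-pass version shown is the textbook sum-of-squares form,
which is only one of several possible one-pass formulations.

```c
#include <assert.h>
#include <stddef.h>

/* One pass: accumulate the sum and the sum of squares together.
 * Subtracting two large, nearly equal totals at the end can lose
 * most of the significant digits when the mean is large relative
 * to the spread of the data. */
double var_one_pass(const double *v, size_t n)
{
    double sum = 0.0, sumsq = 0.0;
    size_t i;

    assert(n > 1);
    for (i = 0; i < n; i++) {
        sum += v[i];
        sumsq += v[i] * v[i];
    }
    return (sumsq - sum * sum / (double)n) / (double)(n - 1);
}

/* Two passes: the mean first, then the squared deviations from it.
 * More accurate, but the data must be readable twice. */
double var_two_pass(const double *v, size_t n)
{
    double sum = 0.0, mean, ss = 0.0;
    size_t i;

    assert(n > 1);
    for (i = 0; i < n; i++)     /* pass 1: mean */
        sum += v[i];
    mean = sum / (double)n;

    for (i = 0; i < n; i++) {   /* pass 2: squared deviations */
        double d = v[i] - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

For small, well-scaled values the two agree; the difference shows up
with inputs such as 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16, where the
squares in the one-pass version exceed what a double can hold exactly.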

For quantiles, you don't want to sort vast amounts of data at
O(n log n) complexity just to obtain specific quantiles. It's more
efficient to compute successive histograms, refining the interval(s)
containing the desired quantile(s) on each pass, and only sorting once
you've reduced the data to a manageable size. This could require
several passes, depending upon the amount of data, the amount of
memory available, and the distribution of the data.
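
The histogram refinement described above might look roughly like the
following sketch for a single quantile. Everything here is an
assumption made for illustration: the function name, the fixed bin
count, the max_keep threshold, and the use of an in-memory array in
place of repeated reads of a map. It also assumes finite values with
nulls already filtered out.

```c
#include <assert.h>
#include <stdlib.h>

#define NBINS 256   /* arbitrary bin count chosen for this sketch */

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Return the value with 0-based position `rank` in sorted order.
 * Each loop over v[] stands in for one pass over the data. */
double select_by_histogram(const double *v, size_t n,
                           size_t rank, size_t max_keep)
{
    double lo, hi, result;
    double *buf;
    size_t below = 0;   /* values known to be smaller than lo */
    size_t inside = n;  /* values within [lo, hi] */
    size_t i, k = 0;

    assert(n > 0 && rank < n && max_keep > 0);

    lo = hi = v[0];     /* initial pass: overall range */
    for (i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }

    /* Refine until the interval is small enough to sort. */
    while (inside > max_keep && lo < hi) {
        size_t count[NBINS] = {0};
        double bmin[NBINS], bmax[NBINS];
        size_t cum = 0;
        int b;

        for (i = 0; i < n; i++) {
            if (v[i] < lo || v[i] > hi)
                continue;
            b = (int)((v[i] - lo) / (hi - lo) * NBINS);
            if (b >= NBINS)
                b = NBINS - 1;
            /* remember the actual extremes seen in each bin */
            if (count[b] == 0 || v[i] < bmin[b]) bmin[b] = v[i];
            if (count[b] == 0 || v[i] > bmax[b]) bmax[b] = v[i];
            count[b]++;
        }

        /* locate the bin that contains the wanted rank */
        for (b = 0; b < NBINS - 1; b++) {
            if (below + cum + count[b] > rank)
                break;
            cum += count[b];
        }
        below += cum;
        inside = count[b];
        lo = bmin[b];
        hi = bmax[b];
    }

    if (lo == hi)       /* every remaining value is identical */
        return lo;

    /* Final pass: keep only the surviving interval and sort it. */
    buf = malloc(inside * sizeof *buf);
    if (buf == NULL)
        abort();
    for (i = 0; i < n; i++)
        if (v[i] >= lo && v[i] <= hi)
            buf[k++] = v[i];
    qsort(buf, k, sizeof *buf, cmp_double);
    result = buf[rank - below];
    free(buf);
    return result;
}
```

Narrowing to the smallest and largest values actually seen in the
chosen bin, rather than to computed bin edges, guarantees that every
pass discards at least one value. How many passes are needed still
depends on the data volume, max_keep and the distribution.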

-- 
Glynn Clements <glynn at gclements.plus.com>