[GRASS-dev] r.univar: allow multiple rasters to be processed

Glynn Clements glynn at gclements.plus.com
Sat Feb 23 05:32:17 EST 2008


Glynn Clements wrote:

> So far as memory usage is concerned: if we think that people might
> want to compute quantiles on data sets which are large enough that we
> need to worry about memory consumption, we should probably be looking
> for a more efficient algorithm. Sorting the entire data set then
> pulling out quantiles is less than ideal if you're dealing with that
> much data.

I've added a new module, r.quantile, which computes quantiles without
loading the entire map into memory.

Apart from not being limited by available memory, it should have
better asymptotic performance. Sorting n values is O(n log n),
while r.quantile is mostly O(n).

The final step still involves sorting, but the only data sorted is
the contents of one bin for each quantile (the bin in which that
quantile falls). The size of such a bin will be roughly inversely
proportional to the number of bins used (which is user-selectable,
and defaults to 1,000,000 bins).
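
To make the idea concrete, here is a minimal sketch of a bin-based
quantile computation. It is not the r.quantile source, just an
illustration of the approach described above. The function name
binned_quantiles, the read_values callable (which stands in for
re-reading the map) and the nearest-rank definition of a quantile
are all assumptions made for the example.

```python
import bisect
import itertools
import math


def binned_quantiles(read_values, probs, num_bins=1000):
    """Illustrative only: quantiles without sorting the whole data set.

    read_values is a hypothetical callable returning a fresh iterator
    over the data each time it is called (i.e. one pass over the map).
    """
    # Pass 1: find the data range and the number of values.
    lo, hi, n = math.inf, -math.inf, 0
    for v in read_values():
        lo, hi, n = min(lo, v), max(hi, v), n + 1
    if n == 0:
        raise ValueError("no data")
    width = (hi - lo) / num_bins or 1.0  # guard against hi == lo

    def bin_of(v):
        return min(int((v - lo) / width), num_bins - 1)

    # Pass 2: build a histogram; only counts are stored, not values.
    counts = [0] * num_bins
    for v in read_values():
        counts[bin_of(v)] += 1
    cum = list(itertools.accumulate(counts))

    # Find the bin that contains the rank of each requested quantile
    # (assumed definition: zero-based rank int(p * n) in sorted order).
    ranks = [min(int(p * n), n - 1) for p in probs]
    bins = [bisect.bisect_right(cum, k) for k in ranks]

    # Pass 3: keep only the values that fall into those few bins.
    kept = {b: [] for b in bins}
    for v in read_values():
        b = bin_of(v)
        if b in kept:
            kept[b].append(v)

    # Final step: sort each kept bin and index into it.
    for values in kept.values():
        values.sort()
    return [kept[b][k - (cum[b] - counts[b])] for k, b in zip(ranks, bins)]


data = [((i * 7919) % 10007) / 3.0 for i in range(5000)]
quartiles = binned_quantiles(lambda: iter(data), [0.25, 0.5, 0.75])
```

The three passes are each linear in the number of values. The only
sort is over the kept bins, which shrink as num_bins grows.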

It has only had brief testing, but it manages to process a map of ~30
million cells (elevation.dem resampled to 3m resolution, plus some
noise to smooth the distribution) in ~1 minute on a P3/800.

I tried running r.univar on the same map for comparison, but it
crashed while trying to compute the percentile (the other statistics
were computed okay).

-- 
Glynn Clements <glynn at gclements.plus.com>

