[GRASS5] [bug #1848] (grass) merge of r.average, r.median and r.mode

Glynn Clements glynn.clements at virgin.net
Thu Dec 18 21:09:20 EST 2003


Harmisch Bowman via RT wrote:

> this bug's URL: http://intevation.de/rt/webrt?serial_num=1848

> There is a lot of room for code merging here, see the following thread:
> 
> http://article.gmane.org/gmane.comp.gis.grass.devel/1969
> http://article.gmane.org/gmane.comp.gis.grass.devel/1970
> http://article.gmane.org/gmane.comp.gis.grass.devel/1973
> http://article.gmane.org/gmane.comp.gis.grass.devel/2380
> 
> 
> Summary:
> 
> It would be nice to take the functions from
> src/raster/r.series/cmd/c_*
> 
> and make a stats library in
> src/libes/gmath

Those functions are only suitable for handling relatively small
amounts of data, as they require the entire set to be passed as a
single array. E.g. computing the mean cell value for an entire map
would require reading the entire map into memory.

A more useful interface would have begin/update/end operations, with
the update operation being called repeatedly in a loop. However, some
aggregates might need multiple passes.

E.g. for computing the median, you want to avoid having to sort large
lists. It would be preferable to have a first pass which simply counts
the number of values falling into each of a number of ranges. From
this, you can determine the range into which the median falls, and the
position of the median within that range. A second pass would extract
the subset of values which fall into that range. Finally, you sort the
subset and extract the appropriate element.

For a highly uneven distribution, you might need to make several
passes before getting a subset which is small enough to sort (the
first pass might have 99% of the data in a single range).
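
The multi-pass approach could be sketched roughly as follows, in Python.
This is only an illustration under some assumptions: kth_smallest and its
parameters are hypothetical names, scan() is assumed to return a fresh
iterator over the same finite values each time (e.g. by re-reading the
map), and nbins must be at least 2. Each pass also records the smallest
and largest value seen in each range, so the next pass can select exactly
the same subset.

```python
# Hypothetical sketch; not taken from r.series or any GRASS library.
def kth_smallest(scan, k, nbins=64, max_sort=1000):
    """Return the k-th smallest value (0-based) using repeated passes."""
    # Initial pass: count the values and find their overall range.
    n, lo, hi = 0, float("inf"), float("-inf")
    for v in scan():
        n += 1
        lo, hi = min(lo, v), max(hi, v)
    if not 0 <= k < n:
        raise IndexError("rank out of range")

    # Narrow [lo, hi] until the candidate subset is small enough to sort.
    while n > max_sort and lo < hi:
        counts = [0] * nbins
        mins = [float("inf")] * nbins
        maxs = [float("-inf")] * nbins
        for v in scan():
            if lo <= v <= hi:
                # Equal-width ranges; the maximum lands in the last one.
                b = min(int((v - lo) / (hi - lo) * nbins), nbins - 1)
                counts[b] += 1
                mins[b] = min(mins[b], v)
                maxs[b] = max(maxs[b], v)
        # Find the range holding rank k, and the rank within that range.
        b = 0
        while k >= counts[b]:
            k -= counts[b]
            b += 1
        n, lo, hi = counts[b], mins[b], maxs[b]

    if lo == hi:
        return lo  # every remaining value is identical
    # Final pass: extract the subset, sort it, pick the element.
    subset = sorted(v for v in scan() if lo <= v <= hi)
    return subset[k]


# Uneven distribution: nearly all values sit in one narrow range.
data = [1.0 + i * 1e-6 for i in range(5000)] + [-1000.0, 1000.0]
k = (len(data) - 1) // 2  # lower-middle element for an even count
result = kth_smallest(lambda: iter(data), k, nbins=8, max_sort=50)
```

With data like this, the first pass puts almost everything into a single
range, so several further passes are needed before the subset drops below
max_sort. For an even number of values the sketch picks the lower of the
two middle elements rather than averaging them.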

This isn't an issue for r.series: because it needs to have all of the
input maps open simultaneously, the limit on the number of open maps
ensures that the number of samples per cell remains manageable.

But, in general, assuming that the entire set fits into memory
excludes any application where the set is either an entire raster or a
significant portion of one. So I doubt that the code from r.series is
of much use except for r.series (and possibly r.mapcalc, although the
nature of r.mapcalc's interface is such that the required "glue" code
may be larger than the actual algorithm).

-- 
Glynn Clements <glynn.clements at virgin.net>



