[GRASS-user] r.statistics depreciated?

Glynn Clements glynn at gclements.plus.com
Tue Sep 30 10:27:48 EDT 2008


Jarosław Jasiewicz wrote:

> >> 2.) I'd like to rework it to handle floating-point numbers. Could
> >> there be any problems with that?
> >
> > AFAICT, one difficulty lies in the fact that the current code calls 
> > r.reclass which only works with integers. But there might be other 
> > issues.
> 
> but it looks like it is only for the cover map. Could the base map be floating point?

That would solve the reclass issue.

However, there are other issues, mostly related to r.stats.

First, it currently quantises any FP inputs using the map's
quantisation rules. You could disable the quantisation (and if you
don't, there isn't really much to be gained from making r.statistics
accept FP maps), but that leads to another issue.

r.stats "bins" the data into a set of (base,cover,count) tuples. If
either of the maps is FP, you will typically get one tuple per cell,
with count equal to one.
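
To illustrate the binning, here is a minimal Python sketch. It is not
the r.stats code; the name bin_cells and the sample values are made up
for the example.

```python
from collections import Counter

def bin_cells(base, cover):
    """Group cells into sorted (base, cover, count) tuples, one per distinct pair."""
    counts = Counter(zip(base, cover))
    return sorted((b, c, n) for (b, c), n in counts.items())

# Six cells with integer values: repeated pairs collapse into few bins.
int_base = [1, 1, 2, 2, 2, 1]
int_cover = [10, 10, 20, 20, 30, 10]

# The same cells with an unquantised FP cover map: every pair is distinct.
fp_cover = [10.12, 10.57, 20.03, 20.91, 30.44, 10.08]

int_bins = bin_cells(int_base, int_cover)
fp_bins = bin_cells(int_base, fp_cover)

print(int_bins)
print(len(fp_bins), "bins for", len(fp_cover), "cells")
```

With integer inputs the six cells fall into three bins; with the FP
cover map there is one bin per cell, each with a count of one.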

The main problem with this is the memory consumption. It allocates a
node (bin) for every distinct combination of values from the input
maps. The amount of data required is between 28 and 48 bytes
(depending upon whether pointers and "long" are 32 or 64 bits) for
each node, plus the storage for the values (i.e. another 16 bytes for
2 "double"s). This could mean up to 64 bytes per cell.
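
As a back-of-the-envelope check, the worst case can be worked out for
a given map size. The sketch below uses the upper figures quoted above
(48 bytes per node plus 16 bytes of values); the 10000 x 10000 map is a
hypothetical example, not a measurement.

```python
def worst_case_bytes(rows, cols, node_bytes=48, value_bytes=16):
    """Upper bound on memory if every cell ends up in its own bin."""
    return rows * cols * (node_bytes + value_bytes)

# Hypothetical map of 10000 x 10000 cells, one bin per cell.
total = worst_case_bytes(10_000, 10_000)
print(total, "bytes, about", round(total / 2**30, 2), "GiB")
```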

There's also the time complexity of inserting new nodes into the data
structure. r.stats uses a hashtable with 7307 entries, with each entry
containing a binary tree. If you have one bin for each cell, the
individual trees will still be quite large, which will impact the time
taken to process each cell.
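
A rough sketch of the tree sizes involved, assuming the hash spreads
the bins evenly across the 7307 entries (an assumption; the real
distribution depends on the data). The cell count is hypothetical, and
the balanced depth is a best case, since nothing above says the trees
are kept balanced.

```python
import math

BUCKETS = 7307  # hashtable size quoted above

def nodes_per_bucket(cells, buckets=BUCKETS):
    """Nodes in the largest tree if bins are spread evenly (assumed)."""
    return math.ceil(cells / buckets)

def balanced_depth(nodes):
    """Depth of a perfectly balanced binary tree with `nodes` nodes."""
    return nodes.bit_length()

cells = 100_000_000  # hypothetical map with one bin per cell
per_bucket = nodes_per_bucket(cells)
best_depth = balanced_depth(per_bucket)

print(per_bucket, "nodes per tree")
print(best_depth, "comparisons per insert at best,", per_bucket, "if a tree degenerates")
```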

Last (and by no means least), r.stats has to sort the output
(r.statistics relies upon this, e.g. for the median, minimum and
maximum aggregates). And it doesn't even take advantage of the sorting
information held within the binary trees; it simply qsort()s the
entire list of tuples. This could be quite slow if you have a lot of
cells.
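
To show why sorted output matters for an aggregate such as the median,
here is a minimal Python sketch. It is not how r.statistics is
implemented; median_per_base is a made-up name, sorted() stands in for
the qsort() step, and for an even number of cells it returns the lower
of the two middle values.

```python
from itertools import groupby

def median_per_base(tuples):
    """Lower median of cover values per base category.

    `tuples` holds (base, cover, count) bins in any order.
    """
    result = {}
    # Sorting orders by base, then cover, so each group is in cover order.
    for base, group in groupby(sorted(tuples), key=lambda t: t[0]):
        rows = list(group)
        total = sum(n for _, _, n in rows)
        target = (total + 1) // 2  # 1-based position of the lower median
        seen = 0
        for _, cover, n in rows:
            seen += n
            if seen >= target:
                result[base] = cover
                break
    return result

medians = median_per_base([(2, 30, 1), (1, 10, 3), (2, 20, 2), (1, 40, 2)])
print(medians)
```

Once the bins are sorted, the minimum and maximum for each base
category are simply the first and last cover values in its group.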

Having said all that, none of this may be a problem if you aren't
processing particularly large maps. It certainly won't be as much of a
problem as it would have been when r.stats/r.statistics were
originally written.

-- 
Glynn Clements <glynn at gclements.plus.com>

