[GRASS-dev] r.statistics limitation to CELL
Sören Gebbert
soerengebbert at gmx.de
Sat Jun 2 08:55:44 EDT 2007
Hi,
Markus Neteler wrote:
> On Sat, Jun 02, 2007 at 01:00:12PM +0100, Glynn Clements wrote:
>> Markus Neteler wrote:
>>
>>> would it be much work to fix this:
>>>
>>> GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
>>> cover=elevation out=elevstats_avg method=average
>>> ERROR: This module currently only works for integer (CELL) maps
>>>
>>> Rounding elevation to CELL first isn't a great option.
>> 1. r.statistics works by reclassing the base map, so the base map
>> can't be FP.
>
> In this case I meant the cover map "elevation" which is rejected.
> landuse96_28m is a CELL map, elevation FCELL.
>
>> 2. r.statistics uses r.stats to calculate the statistics, and r.stats
>> reads its inputs as CELL.
>
> In this case, r.statistics could also accept an FCELL map
> without complaining? Currently I need extra steps to round
> elevation to a CELL map before running r.statistics.
>
>> r.stats is inherently based upon discrete categories. Even if it reads
>> FP maps as FP, you would need to quantise the values, otherwise you
>> would end up with every cell as its own category with count == 1. This
>> would require memory proportional to the size of the input map
>> multiplied by a significant factor (48-64 bytes per cell, or even
>> more).
>>
>> To handle FP data, you really need a completely new approach which
>> computes aggregates incrementally, using an accumulator. This would
>> limit it to aggregates which can be computed that way, e.g. count,
>> sum, mean, variance and standard deviation.
>
> I vaguely remember that Soeren already has something in the works.
I have implemented r3.stats from scratch to handle FP ranges:
GRASS 6.3.cvs > r3.stats help
Description:
Generates volume statistics for raster3d maps.
Keywords:
raster3d, statistics
Usage:
r3.stats [-e] input=name [nsteps=value] [--verbose] [--quiet]
Flags:
-e Calculate statistics based on equal value groups
--v Verbose module output
--q Quiet module output
Parameters:
input Name of input raster3d map
nsteps Number of sub-ranges to collect stats from
default: 20
Example:
#region
north: 1000
south: 0
west: 0
east: 2000
top: 10.00000000
bottom: 0.00000000
nsres: 50
nsres3: 50
ewres: 50
ewres3: 50
tbres: 1
rows: 20
rows3: 20
cols: 40
cols3: 40
depths: 10
cells: 800
3dcells: 8000
# create a 3d map
r3.mapcalc "map3d = depth()"
# automatically calculated value range sub-groups
GRASS 6.3.cvs > r3.stats map3d nsteps=5
100%
num | minimum <= value | value < maximum | volume | perc | cell count
1 1.000000000 2.800000000 4000000.000 20.00000 1600
2 2.800000000 4.600000000 4000000.000 20.00000 1600
3 4.600000000 6.400000000 4000000.000 20.00000 1600
4 6.400000000 8.200000000 4000000.000 20.00000 1600
5 8.200000000 10.000000001 4000000.000 20.00000 1600
6 * * 0.000 0.00000 0
Sum of non Null cells:
Volume = 20000000.000
Percentage = 100.000
Cell count = 8000
Sum of all cells:
Volume = 20000000.000
Percentage = 100.000
Cell count = 8000
# groups of equal values
GRASS 6.3.cvs > r3.stats -e map3d
100%
Sort non-null values
num | value | volume | perc | cell count
1 1.000000 2000000.000 10.00000 800
2 2.000000 2000000.000 10.00000 800
3 3.000000 2000000.000 10.00000 800
4 4.000000 2000000.000 10.00000 800
5 5.000000 2000000.000 10.00000 800
6 6.000000 2000000.000 10.00000 800
7 7.000000 2000000.000 10.00000 800
8 8.000000 2000000.000 10.00000 800
9 9.000000 2000000.000 10.00000 800
10 10.000000 2000000.000 10.00000 800
11 * 0.000 0.00000 0
Number of groups with equal values: 10
Sum of non Null cells:
Volume = 20000000.000
Percentage = 100.000
Cell count = 8000
Sum of all cells:
Volume = 20000000.000
Percentage = 100.000
Cell count = 8000
The approach is based on a binary search tree algorithm.
Memory consumption grows with the number of sub-ranges or with the
number of groups of equal values; the map does not need to be held
completely in memory.
AFAICT the sub-range search algorithm has O(n log(n)) complexity.
I guess the equal-value computation can still be improved, since the
computation time increases with the number of detected equal-value groups.
Sören
>
>> [The last two would need to either use the one-pass algorithm (which
>> can result in negative variance for near-constant data due to rounding
>> error), or use two passes (computing the mean in the first pass so
>> that the actual deviations can be used in the second pass). See also:
>> the history of r.univar.]
>>
>> As I've mentioned several times before, computing quantiles (e.g.
>> median) of large amounts of floating-point data is an open-ended
>> problem; any given approach has both pros and cons.
>>
>> --
>> Glynn Clements <glynn at gclements.plus.com>
>
> Markus
>
> _______________________________________________
> grass-dev mailing list
> grass-dev at grass.itc.it
> http://grass.itc.it/mailman/listinfo/grass-dev
>