[GRASS-dev] r.statistics limitation to CELL

Sat Jun 2 08:55:44 EDT 2007

Hi,

Markus Neteler schrieb:
> On Sat, Jun 02, 2007 at 01:00:12PM +0100, Glynn Clements wrote:
>> Markus Neteler wrote:
>>
>>> would it be much work to fix this:
>>>
>>> GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
>>>                cover=elevation out=elevstats_avg method=average
>>> ERROR: This module currently only works for integer (CELL) maps
>>>
>>> Rounding elevation to CELL first isn't a great option.
>> 1. r.statistics works by reclassing the base map, so the base map
>> can't be FP.
> 
> In this case I meant the cover map "elevation" which is rejected.
> landuse96_28m is a CELL map, elevation FCELL.
> 
>> 2. r.statistics uses r.stats to calculate the statistics, and r.stats
>> reads its inputs as CELL.
> 
> In this case, r.statistics could also accept an FCELL map
> without complaining? Currently I need extra steps to round
> elevation to a CELL map before running r.statistics.
>  
>> r.stats is inherently based upon discrete categories. Even if it reads
>> FP maps as FP, you would need to quantise the values, otherwise you
>> would end up with every cell as its own category with count == 1. This
>> would require memory proportional to the size of the input map
>> multiplied by a significant factor (48-64 bytes per cell, or even
>> more).
>>
>> To handle FP data, you really need a completely new approach which
>> computes aggregates incrementally, using an accumulator. This would
>> limit it to aggregates which can be computed that way, e.g. count,
>> sum, mean, variance and standard deviation.
> 
> I darkly remember that Soeren has something already in the works.

I have implemented r3.stats from scratch to handle fp ranges:

GRASS 6.3.cvs > r3.stats help

Description:
  Generates volume statistics for raster3d maps.

Keywords:
  raster3d, statistics

Usage:
  r3.stats [-e] input=name [nsteps=value] [--verbose] [--quiet]

Flags:
   -e   Calculate statistics based on equal value groups
  --v   Verbose module output
  --q   Quiet module output

Parameters:
    input   Name of input raster3d map
   nsteps   Number of sub-ranges to collect stats from
            default: 20

Example:

#region
north:      1000
south:      0
west:       0
east:       2000
top:        10.00000000
bottom:     0.00000000
nsres:      50
nsres3:     50
ewres:      50
ewres3:     50
tbres:      1
rows:       20
rows3:      20
cols:       40
cols3:      40
depths:     10
cells:      800
3dcells:    8000

# create a 3d map
r3.mapcalc "map3d = depth()"

# automatically calculated value range sub-groups

GRASS 6.3.cvs > r3.stats map3d nsteps=5
  100%
   num   | minimum <= value   | value < maximum    |     volume    |   perc  | cell count
       1          1.000000000          2.800000000     4000000.000   20.00000         1600
       2          2.800000000          4.600000000     4000000.000   20.00000         1600
       3          4.600000000          6.400000000     4000000.000   20.00000         1600
       4          6.400000000          8.200000000     4000000.000   20.00000         1600
       5          8.200000000         10.000000001     4000000.000   20.00000         1600
       6                    *                    *           0.000   0.00000            0

Sum of non Null cells:
         Volume =  20000000.000
         Percentage = 100.000
         Cell count = 8000

Sum of all cells:
         Volume =  20000000.000
         Percentage = 100.000
         Cell count = 8000

# groups of equal values

                   GRASS 6.3.cvs > r3.stats -e map3d
  100%
Sort non-null values
   num   |        value       |     volume    |   perc  | cell count
       1             1.000000     2000000.000   10.00000          800
       2             2.000000     2000000.000   10.00000          800
       3             3.000000     2000000.000   10.00000          800
       4             4.000000     2000000.000   10.00000          800
       5             5.000000     2000000.000   10.00000          800
       6             6.000000     2000000.000   10.00000          800
       7             7.000000     2000000.000   10.00000          800
       8             8.000000     2000000.000   10.00000          800
       9             9.000000     2000000.000   10.00000          800
      10            10.000000     2000000.000   10.00000          800
      11                    *           0.000   0.00000            0

Number of groups with equal values: 10
Sum of non Null cells:
         Volume =  20000000.000
         Percentage = 100.000
         Cell count = 8000

Sum of all cells:
         Volume =  20000000.000
         Percentage = 100.000
         Cell count = 8000

The approach is based on binary tree search algorithm.
The memory consumption increases with the number of sub-ranges or
the number of groups of equal values.

The map does not need to be read completely in memory.
AFAICT the sub-range search algorithm has an O(nlog(n)) complexity.
I guess the equal value computation can be improved, because the
computation time increases with the number of detected equal value groups.

Sören

> 
>> [The last two would need to either use the one-pass algorithm (which
>> can result in negative variance for near-constant data due to rounding
>> error), or use two passes (computing the mean in the first pass so
>> that the actual deviations can be used in the second pass). See also:
>> the history of r.univar.]
>>
>> As I've mentioned several times before, computing quantiles (e.g. 
>> median) of large amounts of floating-point data is an open-ended
>> problem; any given approach has both pros and cons.
>>
>> -- 
>> Glynn Clements <glynn at gclements.plus.com>
> 
> Markus
> 
> _______________________________________________
> grass-dev mailing list
> grass-dev at grass.itc.it
> http://grass.itc.it/mailman/listinfo/grass-dev
>