univariate stat module

Tue Apr 13 13:41:41 EDT 1993

Dan Riker (riker at hydro1.geo.duke.edu) recently expressed
the need for a general univariate statistics module
for GRASS (in the thread "Re: Hydrologic Modeling and GRASS").

What should the design criteria be? 

 o How should such a module work? Should it just accept
   column(s) of data (e.g. output of r.stats, output of
   s.out.ascii) or should it access data directly?

 o What statistics should be calculated? (mean, std dev,
   variance, skewness, everything that SAS "PROC UNIVARIATE"
   does?)

 o What output format would work best? 
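One possible answer to the first two questions is a filter that reads
columns on standard input. The sketch below assumes each input line is
either a bare value (weight 1) or a "value count" pair like the lines
r.stats -c prints, so it could sit behind r.stats or behind any command
whose output has been reduced to one value per line. The name
univar_filter is hypothetical, not an existing GRASS command. It uses
the plain population moment formulas; skewness and kurtosis will differ
from the bias-adjusted versions that statistics packages often report
by default.

```shell
#!/bin/sh
# Sketch only: univar_filter is a hypothetical name.
# Input: "value" or "value count" lines on standard input.
univar_filter() {
	awk '
	NF == 0 { next }                     # skip blank lines
	{
		w = (NF >= 2) ? $2 : 1           # cell count, or 1 for a bare value
		k++
		x[k] = $1 + 0; c[k] = w          # keep data for the second pass
		sum += $1 * w
		N   += w
		if (k == 1 || $1 + 0 < min) min = $1 + 0
		if (k == 1 || $1 + 0 > max) max = $1 + 0
	}
	END {
		if (N == 0) { print "no data"; exit 1 }
		mean = sum / N
		# Second pass: population central moments about the mean.
		for (i = 1; i <= k; i++) {
			d = x[i] - mean
			m2 += c[i] * d * d
			m3 += c[i] * d * d * d
			m4 += c[i] * d * d * d * d
		}
		m2 /= N; m3 /= N; m4 /= N
		print "n        ", N
		print "min      ", min
		print "max      ", max
		print "mean     ", mean
		print "variance ", m2
		print "deviation", sqrt(m2)
		if (m2 > 0) {
			print "skewness ", m3 / (m2 * sqrt(m2))
			print "kurtosis ", m4 / (m2 * m2) - 3   # excess kurtosis
		}
	}'
}
```

Usage, for example: `printf '2\n4\n4\n4\n5\n5\n7\n9\n' | univar_filter`,
or `r.stats -c somemap | univar_filter`. The two-pass moment calculation
keeps every category in memory, which is cheap for r.stats -c output
because it has one line per category rather than one per cell.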

I think that if the grassu (user) people define what is needed, it
would take only a small effort for a grassp (programmer) person to do
this. After all, there are several sources of free code to do these
calculations, some already packaged as UNIX commands. Working from one
of these, all that is needed is a GRASS wrapper (parser).

BTW, I vaguely remember a shell script to do univariate 
statistics. I'm not sure who wrote it, but it is appended
for your enjoyment.

--Darrell

(I had this saved as r.univar)

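#!/bin/sh
# r.univar: univariate statistics (min, max, mean, mode, variance,
# standard deviation) for a raster map, computed from r.stats -c.
# The -z and -v flags are passed through to r.stats unchanged.

z= v= oops=		# do not inherit these from the environment
while test $# -ne 0
do
	case "$1" in
	-z) z=z; shift;;
	-v) v=v; shift;;
	-zv|-vz) z=z; v=v; shift;;
	-|-*) oops=yes; break;;
	*) break;;
	esac
done
if test $# -ne 1 -o "$oops" = yes
then
	echo "Usage: `basename $0` [-vz] cellfile" >&2
	exit 1
fi

# r.stats -c prints one line per category: "value count".
# Each value is weighted by its cell count.
r.stats -c$z$v "$1" | awk '
	BEGIN { sum = 0.0; sum2 = 0.0; N = 0; modecount = 0 }
	NR == 1 { min = $1; max = $1 }
	{
		sum  += $1 * $2		# weighted sum of values
		sum2 += $1 * $1 * $2	# weighted sum of squares
		N    += $2		# total cell count
		if ($1 > max) max = $1
		if ($1 < min) min = $1
		if ($2 > modecount) { mode = $1; modecount = $2 }
	}
	END {
		if (N == 0) { print "no data"; exit 1 }
		var = (sum2 - sum * sum / N) / N	# population variance
		if (var < 0) var = 0		# guard against rounding error
		print "min      ", min
		print "max      ", max
		print "mean     ", sum / N
		print "mode     ", mode
		print "variance ", var
		print "deviation", sqrt(var)
	}'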
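The END block above computes count-weighted statistics in a single pass.
Writing x_i for a category value and c_i for its cell count as reported
by r.stats -c, the formulas it implements are shown below. Note the
final division by N, which gives the population variance; a sample
variance would divide by N - 1 instead.

```latex
N = \sum_i c_i, \qquad
\bar{x} = \frac{1}{N} \sum_i c_i x_i, \qquad
s^2 = \frac{1}{N} \left( \sum_i c_i x_i^2 - \frac{\left( \sum_i c_i x_i \right)^2}{N} \right),
\qquad s = \sqrt{s^2}
```

For example, a map with two cells of value 1 and two cells of value 3
gives N = 4, sum(c x) = 8 and sum(c x^2) = 20, so the mean is 8/4 = 2
and the variance is (20 - 64/4)/4 = 1, for a deviation of 1. This
one-pass form can lose precision when the mean is large compared with
the spread of the values.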