# univariate stat module

Darrell McCauley mccauley at ecn.purdue.edu
Tue Apr 13 13:41:41 EDT 1993

```
Dan Riker (riker at hydro1.geo.duke.edu) recently expressed
the need for a general univariate statistics module
for GRASS (Re: Hydrologic Modeling and GRASS).

What should the design criteria be?

o How should such a module work? Should it just accept
column(s) of data (e.g. output of r.stats, output of
s.out.ascii) or should it access data directly?

o What statistics should be calculated? (mean, std dev,
variance, skewness, everything that the SAS "PROC UNIVARIATE"
does?)

o What output format would work best?

I think that if grassu people define what is needed, it would
take only a small effort for a grassp person to do this.
After all, there are several sources of free code to do these
calculations, some already packaged up as UNIX commands.
Working from one of these, all that is needed is a GRASS
wrapper (parser).

BTW, I vaguely remember a shell script to do univariate
statistics. I'm not sure who wrote it, but it is appended below.

--Darrell

(I had this saved as r.univar)

#!/bin/sh
# Parse the optional -z and -v flags; they are passed through to r.stats.
while test $# != 0
do
    case "$1" in
    -z) z=z;shift;;
    -v) v=v;shift;;
    -zv|-vz) z=z;v=v;shift;;
    -|-*) oops=yes;break;;
    *) break;;
    esac
done
# Exactly one cell file must remain after the flags.
if test $# != 1 -o "$oops" = yes
then
    echo "Usage: `basename $0` [-vz] cellfile" >&2
    exit 1
fi
# r.stats -c prints "value count" pairs; awk weights each value ($1)
# by its cell count ($2).
r.stats -c$z$v "$1" | awk '
BEGIN{sum=0.0;sum2=0.0;N=0;modecount=0}
NR==1{min=$1; max=$1}
{sum += $1 * $2; sum2 += $1 * $1 * $2; N += $2}
{if($1 > max) max = $1; if ($1 < min) min = $1}
{if($2 > modecount) {mode=$1; modecount=$2}}
END{
if (N == 0) exit 1   # no cells counted: avoid dividing by zero
print "min      ", min
print "max      ", max
print "mean     ", sum/N
print "mode     ", mode
print "variance ", (sum2 - sum*sum/N)/N
print "deviation", sqrt((sum2 - sum*sum/N)/N)
}'

```
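One of the design options raised above is a module that simply accepts columns of data, and skewness is on the list of candidate statistics. The following is a minimal sketch of that option, not an existing GRASS command. The function name `univar_from_counts` is hypothetical. It assumes the input is "value count" pairs on standard input, as in the script above, and it reports population (divide by N) variance and skewness. Lines whose first field is not a plain number are skipped, which is also an assumption about how non-data rows should be handled.

```shell
#!/bin/sh
# Hypothetical helper, not part of GRASS: reads "value count" pairs on stdin
# and prints count-weighted statistics, including skewness.
univar_from_counts() {
    awk '
    # Use only rows with a plain numeric value and a positive count.
    $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ && $2 > 0 {
        n  += $2
        s1 += $1 * $2
        s2 += $1 * $1 * $2
        s3 += $1 * $1 * $1 * $2
    }
    END {
        if (n == 0) { print "no data"; exit 1 }
        mean = s1 / n
        var  = s2 / n - mean * mean                 # population variance
        # Third central moment: E[x^3] - 3*mean*E[x^2] + 2*mean^3
        m3   = s3 / n - 3 * mean * s2 / n + 2 * mean * mean * mean
        printf "n %d\n", n
        printf "mean %.4f\n", mean
        printf "variance %.4f\n", var
        # Skewness is undefined when every value is identical.
        if (var > 0) printf "skewness %.4f\n", m3 / (var * sqrt(var))
    }'
}
```

With this in place, a pipeline such as `r.stats -c cellfile | univar_from_counts` would mirror the script above, and any other program that prints value and count columns could feed the same function.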