[STATSGRASS] raster cross-stats

Roger Bivand Roger.Bivand at nhh.no
Thu Dec 13 07:31:30 EST 2001


Hi Agus!

On 13 Dec 2001, Agustin Lobo wrote:

> One suggestion:
> 
> r.statistics should have the option of having a table as output
> (similar to r.surface), where the categories would be the rows
> and the different statistics (mean, var, median, etc.) would be
> the columns. This would be an alternative, more compact
> format than the current rasters. Also, this table would be
> easy to read into R for further analysis.

Great to see you "here" too! I'm enclosing below bits of correspondence
between Markus, Helena and me from about two weeks ago, together with a
draft shell script that uses R functions on GRASS raster layers. My
current feeling is that, with R being easy to install and maintain,
shell scripts calling R can do this kind of thing directly (here for the
leics dataset):

> library(GRASS)
> G <- gmeta()
> work <- rast.get(G, c("landcov", "topo"), c(T,F))
> res <- by(work$topo, work$landcov, summary)
INDICES: Background
NULL
------------------------------------------------------------ 
INDICES: Industry
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  40.00   42.00   59.00   83.61  120.00  220.00 
------------------------------------------------------------ 
INDICES: Residential
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   40.0    53.0    60.0    76.3    80.0   218.0 
------------------------------------------------------------ 
INDICES: Quarry
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   40.0    59.0   161.0   135.3   177.0   198.0 
------------------------------------------------------------ 
INDICES: Woodland
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   40.0   118.0   156.0   142.9   174.0   248.0 
------------------------------------------------------------ 
INDICES: Arable
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     40      63      81     104     153     270 
------------------------------------------------------------ 
INDICES: Pasture
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   40.0    75.0   145.0   133.5   180.0   278.0 
------------------------------------------------------------ 
INDICES: Scrub
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   40.0   100.0   161.0   148.2   196.0   270.0 
------------------------------------------------------------ 
INDICES: Water
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  47.00   65.00   99.00   99.27  131.00  221.00 
> matrix(unlist(res), ncol=6)
       [,1]  [,2]  [,3] [,4]  [,5]   [,6]
[1,]  40.00  60.0 177.0   40 145.0 196.00
[2,]  42.00  76.3 198.0   63 133.5 270.00
[3,]  59.00  80.0  40.0   81 180.0  47.00
[4,]  83.61 218.0 118.0  104 278.0  65.00
[5,] 120.00  40.0 156.0  153  40.0  99.00
[6,] 220.00  59.0 142.9  270 100.0  99.27
[7,]  40.00 161.0 174.0   40 161.0 131.00
[8,]  53.00 135.3 248.0   75 148.2 221.00

where summary could be replaced by any function, written on the fly
before submission to R.
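
A side note on the matrix(unlist(res)) step above: matrix() fills
column-wise by default, so the 8 x 6 matrix interleaves the categories.
A small variant with byrow=TRUE and dimnames gives exactly the compact
table Agus describes - a sketch reusing the objects from the session
above (the column labels are mine, and "Background" is dropped first
because its summary is NULL):

# drop empty categories, then fill the matrix row by row
res2 <- res[!sapply(res, is.null)]
matrix(unlist(res2), ncol=6, byrow=TRUE,
       dimnames=list(names(res2),
                     c("Min", "1stQu", "Median", "Mean", "3rdQu", "Max")))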

I'll be very pleased to hear your thoughts - your input on other things,
like the R pixmap package, was helpful!

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.

---------- Forwarded message ----------
Date: Thu, 29 Nov 2001 12:20:01 +0100 (CET)
From: Roger Bivand <Roger.Bivand at nhh.no>
To: neteler at itc.it
Cc: hmitaso at unity.ncsu.edu
Subject: Re: s.sv, s.normal, s.probplt and R

Markus, Helena:

Further to my message of Monday (below), here is what I sent Frank
Warmerdam in September when he asked "Also, how do I get information
on the min, max, and mean of a raster?" - this is what I meant by
suggesting that even using R through shell scripts that write R code on
the fly may be fast enough for things that don't need R to be visible.
The s.kcv code is an R function, but it wouldn't be difficult to put it
in a script in a similar way. The same could be done for plenty of others.

> G <- gmeta()
> Zinc <- sites.get(G, "ex.Zn.in")
>  names(Zinc)
[1] "east"  "north" "var1"  "var2" 
> system("s.univar ex.Zn.in")
Reading sites list ...                   100%
Calculating statistics (decfield: 1) ... 100%
        number of points 98
                    mean 481.031
      standard deviation 398.808
coefficient of variation 82.9069
                skewness 1.44865
                kurtosis 1.67661
         mean of squares 388815
 mean of absolute values 481.031
                 minimum 113
          first quartile 186
                  median 307.5
          third quartile 685
                 maximum 1839
> skewness(Zinc$var2)
[1] 1.426529
> kurtosis(Zinc$var2)
[1] 1.581654
> summary(Zinc$var2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  113.0   186.3   307.5   481.0   698.5  1839.0 
> mean(Zinc$var2)
[1] 481.0306
> sd(Zinc$var2)
[1] 398.8077

While mean and standard deviation agree between R and s.univar, as do min,
max and median, skewness and kurtosis don't (the quartiles depend on
assumptions about ties, so small differences there aren't surprising).
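
My guess - I haven't checked either source, so treat this as an
assumption - is that s.univar uses population moments throughout
(denominator n), while e1071 divides the third and fourth central
moments by sd(x)^3 and sd(x)^4, where sd() uses n-1. A sketch of the
population-moment versions:

# population (denominator n) skewness and excess kurtosis
skew.pop <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
kurt.pop <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}
skew.pop(Zinc$var2)  # should give about 1.44865 if the guess is right
kurt.pop(Zinc$var2)  # should give about 1.67661

At n = 98 the conversion factors (n/(n-1))^1.5 and (n/(n-1))^2 do map
1.426529 to 1.44866 and 1.581654 + 3 to 1.67661 + 3, which is at least
consistent with this reading.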

> shapiro.test(Zinc$var2)

        Shapiro-Wilk normality test

data:  Zinc$var2 
W = 0.8186, p-value = 1.259e-09 

is a bit easier to read than s.normal! qqnorm(Zinc$var2) or
plot(ecdf(Zinc$var2)) give nice plots too!

As far as s.sv goes, several of the geostats packages in R do the same
too.

> system("s.sv sites=ex.Zn.in lag=50")
Reading sites list ...                   100%
98 sites found
Computing sample semivariogram ...   100%
50 67827.5 22
100 67968.9 94
150 69124.6 200
200 97473.3 196
250 78232.2 220
300 124723 256
350 102276 248
400 166055 242
450 140789 284
500 137712 290

> library(fields)
> try2 <- vgram(cbind(Zinc$east, Zinc$north), Zinc$var2, dmax=1000,
+   breaks=seq(50,1000,50))
> try2$breaks[1:10]
 [1]  50 100 150 200 250 300 350 400 450 500
> try2$stats[1:3,1:10]
                1         2        3         4        5         6        7
N           31.00     78.00     98.0    104.00    117.0    119.00    130.0
mean     52392.50  61888.99 101409.0  71277.74 127455.7  85657.06 149756.9
Std.Dev. 91029.62 129359.38 168443.0 153194.52 204767.0 165459.43 280845.6
                8        9       10
N           137.0    132.0    149.0
mean     122474.6 165412.9 163237.1
Std.Dev. 225055.5 301939.9 279704.7

> plot(try2$d, try2$vgram)
> lines(try2$centers, try2$stats["mean",], col=4)

all just from Doug Nychka's fields package, which also includes robust
variograms, and variograms from rasters.
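
For comparison with the s.sv output, the classical (Matheron) estimator
can also be written out by hand in a few lines. This is a sketch of my
own, not code from either program, so the bin edges may not line up
exactly with s.sv's lags:

# 0.5 * squared difference for every pair of sites, binned by distance
d   <- as.vector(dist(cbind(Zinc$east, Zinc$north)))
sv  <- 0.5 * as.vector(dist(Zinc$var2))^2
bin <- cut(d, breaks=seq(0, 500, 50))
tapply(sv, bin, mean)    # semivariance per lag, cf. column 2 of s.sv
tapply(sv, bin, length)  # pair counts, cf. column 3 of s.sv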

Maybe this is not really necessary at all, but working up these kinds of
analytical things is just so much easier in R - and because more people
use them, the chance of bugs going unfound is much smaller - so it is, at
least for me, what I choose.

Yours,

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.

Message of Monday 26/11:

I'm in Gdansk now, so can't try things until later in the week. My feeling
is that the key question here is how many GRASS users who use these kinds
of functionalities would benefit in the longer run from moving the
analysis bits to the R engine, over the interface as it stands. I've
already written a couple of scripts to do some of this from the GRASS
prompt. My preference would be to use R functionality where appropriate,
because it is used by more people and for that reason more likely to be
cleaner. I remember an exchange some months ago when Frank Warmerdam
wanted a better r.average, and that wasn't a problem to do as a shell
script writing a short R script to collect the raster or whatever, compute
the result, and if necessary push it back across the interface. But then R
has to be installed, as well as the interface library.

An alternative is to use the interface to help prototype things, then code
them in C on the R side. Certainly what I saw of one s.* program (to
choose a training and a test set from sites) made me distinctly unhappy -
it was doing things that not only seemed unnecessary, but were done rather
clumsily.


in reply to:

as you know, we are working on the GRASS book. Helena is
revising the sites chapter and asked about a few
modules. Since you may have tested them already, I would
like to know whether you have hints for us:

On Sun, Nov 25, 2001 at 08:41:55AM -0600, Helena wrote:
> Markus,
> 
> I am working on chapter 7 (sites) and I would like to ask whether you
> have any experience with s.sv, s.normal, s.probplt, s.medp. At least
> s.sv and s.normal appear to be working, but one needs g.gnuplot to get
> the graphics. Also I do not know whether these tools are all designed
> properly. Did you ever try to compare the results with those from
> R-stats?
> Has Roger Bivand tried it - does he think that they are useful?

Today I have exported the zinc data from Maas and put it into
s.sv. Well, it's looking a bit different.
But I am not sure if the two implementations are comparable. Did
you ever try the above-mentioned GRASS modules?

BTW: they all require only gnuplot, not g.gnuplot (we implemented the
environment variable GRASS_GNUPLOT, which defaults to "gnuplot").
If this doesn't work, your GRASS is too old... The change was made several
weeks ago.

The question for us is whether to mention the modules or not (and refer to
R; I have just added QQ plots to the related section).

---------- Forwarded message ----------
Date: Thu, 6 Sep 2001 10:50:17 +0200 (CEST)
From: Roger Bivand <Roger.Bivand at nhh.no>
To: Frank Warmerdam <warmerdam at pobox.com>
Subject: Re: [GRASS5] r.in.gdal and GCPs

Frank:

Here is a rough draft of a shell script to do what you want (I think).
There is a start-up cost to loading R, but the time taken to read the
raster layer should be comparable, since both the interface and, say,
r.stats use the same functions in libgis.a.

-------------------
#!/bin/sh
# r.statistics.R: shell script to call simple summary functions from R
# (needs an installed R in the path and the installed R GRASS package;
# for skewness and kurtosis also the e1071 package)
#
# usage: r.statistics.R layer method
# where layer is a raster map layer and method is one of:
# summary fivenum mean max min median length sum var sd kurtosis skewness
#

layer=$1

# check the layer exists; note the redirection order sends both
# stdout and stderr to /dev/null
r.info "$layer" > /dev/null 2>&1
if [ $? -ne 0 ] ; then
    echo "$layer: no such raster map layer"
    exit 1
fi

method=$2
# -w matches whole words only, so partial names like "me" are rejected
echo "summary fivenum mean max min median length sum var sd kurtosis skewness" \
    | grep -qw "$method"
if [ $? -ne 0 ] ; then
    echo "$method: no such method"
    exit 1
fi

e1071="FALSE"

if [ "$method" = "kurtosis" ] ; then e1071="TRUE" ; fi
if [ "$method" = "skewness" ] ; then e1071="TRUE" ; fi

if [ "$e1071" = "TRUE" ] ; then

R --vanilla --quiet --slave << EOF
library(e1071)
library(GRASS)
print($method(rast.get(gmeta(), "$layer")[[1]]))
EOF

else

R --vanilla --quiet --slave << EOF
library(GRASS)
print($method(rast.get(gmeta(), "$layer")[[1]]))
EOF

fi

------------

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.
