[GRASS-stats] Example code for MRPP on a multivariate data set
Nikos Alexandris
nik at nikosalexandris.net
Sat Jan 8 06:48:50 EST 2011
Dear all,
after many trial-n-error attempts, re-reading basic R-stuff (like "?load"),
I've finally ended up with several single scripts to run, each time, on one
data.frame. I don't have the time currently to make this more automatic.
So, for ignorant and beginners like me, here is an example code on how to
perform MRPP
- on a single data.frame composed by 6 variables (in this example Landsat5
bands)
- with several observations (in this example 18865 obs. = sampled pixels)
- grouped according to a factor column (in this example 7 major land cover
classes)
Of course, the data.frame needs to be prepared (lot of fun with this :-)). I
have 10 data.frames and a script for each data.frame. They are all available
at any time for anybody interested (at a personal ftp-space).
If asked, I can also provide the vector/raster file of the samples (extracted
from a postfire Landsat5 TM 2007 imagery in a small region northern of Athens
in Greece).
The test could be useful to descrive the within- and between-groups
heterogeneity. And why not to check how the statistics change after some
transformations of the original data?
First, you need to have the "vegan" package installed. So, for (K)Ubuntu
user's this will be:
--%<---
# superuser
sudo R
# get the package
install.packges ( "vegan" , dependencies = TRUE )
# quit and run a normal R-session
q()
R #or pick your favourite gui
--->%--
Then move to the test:
(ATTENTION: this 18865 observations is an overkill for home-machines)
--%<---
# load data.frame
load ( "samples_postfire_lsat5.RData" )
# check structure
str ( samples_postfire_lsat5 )
'data.frame': 18865 obs. of 7 variables:
$ Band 1: int 199 119 175 170 110 109 102 129 138 122 ...
$ Band 2: int 120 63 104 88 57 60 53 68 75 66 ...
$ Band 3: int 132 77 118 109 70 78 68 82 89 76 ...
$ Band 4: int 109 76 96 90 71 79 72 70 70 62 ...
$ Band 5: int 151 126 140 142 118 123 127 125 120 120 ...
$ Band 7: int 90 75 91 90 72 77 76 89 86 92 ...
$ Class : Factor w/ 7 levels "Urban","Mineral extr. sites",..: 1 1 1 1 1 1 1
1 1 1 ...
# mrpp() function is in:
library ( vegan )
# mrpp on 6 variables, grouped according to column named "Class"
samples_postfire_lsat5.mrpp <- mrpp (
samples_postfire_lsat5[,1:6] ,
grouping = samples_postfire_lsat5[["Class"]]
)
# see the result!
#samples_postfire_lsat5.mrpp
# save it
save (samples_postfire_lsat5.mrpp , file = "samples_postfire_lsat5.mrpp.RData"
)
# euclidean (veg)distance - ATTENTION: default distance to estimate is "Bray"
samples_postfire_lsat5.vegdist <- vegdist (
samples_postfire_lsat5[,1:6] ,
method="euclidean"
)
# meandist
samples_postfire_lsat5.md <- meandist (
samples_postfire_lsat5.vegdist ,
samples_postfire_lsat5[["Class"]]
)
# check
summary ( samples_postfire_lsat5.md )
# save results
save ( samples_postfire_lsat5.md ,
file = "samples_bitemporal_lsat5.md.RData"
)
--->%--
To have some results for my work, I also run the procedures on smaller subsets
(say 3000 observations instead of 18865) which takes some time but is feasible
@home.
Currently there is a process running on a big cluster (thanks to a very kind
person who's always there). Hopefully we'll know soon enough how much time
this will take.
Maybe this piece of info could go in the wiki at some point if it is of any
value.
Kind regards, Nikos
More information about the grass-stats
mailing list