[GRASS-stats] Raster maps from GRASS to R and back to GRASS?
Dylan Beaudette
debeaudette at ucdavis.edu
Mon Apr 13 19:42:29 EDT 2009
On Monday 13 April 2009, Nikos Alexandris wrote:
> [
> HiYo Dylan ;-)
> ]
>
> Nikos:
> > > What I actually do is nothing but a PCA on a combination of different
> > > bands. The fact is that bands a, b, c are of equal "length" in R and
> > > have _more_ non-NULLs than bands d, e, f.
>
> Dylan:
> > I think that you are going to have to maintain a band-wise null mask in
> > R, as opposed to a single null mask for the entire set of bands.
>
> If I understand your suggestion, you mean to:
>
> # load the bands as usual (i.e. combine a, b, c, d, e, f in one
> SpatialGridDataFrame)
>
> # create NULL-"masks" (=vectors pointing to NULLs) band-wise
>
> # extract data of interest in a normal "data.frame" and do something
> with it
>
> # add new "columns" in the original SpatialGridDataFrame
>
> # fill-in the results by using the per-band NULL-masks
>
> The question that immediately pops-up is: How should I decide which
> NULL-mask to use for each PC since PCA redistributes the original data?
>
> Or is this irrelevant?
>
> :-O
> :
> > > That is, bands d, e and f are of equal length compared to each other
> > > but of different length comparing to bands a, b and c.
> > > I suspect this is the root of the problem. Any suggestion?
> > > Thanks, Nikos
> >
> > It could be. The way that sp objects and their 'bands' (columns in the
> > attached dataframe) function makes for some tricky NULL handling.
> >
> > One thing you might consider is generating a single NULL mask that
> > represents NULL values that occur in any, or all bands -- akin to an OR
> > operation.
>
> Hmmm... (thinking), ok, I am close... I think (!?)
>
> > I think that na.omit(sp_object at data) may be able to return just the
> > 'complete' data, along with a list of indices where NA values have been
> > removed.
> >
> > The approach listed on the link you posted:
> > x.vals <- which( !is.na(x at data$r1) & !is.na(x at data$r2) &
> > !is.na(x at data$r3) & !is.na(x at data$r4))
> >
> > will only generate the non-NULL maks correctly when all bands have the
> > same pattern of NULL values. Maybe I should add that hint.
>
> What happens if I just replace the & with an | (=OR) operator? Isn't it
> the same? (I'll try it out).
>
> > Hope that helps,
> > Dylan
>
> Of course it helps (me). Discussions like this one are (for me) the
> essence of the MLs :D
>
> Cheers, Nikos
I guess I wasn't clear. I should have said something like: "for input maps
that have different NULL value (spatial distributions), you will need to
generate a new NULL mask that can be used to filter out all cells that
overlap with 1 or more NULL values in your stack of images". Using
complete.cases() as Roger suggested may accomplish this.
An idea on how to apply this function:
# generate some data; sprinkle in some NA ; convert to DF
x <- rnorm(100)
x[sample(1:100, 5)] <- NA
x.mat <- round(matrix(x, ncol=5), 2)
x.df <- as.data.frame(x.mat)
# extract an NA mask
x.na_mask <- complete.cases(x.df)
# just the 'rows' with complete data (across bands)
x.df[x.na_mask,]
V1 V2 V3 V4 V5
1 1.32 -1.27 0.00 -0.87 1.52
2 -0.44 1.33 0.78 -1.27 0.00
5 2.70 -0.08 -0.77 0.38 0.32
6 -0.52 -0.07 0.13 -0.43 1.12
7 -0.63 2.02 0.45 0.48 -0.59
8 -0.52 -0.78 -0.59 -0.07 -0.08
11 -1.37 -1.15 0.23 0.73 0.07
12 -0.11 -0.66 0.50 -1.14 0.71
13 -0.38 -0.08 1.00 -0.88 -0.19
14 -1.14 -0.45 -1.37 -0.43 -2.18
15 1.92 0.46 -0.72 0.12 0.27
16 0.51 0.22 -0.39 -1.54 -1.04
17 1.14 -1.11 1.04 0.00 -1.11
18 0.53 0.62 -0.10 -0.17 0.15
19 0.84 0.27 -1.01 0.87 0.31
20 0.82 0.87 0.46 1.35 0.13
# all the data
x.df
V1 V2 V3 V4 V5
1 1.32 -1.27 0.00 -0.87 1.52
2 -0.44 1.33 0.78 -1.27 0.00
3 -0.31 -0.13 -1.14 2.08 NA
4 0.76 NA -1.20 -0.54 -0.85
5 2.70 -0.08 -0.77 0.38 0.32
6 -0.52 -0.07 0.13 -0.43 1.12
7 -0.63 2.02 0.45 0.48 -0.59
8 -0.52 -0.78 -0.59 -0.07 -0.08
9 -0.38 -0.51 0.44 NA -0.81
10 -1.38 NA -0.65 -0.50 NA
11 -1.37 -1.15 0.23 0.73 0.07
12 -0.11 -0.66 0.50 -1.14 0.71
13 -0.38 -0.08 1.00 -0.88 -0.19
14 -1.14 -0.45 -1.37 -0.43 -2.18
15 1.92 0.46 -0.72 0.12 0.27
16 0.51 0.22 -0.39 -1.54 -1.04
17 1.14 -1.11 1.04 0.00 -1.11
18 0.53 0.62 -0.10 -0.17 0.15
19 0.84 0.27 -1.01 0.87 0.31
20 0.82 0.87 0.46 1.35 0.13
# generate something cool from complete data
library(MASS)
p <- princomp(~ . , data=x.df[x.na_mask,])
# assign back to original data
x.df$pca_1[x.na_mask] <- predict(p)[,1]
This would of course be done the the @data slot on an sp object. Thanks to
Roger for this slick approach.
How about that!?
Cheers,
Dylan
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
More information about the grass-stats
mailing list