[GRASS-stats] Raster maps from GRASS to R and back to GRASS?

Mon Apr 13 19:42:29 EDT 2009

On Monday 13 April 2009, Nikos Alexandris wrote:
> [
> HiYo Dylan ;-)
> ]
>
> Nikos:
> > > What I actually do is nothing but a PCA on a combination of different
> > > bands. The fact is that bands a, b, c are of equal "length" in R and
> > > have _more_ non-NULLs than bands d, e, f.
>
> Dylan:
> > I think that you are going to have to maintain a band-wise null mask in
> > R, as opposed to a single null mask for the entire set of bands.
>
> If I understand your suggestion, you mean to:
>
> # load the bands as usual (i.e. combine a, b, c, d, e, f in one
> SpatialGridDataFrame)
>
> # create NULL-"masks" (=vectors pointing to NULLs) band-wise
>
> # extract data of interest in a normal "data.frame" and do something
> with it
>
> # add new "columns" in the original SpatialGridDataFrame
>
> # fill-in the results by using the per-band NULL-masks
>
> The question that immediately pops-up is: How should I decide which
> NULL-mask to use for each PC since PCA redistributes the original data?
>
> Or is this irrelevant?
>
> :-O
> :
> > > That is, bands d, e and f are of equal length compared to each other
> > > but of different length comparing to bands a, b and c.
> > > I suspect this is the root of the problem. Any suggestion?
> > > Thanks, Nikos
> >
> > It could be. The way that sp objects and their 'bands' (columns in the
> > attached dataframe) function makes for some tricky NULL handling.
> >
> > One thing you might consider is generating a single NULL mask that
> > represents NULL values that occur in any, or all bands -- akin to an OR
> > operation.
>
> Hmmm... (thinking), ok, I am close... I think (!?)
>
> > I think that na.omit(sp_object at data) may be able to return just the
> > 'complete' data, along with a list of indices where NA values have been
> > removed.
> >
> > The approach listed on the link you posted:
> > x.vals <- which( !is.na(x at data$r1) & !is.na(x at data$r2) &
> > !is.na(x at data$r3) & !is.na(x at data$r4))
> >
> > will only generate the non-NULL maks correctly when all bands have the
> > same pattern of NULL values. Maybe I should add that hint.
>
> What happens if I just replace the & with an | (=OR) operator? Isn't it
> the same? (I'll try it out).
>
> > Hope that helps,
> > Dylan
>
> Of course it helps (me). Discussions like this one are (for me) the
> essence of the MLs :D
>
> Cheers, Nikos

I guess I wasn't clear. I should have said something like: "for input maps 
that have different NULL value (spatial distributions), you will need to 
generate a new NULL mask that can be used to filter out all cells that 
overlap with 1 or more NULL values in your stack of images". Using 
complete.cases() as Roger suggested may accomplish this. 

An idea on how to apply this function:

# generate some data; sprinkle in some NA ; convert to DF
x <- rnorm(100)
x[sample(1:100, 5)] <- NA
x.mat <- round(matrix(x, ncol=5), 2)
x.df <- as.data.frame(x.mat)

# extract an NA mask
x.na_mask <- complete.cases(x.df)

# just the 'rows' with complete data (across bands)
 x.df[x.na_mask,]

      V1    V2    V3    V4    V5
1   1.32 -1.27  0.00 -0.87  1.52
2  -0.44  1.33  0.78 -1.27  0.00
5   2.70 -0.08 -0.77  0.38  0.32
6  -0.52 -0.07  0.13 -0.43  1.12
7  -0.63  2.02  0.45  0.48 -0.59
8  -0.52 -0.78 -0.59 -0.07 -0.08
11 -1.37 -1.15  0.23  0.73  0.07
12 -0.11 -0.66  0.50 -1.14  0.71
13 -0.38 -0.08  1.00 -0.88 -0.19
14 -1.14 -0.45 -1.37 -0.43 -2.18
15  1.92  0.46 -0.72  0.12  0.27
16  0.51  0.22 -0.39 -1.54 -1.04
17  1.14 -1.11  1.04  0.00 -1.11
18  0.53  0.62 -0.10 -0.17  0.15
19  0.84  0.27 -1.01  0.87  0.31
20  0.82  0.87  0.46  1.35  0.13

# all the data
x.df
      V1    V2    V3    V4    V5
1   1.32 -1.27  0.00 -0.87  1.52
2  -0.44  1.33  0.78 -1.27  0.00
3  -0.31 -0.13 -1.14  2.08    NA
4   0.76    NA -1.20 -0.54 -0.85
5   2.70 -0.08 -0.77  0.38  0.32
6  -0.52 -0.07  0.13 -0.43  1.12
7  -0.63  2.02  0.45  0.48 -0.59
8  -0.52 -0.78 -0.59 -0.07 -0.08
9  -0.38 -0.51  0.44    NA -0.81
10 -1.38    NA -0.65 -0.50    NA
11 -1.37 -1.15  0.23  0.73  0.07
12 -0.11 -0.66  0.50 -1.14  0.71
13 -0.38 -0.08  1.00 -0.88 -0.19
14 -1.14 -0.45 -1.37 -0.43 -2.18
15  1.92  0.46 -0.72  0.12  0.27
16  0.51  0.22 -0.39 -1.54 -1.04
17  1.14 -1.11  1.04  0.00 -1.11
18  0.53  0.62 -0.10 -0.17  0.15
19  0.84  0.27 -1.01  0.87  0.31
20  0.82  0.87  0.46  1.35  0.13

# generate something cool from complete data
library(MASS)
p <- princomp(~ . , data=x.df[x.na_mask,])

# assign back to original data
x.df$pca_1[x.na_mask] <-  predict(p)[,1]

This would of course be done the the @data slot on an sp object. Thanks to 
Roger for this slick approach.

How about that!?

Cheers,
Dylan

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341