[GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Edzer Pebesma edzer.pebesma at uni-muenster.de
Wed Apr 1 13:26:10 EDT 2009


Markus Metz wrote:
>
> Edzer Pebesma wrote:
>> Markus, a few notes:
>>
>> - if you do PCA on uncentered data, by computing the eigenvalues of the
>> uncentered covariance matrix, this implies that bands with a larger mean
>> will get more influence on the final PCs. I have so far not managed
>> to find an argument why this would be desirable.
>>   
> Add it to wiki? E.g. bands entered in a PCA should have the same mean,
> but normalization is also an option.
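A minimal sketch of the mean effect, in plain Python with two made-up
bands. All names and values here are illustrative assumptions, not taken
from i.pca or prcomp(); the closed-form 2x2 eigenvector only stands in for
a real eigen solver.

```python
import math
from statistics import mean

def leading_eigenvector(a, b, d):
    """Unit eigenvector for the largest eigenvalue of [[a, b], [b, d]]."""
    lam = (a + d) / 2 + math.hypot((a - d) / 2, b)
    v = (lam - d, b)
    if math.hypot(*v) < 1e-12:   # degenerate when b == 0 and d >= a
        v = (b, lam - a)
    if math.hypot(*v) < 1e-12:   # a == d and b == 0: any direction works
        v = (1.0, 0.0)
    n = math.hypot(*v)
    return (v[0] / n, v[1] / n)

def second_moments(x, y, center):
    """Entries (a, b, d) of the 2x2 moment matrix, centered or uncentered."""
    mx, my = (mean(x), mean(y)) if center else (0.0, 0.0)
    n = len(x)
    a = sum((u - mx) ** 2 for u in x) / n
    d = sum((v - my) ** 2 for v in y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    return a, b, d

band_a = [100, 102, 98, 101, 99, 100]   # large mean, small variance (made up)
band_b = [10, 20, 30, 40, 50, 60]       # smaller mean, large variance (made up)

centered = leading_eigenvector(*second_moments(band_a, band_b, center=True))
uncentered = leading_eigenvector(*second_moments(band_a, band_b, center=False))

print("centered PC1 loadings:  ", centered)
print("uncentered PC1 loadings:", uncentered)
```

With centering, the first component loads almost entirely on band_b, the
band that actually varies. Without centering, it loads mostly on band_a,
whose only distinction is its larger mean.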
>> - if you do PCA on (band-mean)/sd(band), it means that you first
>> normalize (scale) 
> I think scale and normalize are two different things.
I believe that in statistics these two words don't have a generally
accepted definition. They're useful as long as you explain what you mean
by them.
>> each variable to mean zero and unit variance. This
>> procedure is identical to doing PCA on the correlation matrix. It means
>> that, unlike for unscaled variables, variables with larger variance will
>> not get more influence on the PCA than others. For image analysis I can
>> see a place for both; if bands with low variance indicate insignificant
>> and perhaps noisy information, you may downweight them. 
> Variance depends on range; I would rather use something like the
> coefficient of variation (stddev/mean) to get a scale-independent
> indicator of the amount of information in a given band. A downscaled
> band (e.g. MODIS scale of 0.0001) still has the same information but
> lower variance.
This may make sense for some (or many) cases, but not all.
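To make both points concrete, here is a small sketch in plain Python. The
band values and helper names are made up for illustration. It checks that
the covariance of standardized bands equals the correlation of the raw
bands, and that rescaling a band changes its variance but not its
coefficient of variation or its correlation with another band.

```python
import math
from statistics import mean, pstdev, pvariance

def covariance(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)

def standardize(x):
    """(band - mean) / sd(band): mean zero and unit variance."""
    m, s = mean(x), pstdev(x)
    return [(u - m) / s for u in x]

band1 = [52.0, 60.0, 71.0, 75.0, 90.0]   # made-up values
band2 = [0.31, 0.28, 0.40, 0.37, 0.52]   # made-up values

# Covariance of the standardized bands equals the correlation of the raw bands.
correlation = covariance(band1, band2) / (pstdev(band1) * pstdev(band2))
cov_standardized = covariance(standardize(band1), standardize(band2))

# Rescaling a band shrinks its variance by scale**2 but leaves the
# coefficient of variation and the correlation untouched.
scale = 0.0001
band1_scaled = [u * scale for u in band1]
variance_ratio = pvariance(band1_scaled) / pvariance(band1)
cv_raw = pstdev(band1) / mean(band1)
cv_scaled = pstdev(band1_scaled) / mean(band1_scaled)
correlation_scaled = covariance(band1_scaled, band2) / (
    pstdev(band1_scaled) * pstdev(band2)
)

print(correlation, cov_standardized)
print(variance_ratio, cv_raw, cv_scaled, correlation_scaled)
```

So a pure rescaling matters for PCA on covariances but not for PCA on
correlations. One case where the coefficient of variation does not help is
a band whose mean is zero or close to it, since stddev/mean is then
undefined or unstable.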
>> - Only in the case of normalized variables, or equivalently PCA on
>> correlations, does it make sense to select PCs with an eigenvalue larger
>> than 1. The reasoning is fairly weak, but goes like this: if a PC has
>> eigenvalue > 1, it explains more variance than any of the original
>> variables, which all have variance 1.
>>   
> Sounds good to me: why should I use a component that explains less
> than any of the original bands? And the whole purpose of a PCA is
> variable reduction to get a new set of variables, each explaining the
> whole dataset better than one of the original variables/bands. A PCA
> produces as many components as input variables, so some selection is
> usually necessary for further processing; the criterion could also be
> % explained variance. OTOH, sometimes only the first component is of interest.
> There may be exceptions for imagery processing, e.g. haze reduction
> (would have to read up on imagery processing too to say anything more
> about where components with eigenvalue < 1 could be useful).
>
Well, PCA only captures covariance or correlation, meaning linear
relationships, and it may be the case that the most interesting features
are non-linear. For instance, NDVI is the ratio of a difference over a
sum, (NIR - red) / (NIR + red), which cannot be expressed as a linear
combination of bands. The first PC(s?) usually express brightness; only
later ones give more interesting features resulting from more complex
interactions of bands (notably differences). Loadings usually have the
same sign for the first PC, and mixed signs for later PCs. John C.
Davis in "Statistics and Data Analysis in Geology" called this the
"size and shape effect". The most interesting PCs may have an eigenvalue
smaller than 1 when they come from correlation matrices. Geochemists
don't shy away from interpreting 7 or more factors.
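A toy illustration of both remarks, as a sketch only. It assumes just two
bands with a positive correlation r (the value 0.8 is made up), for which
the correlation matrix [[1, r], [r, 1]] has the exact eigenvalues 1 + r
and 1 - r. NDVI is taken as (NIR - red) / (NIR + red); the function names
and the weights of the linear combination are hypothetical.

```python
import math

def pca_two_band_correlation(r):
    """Eigenvalues and loadings of [[1, r], [r, 1]], assuming r > 0."""
    s = 1 / math.sqrt(2)
    return [
        (1 + r, (s, s)),    # "size": both loadings share a sign (brightness)
        (1 - r, (s, -s)),   # "shape": mixed signs (band difference)
    ]

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def linear_combination(nir, red, w_nir=0.7, w_red=-0.7):
    """Any PC score is a weighted sum like this (weights are made up)."""
    return w_nir * nir + w_red * red

components = pca_two_band_correlation(0.8)
kept = [ev for ev, _ in components if ev > 1]   # the eigenvalue > 1 rule

# Doubling the brightness of a pixel leaves NDVI unchanged, but doubles
# every linear combination of the bands.
ndvi_dark, ndvi_bright = ndvi(0.5, 0.1), ndvi(1.0, 0.2)
lin_dark, lin_bright = linear_combination(0.5, 0.1), linear_combination(1.0, 0.2)

print(components, kept)
print(ndvi_dark, ndvi_bright, lin_dark, lin_bright)
```

The eigenvalue > 1 rule keeps only the same-sign component here and
discards the difference component, whatever the value of r between 0 and 1.
The scaling test shows why no set of loadings can reproduce NDVI: a linear
combination must scale with brightness, and NDVI does not.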

Other interesting "features" of PCA are that it (a) ignores for two
bands whether they are adjacent in wavelength or not, and (b) ignores
for two pixels whether they are adjacent in space or not. At least for
"solving" problem (b), there are the max/min autocorrelation factors (MAF,
by Switzer et al.) and MNF (maximum noise fraction, by Green et al.). Are
these things available in GRASS, or in R?

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de


