[GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()
Markus Metz
markus.metz.giswork at googlemail.com
Wed Apr 1 12:57:05 EDT 2009
Edzer Pebesma wrote:
> Markus, a few notes:
>
> - if you do PCA on uncentered data, by computing the eigenvalues of the
> uncentered covariance matrix, this implies that bands with a larger mean
> will get more influence on the final PCAs. I have so far not managed
> to find an argument why this would be desirable.
>
Add it to the wiki? E.g. bands entered into a PCA should have the same mean,
but normalization is also an option.
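To illustrate the first point (uncentered PCA favours bands with a large mean), here is a minimal sketch in Python with NumPy rather than R's prcomp()/princomp(). The two bands are hypothetical: band_a has a large variance around a small mean, band_b a small variance around a large mean. The function leading_loadings is an illustrative helper, not part of GRASS or R.

```python
import numpy as np


def leading_loadings(X, center=True):
    """Absolute loadings of the first principal component.

    X has one row per pixel and one column per band. With center=False the
    eigenvectors come from the uncentered cross-product matrix X^T X / (n - 1),
    i.e. PCA on the raw band values; with center=True from the covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)
    M = X.T @ X / (X.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the last column is the eigenvector of the largest eigenvalue
    _, vecs = np.linalg.eigh(M)
    return np.abs(vecs[:, -1])


rng = np.random.default_rng(42)
n = 10_000
# Hypothetical bands: band_a varies a lot around a small mean,
# band_b varies little around a large mean
band_a = rng.normal(loc=10.0, scale=3.0, size=n)
band_b = rng.normal(loc=1000.0, scale=1.0, size=n)
X = np.column_stack([band_a, band_b])

uncentered = leading_loadings(X, center=False)
centered = leading_loadings(X, center=True)
print("uncentered PC1 loadings (a, b):", uncentered.round(3))
print("centered PC1 loadings (a, b):  ", centered.round(3))
```

Without centering, PC1 points almost entirely along band_b simply because of its large mean; after centering, PC1 follows band_a, which actually carries the variation.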
> - if you do PCA on (band-mean)/sd(band), it means that you first
> normalize (scale)
I think scale and normalize are two different things.
> each variable to mean zero and unit variance. This
> procedure is identical to doing PCA on the correlation matrix. It means
> that, unlike for unscaled variables, variables with larger variance will
> not get more influence on the PCA than others. For image analysis I can
> see a place for both; if bands with low variance indicate insignificant
> and perhaps noisy information, you may downweight them.
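The claim that PCA on (band-mean)/sd(band) is the same as PCA on the correlation matrix can be checked numerically. A minimal NumPy sketch, assuming three hypothetical correlated bands on very different scales; it compares the eigenvalues of the covariance matrix of the standardized bands with those of the correlation matrix of the raw bands.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
# Three hypothetical correlated bands on very different scales
base = rng.normal(size=n)
X = np.column_stack([
    100.0 * base + rng.normal(scale=50.0, size=n),
    0.01 * base + rng.normal(scale=0.02, size=n),
    base + rng.normal(scale=1.0, size=n),
])

# Standardize each band: (band - mean) / sd(band), using the n - 1 denominator
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvalues (ascending) of cov(Z) and of the correlation matrix of X
eig_cov_of_z = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
eig_corr = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))

print(eig_cov_of_z.round(4))
print(eig_corr.round(4))
```

Both sets of eigenvalues agree, and they sum to the number of bands, since every standardized band has variance 1.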
Variance depends on range; I would rather use something like the
coefficient of variation (stddev/mean) to get a scale-independent
indicator of the amount of information in a given band. A downscaled
band (e.g. with the MODIS scale factor of 0.0001) still has the same
information but a lower variance.
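A small NumPy sketch of the scale-factor argument, assuming a hypothetical band of stored values that is multiplied by 0.0001: the variance shrinks by a factor of 0.0001² = 1e-8, while the coefficient of variation is unchanged. Note that this invariance holds for multiplicative rescaling only; an additive offset changes the mean and therefore the coefficient of variation.

```python
import numpy as np


def coefficient_of_variation(band):
    """Sample standard deviation divided by the mean (assumes a nonzero mean)."""
    return band.std(ddof=1) / band.mean()


rng = np.random.default_rng(1)
# Hypothetical stored band values and the same band after applying a 0.0001 scale factor
raw = rng.normal(loc=5000.0, scale=800.0, size=10_000)
scaled = raw * 0.0001

# Variance scales with the square of the factor
print(raw.var(ddof=1), scaled.var(ddof=1))
# The coefficient of variation does not change
print(coefficient_of_variation(raw), coefficient_of_variation(scaled))
```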
> - Only in case of normalized variables, or equivalently PCA on
> correlations, it makes sense to select PCs with an eigenvalue larger
> than 1. The reasoning is fairly weak, but goes like this: if a PC has
> eigenvalue > 1, it explains more variance than any of the original
> variables, which all have variance 1.
>
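The eigenvalue > 1 rule only makes sense on the correlation matrix, where every band contributes variance 1 and the eigenvalues sum to the number of bands. A minimal NumPy sketch of that selection, assuming hypothetical data of three bands that share one common signal plus band-specific noise; kaiser_select is an illustrative helper name.

```python
import numpy as np


def kaiser_select(X):
    """Eigenvalues of the band correlation matrix (descending) and the
    indices of the PCs whose eigenvalue exceeds 1."""
    R = np.corrcoef(X, rowvar=False)
    vals = np.linalg.eigvalsh(R)[::-1]
    return vals, np.flatnonzero(vals > 1.0)


rng = np.random.default_rng(7)
n = 5_000
signal = rng.normal(size=n)
# Hypothetical image: three bands sharing one common signal plus band-specific noise
X = np.column_stack([signal + 0.3 * rng.normal(size=n) for _ in range(3)])

vals, keep = kaiser_select(X)
print("eigenvalues:", vals.round(3), "sum:", round(vals.sum(), 3))
print("PCs kept (0-based):", keep)
```

Here only the first PC, which captures the shared signal, explains more than one band's worth of variance and is kept.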
Sounds good to me: why should I use a component that explains less
variance than any of the original bands? After all, the whole purpose of
a PCA is variable reduction, i.e. getting a new set of variables, each
explaining the whole dataset better than any single original variable/band.
A PCA produces as many components as there are input variables, so some
selection is usually necessary for further processing; the criterion
could also be the percentage of explained variance.
OTOH, sometimes only the first component is of interest. There may be
exceptions for imagery processing, e.g. haze reduction (I would have to
read up on imagery processing too to say anything more about where
components with eigenvalue < 1 could be useful).
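For the alternative criterion mentioned above (percentage of explained variance), a minimal NumPy sketch follows. The 90% threshold and the six eigenvalues are hypothetical, and n_components_for is an illustrative helper, not a GRASS or R function.

```python
import numpy as np


def n_components_for(eigenvalues, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share of variance reaches threshold."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    share = np.cumsum(vals) / vals.sum()
    # searchsorted (side="left") gives the first index where share >= threshold
    return int(np.searchsorted(share, threshold) + 1)


# Hypothetical eigenvalues of a 6-band correlation matrix (they sum to 6)
eigenvalues = [3.6, 1.5, 0.5, 0.2, 0.12, 0.08]
print(n_components_for(eigenvalues, 0.90))
```

With these numbers the 90% criterion keeps three components, whereas the eigenvalue > 1 rule would keep only two, so the choice of criterion matters.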