[GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Edzer Pebesma edzer.pebesma at uni-muenster.de
Wed Apr 1 12:21:24 EDT 2009


Markus, a few notes:

- if you do PCA on uncentered data, by computing the eigenvalues of the
uncentered covariance matrix, bands with a larger mean will get more
influence on the final PCs. So far I have not managed to find an
argument why this would be desirable.
- if you do PCA on (band-mean)/sd(band), you first normalize (scale)
each variable to mean zero and unit variance. This procedure is
identical to doing PCA on the correlation matrix. It means that, unlike
with unscaled variables, variables with larger variance will not get
more influence on the PCA than others. For image analysis I can see a
place for both: if bands with low variance carry insignificant and
perhaps noisy information, you may want to downweight them; if they
contain (equally) important information, you may not. Scaling becomes
essential when you compute PCs from a mix of variables with incomparable
units, such as image bands and DTMs.
- Only in the case of normalized variables, or equivalently PCA on
correlations, does it make sense to select PCs with an eigenvalue larger
than 1. The reasoning is fairly weak, but goes like this: if a PC has an
eigenvalue > 1, it explains more variance than any single one of the
original variables, each of which has variance 1.
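A minimal NumPy sketch of the three variants above, assuming a raster stack flattened to an (n_pixels, n_bands) array. The helper pca_eigen and the two synthetic bands are hypothetical illustrations, not code from GRASS or R: band A has a large mean and a tiny variance, band B a zero mean and a large variance.

```python
import numpy as np

def pca_eigen(X, center=True, scale=False):
    """Hypothetical helper: eigenvalues (descending) and eigenvectors (columns).

    X has shape (n_pixels, n_bands).
    center=False              -> uncentered matrix X^T X / (n - 1)
    center=True               -> covariance matrix
    center=True, scale=True   -> correlation matrix
    """
    X = np.asarray(X, dtype=float)
    if scale and not center:
        raise ValueError("this sketch only scales centered data")
    if center:
        X = X - X.mean(axis=0)
    if scale:
        X = X / X.std(axis=0, ddof=1)      # unit variance per band
    C = X.T @ X / (X.shape[0] - 1)
    vals, vecs = np.linalg.eigh(C)         # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic, hypothetical two-band "image"
rng = np.random.default_rng(0)
n = 10_000
band_a = 1000 + rng.normal(0, 1, n)        # large mean, small variance
band_b = rng.normal(0, 10, n)              # zero mean, large variance
X = np.column_stack([band_a, band_b])

vals_raw, vecs_raw = pca_eigen(X, center=False)
vals_cov, vecs_cov = pca_eigen(X, center=True)
vals_cor, _ = pca_eigen(X, center=True, scale=True)

print("uncentered PC1 loadings:", np.round(np.abs(vecs_raw[:, 0]), 3))
print("centered   PC1 loadings:", np.round(np.abs(vecs_cov[:, 0]), 3))
print("correlation eigenvalues:", np.round(vals_cor, 3))
# The eigenvalue > 1 rule is only meaningful for correlation eigenvalues
print("PCs with eigenvalue > 1:", int(np.sum(vals_cor > 1)))
```

In the uncentered case PC1 is essentially band A, which dominates purely because of its mean. After centering, PC1 is essentially band B, the high-variance band. On the correlation matrix both eigenvalues are close to 1 and sum to 2, the number of bands, which is why a cut-off at 1 only makes sense there.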

Maybe I should Cc: this to the wiki.
--
Edzer

Markus Metz wrote:
>
> Edzer Pebesma wrote:
>> Markus Metz wrote:
>>  
>>>> I'm more familiar with non-spatial PCA, so it's high time I read the
>>>> manual of i.pca, and the new wiki page on it...
>>>>           
>> I think there's no such thing as spatial or non-spatial PCA. There's
>> just PCA.
>>   
> That was a feeble attempt to buy time to go through some statistics
> literature ;-)
>
> So it seems that this thread is about the different values for
> eigenvalues. AFAICT, the answer is in the very first post of this
> thread [1]. It seems that i.pca output is supposed to be identical to
> prcomp(center=FALSE, scale=FALSE) output in R, because PCA is
> scale-sensitive and the eigenvalue reported by i.pca is the
> variance of the raw, unstandardised data. If the outputs are not
> identical, either R or GRASS does some hidden modification, or there
> is a bug in either GRASS or R (within limits: is agreement up to the
> 5th digit in scientific notation good enough?).
>
> Some textbooks give a rule of thumb for further analysis: use only
> components with an eigenvalue >= 1. This obviously only works if the
> eigenvalue is calculated from standardised values (center=TRUE,
> scale=TRUE, or e.g. r.mapcalc standardised_map = (map - mean) /
> stddev). E.g., comparing the results for raw MODIS data and MODIS data
> scaled by 0.0001 should give <eigenvalue #x of MODIS scaled> = 1E-8 *
> <eigenvalue #x of MODIS raw>, since scaling by 0.0001 multiplies all
> variances by 0.0001^2.
>
> BTW, the rescaling method of i.pca is not very convincing, as pointed
> out by Augustin Lobo. IMHO, a fool-proof approach would be
> normalization (x - mean) / stddev.
>
> [1] http://lists.osgeo.org/pipermail/grass-user/2009-March/049306.html
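The scaling argument in the quoted message can be checked numerically. This NumPy sketch assumes that prcomp(x, center = FALSE, scale. = FALSE) reports PC variances as the squared singular values of x divided by n - 1; the helper names are hypothetical, and the sketch does not reproduce i.pca's own code. The three-band data are synthetic stand-ins, not MODIS values.

```python
import numpy as np

def uncentered_pc_variances(X):
    """PC variances without centering or scaling (descending).

    Assumed to correspond to prcomp(x, center = FALSE, scale. = FALSE)$sdev^2:
    squared singular values of X divided by (n - 1).
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    return d**2 / (X.shape[0] - 1)

def standardized_pc_variances(X):
    """PC variances after (x - mean) / sd per band, i.e. correlation-matrix PCA."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return uncentered_pc_variances(Z)        # Z is already centered

# Hypothetical 3-band image flattened to (n_pixels, n_bands); synthetic, not MODIS
rng = np.random.default_rng(42)
n = 5_000
base = rng.normal(3000, 500, n)
raw = np.column_stack([
    base + rng.normal(0, 100, n),
    0.8 * base + rng.normal(0, 200, n),
    rng.normal(1500, 300, n),
])
scaled = raw * 0.0001                        # same data in "scaled" units

ev_raw = uncentered_pc_variances(raw)
ev_scaled = uncentered_pc_variances(scaled)
print(ev_scaled / ev_raw)                    # each ratio is ~1e-8

std_raw = standardized_pc_variances(raw)
std_scaled = standardized_pc_variances(scaled)
print(std_raw, std_scaled)                   # equal up to rounding; each sums to 3
```

Multiplying every band by 0.0001 scales the unstandardised eigenvalues by 0.0001^2 = 1E-8, while the standardised eigenvalues do not change, so only the latter can be compared against a fixed threshold such as 1.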

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251 8333081,
Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de
