[GRASS-stats] Re: [GRASS-user] Calculating eigen values and % varianceexplainedafter PCA analysis

Nikos Alexandris nikos.alexandris at felis.uni-freiburg.de
Sun Mar 1 08:33:59 EST 2009


Nikos:
> > It should be the case with i.pca as well since eigen_VALUES_ (=represent
> > the variances of the original dimensions that are "kept" in each
> > component) are important for the interpretation of what exactly are each
> > of the components. But, i.pca just does not report the eigen_VALUES_.
> > 
> > At some point some C-expert needs to have a look in the code (i.pca) and
> > correct the "bug" which does not let the eigen_VALUES_ from being
> > printed.

Hamish:
> done in devbr6 (6.5svn) please test, I'm not a multivariate stats guru
> and may have done something dumb so didn't port to other branches yet.
> 
> I changed the i.pca output to be like:
> 
> Eigen (vectors) and values:
> PC1 ( -0.63 -0.65 -0.43 ) 88.07
> PC2 ( 0.23 0.37 -0.90 ) 11.48
> PC3 ( 0.75 -0.66 -0.08 ) 0.45
> 
> As it was previously sent to stderr via G_message() I don't feel bad about
> breaking output text compatibility. I wanted to add "%" to the values but
> due to the sprintf()+strcat() method in the code that was a pain, so I
> didn't.


Wesley:
> > >  If this is the case then both methods still differ significantly. Is
> > > this possible, and which should I use.

> > Please have a look at my comments/questions in link [2].
> > i.pca follows the "SVD" method. You performed the non-standartised PCA
> > using the covariance matrix. Note that you can use also the
> > standartised method by using the correlation matrix.

> does the r.mapcalc command at the end of the m.eigensystem help page*
> do that conversion, or ...? ie how can we test these against each other?
> how to do the standardized method?

We can easily cross-compare now. I will run my test in GRASS and R.
In GRASS:
 * i.pca --> SVD without data centering method

 * "r.covar" + "m.eigensystem" --> non-standardised PCA (about data
centering I am unsure here, I will investigate the _numbers_)

 * "r.covar -r" + "m.eigensystem" # note the "-r" flag --> standardised
PCA (about data centering I am unsure, willing to test).


> [*] (is "\-" there a typo or some old mapcalc syntax?)

IMHO: yes.  Maybe it's some forgotten _backslash_ !?


> also ISTR somebody (Dylan?) doing a comparison with the R-stats interface.
> 
> 
> It would be nice to run tests using the Spearfish imagery dataset. After
> my own tests I noticed it matched what was used in the m.eigensystem help
> page.

I will probably run the test with my own data. Since I just need to
copy-paste the commands. I don't have the spearfish dataset currently.


> my results follow.
> 
> Hamish
> 
> 
> ----------------------------
> #Spearfish imagery sample dataset
> g.region rast=spot.ms.1
> 
> 
> # 'by-hand-method'
> G65> echo "3" > test_m.eigensystem    # number of input maps
> G65> r.covar map=spot.ms.1,spot.ms.2,spot.ms.3 >> test_m.eigensystem
> 
> G65> cat test_m.eigensystem
> 3
> 462.876649 480.411218 281.758307 
> 480.411218 513.015646 278.914813 
> 281.758307 278.914813 336.326645 
> 
> 
> G65> m.eigensystem < test_m.eigensystem
> -----
> C The output is N sets of values. One E line and N V W lines
> C
> C  E   real  imaginary   percent-importance
> C  V   real  imaginary
> C  N   real  imaginary
> C  W   real  imaginary
> C      ...
> C
> C where E is the eigen value (and it relative importance)
> C and   V are the eigenvector for this eigenvalue.
> C       N are the normalized eigenvector for this eigenvalue.
> C       W are the N vector multiplied by the square root of the
> C         magnitude of the eigen value (E). 
> -----
> 
> E      1159.7452017844         0.0000000000    88.38
> V         0.6910021591         0.0000000000
> V         0.7205280412         0.0000000000
> V         0.4805108400         0.0000000000
> N         0.6236808478         0.0000000000
> N         0.6503301526         0.0000000000
> N         0.4336967751         0.0000000000
> W        21.2394712045         0.0000000000
> W        22.1470141296         0.0000000000
> W        14.7695575384         0.0000000000
> 
> E         5.9705414972         0.0000000000     0.45
> V         0.7119385973         0.0000000000
> V        -0.6358200627         0.0000000000
> V        -0.0703936743         0.0000000000
> N         0.7438340890         0.0000000000
> N        -0.6643053754         0.0000000000
> N        -0.0735473745         0.0000000000
> W         1.8175356507         0.0000000000
> W        -1.6232096923         0.0000000000
> W        -0.1797107407         0.0000000000
> 
> E       146.5031967184         0.0000000000    11.16
> V         0.2265837636         0.0000000000
> V         0.3474697082         0.0000000000
> V        -0.8468727535         0.0000000000
> N         0.2402770238         0.0000000000
> N         0.3684685345         0.0000000000
> N        -0.8980522763         0.0000000000
> W         2.9082771721         0.0000000000
> W         4.4598880523         0.0000000000
> W       -10.8698904856         0.0000000000
> 
> 
> # 'all-in-one method' using r.covar+m.eigensystem:
> G65> (echo 3; r.covar spot.ms.1,spot.ms.2,spot.ms.3 ) | m.eigensystem
> 
> Then, using the W vector, new maps are created: 
> r.mapcalc 'pc.1 = 21.2395*map.1 + 22.1470*map.2 + 14.7696*map.3'
> r.mapcalc 'pc.2 =  2.9083*map.1 +  4.4599*map.2  - 10.8699*map.3'
> r.mapcalc 'pc.3 =  1.8175*map.1  -  1.6232*map.2 \-  0.1797*map.3'
> 
> (is "\-" above a typo or some old mapcalc syntax?)
Probably a typo.


> which look highly similar (but not identical) to i.pca output maps.
It's ok. R print's out lot's of decimals. Perhaps the difference is due
to some rounding of the numbers.


> (after 'r.colors color=grey')

> # 'automatic method'
> imagery60:G6.5svn> i.pca in=spot.ms.1,spot.ms.2,spot.ms.3 out=spot_pca
> 
> Eigen (vectors) and values:
> PC1 ( -0.63 -0.65 -0.43 ) 88.07
> PC2 ( 0.23 0.37 -0.90 ) 11.48
> PC3 ( 0.75 -0.66 -0.08 ) 0.45



More information about the grass-stats mailing list