[GRASS-user] Calculating eigen values and % variance explained after PCA analysis

Wesley Roberts wroberts at csir.co.za
Fri Feb 27 08:12:59 EST 2009


Thanks Nikos,

I have read your mail and the associated links. Many thanks, they have confirmed some things for me.

Firstly, I would like to standardise the PCA so that each input contributes equally to the result. As such, I use the correlation matrix as opposed to the covariance matrix. By using this method I do not need to centre the data, yes?
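
On the centring question, a minimal sketch in plain Python (not GRASS code; the band values are made up for illustration) suggests why the correlation matrix itself is unaffected: the Pearson coefficient subtracts each band's mean and divides by its spread internally, so shifting or rescaling a band leaves it unchanged.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pixel values for two inputs (illustration only).
red = [10.0, 12.0, 15.0, 11.0, 18.0]
height = [2.0, 2.5, 4.0, 2.2, 5.1]

r_raw = pearson(red, height)
# Shift and rescale one band: the correlation stays the same.
r_shifted = pearson([3.0 * v + 100.0 for v in red], height)
print(round(r_raw, 6), round(r_shifted, 6))
```

This only concerns the matrix. Whether the component maps should then be computed from standardised (centred and scaled) bands is a separate decision.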

Secondly, using the by-hand method (r.covar -r -> m.eigensystem -> r.mapcalculator), and in particular when applying the eigen vectors to the input imagery, I can disregard the signs and take them as absolute values, yes?

Finally, the size of these values indicates their relative contribution to the component. For example, if band 1 has an eigen vector value of 0.8 and band 2 has a value of 0.1, band 1 contributes more to the PC than band 2, yes?

I will run some tests this afternoon, continue next week and report back. Let me know if my understanding above is correct.

Many thanks,
Wesley

Wesley Roberts MSc.
Researcher: Earth Observation (Ecosystems)
Natural Resources and the Environment
CSIR
Tel: +27 (21) 888-2490
Fax: +27 (21) 888-2693

"To know the road ahead, ask those coming back."
- Chinese proverb

>>> Nikos Alexandris <nikos.alexandris at felis.uni-freiburg.de> 02/27/09 1:27 PM >>>

Wesley
> I downloaded and installed GRASS 6.4 and after much "wailing and
> gnashing of teeth" I got m.eigensystem to work. Below are some
> comments and questions.

Nice that it finally worked out. Hopefully my comments are useful for
you (and correct). You can have a look at the following links
[1][2][3][4].



> Over the last couple of days I have been running PCA analyses using
> the i.pca and r.covar -> m.eigensystem -> r.mapcalc. The analysis
> seeks to create a component surface where tree crowns are separated
> from understory and ground in a plantation forest. Inputs are three
> digital aerial photographs (red, green, blue), a top of canopy height
> model, and an intensity surface derived from lidar return intensity
> measures. Output from the PCA will be input into a tree counting method
> which (if all goes well) will use mathematical morphology to isolate
> tree crowns for counting purposes

Interesting stuff!



> My results are interesting and worth mentioning to the list. Firstly,
> the results from both the automated (i.pca) and the
> 'by-hand-method' (r.covar -> m.eigensystem -> r.mapcalc) differ. For
> example; the eigen values from the automated approach are as follows

> (-0.50 -0.53 -0.49 -0.47 -0.08)
> (-0.38 -0.30 -0.13 0.86 0.11)
> (-0.34 -0.35 0.86 -0.14 0.05)
> (0.70 -0.71 -0.01 0.06 0.03)
> (0.00 -0.03 0.07 0.13 -0.99)

> while the eigen values from the 'by-hand-method' are completely
> different, in fact I am a little confused with regards to the output
> from i.pca and the m.eigensystem. i.pca returns the n number of
> components plus the eigen values for each component (or are those
> vectors?).

Yes, those are the eigen_VECTORS_ (= loadings, in other words how much
each of the original dimensions contributes to the resulting
components). Each row corresponds to one principal component.
In your example above you "know" that the 1st component (1st row) is
composed of the original dimensions (one per column), and each original
dimension has "contributed" according to the _loadings_:

So dimension 1 -> -0.50, dimension 2 -> -0.53, dimension 3 -> -0.49,
dimension 4 -> -0.47 and dimension 5 -> -0.08

If I understand PCA correctly myself, you can disregard the "signs" and
see the loadings as absolute values.
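
A small sketch of the sign question, in plain Python with a toy 2x2 matrix (an assumption for illustration, not data from this thread): reversing all signs of an eigenvector gives an equally valid eigenvector, so the overall sign carries no meaning and the absolute sizes of the loadings can be compared. The relative signs within one row still matter when the row is used as a set of coefficients.

```python
import math

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Toy 2x2 symmetric matrix standing in for a covariance matrix.
C = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
v = [s, -s]          # eigenvector of C with eigenvalue 1
flipped = [-s, s]    # the same vector with every sign reversed

print(matvec(C, v))        # equals 1 * v
print(matvec(C, flipped))  # equals 1 * flipped: still an eigenvector

# Squared loadings sum to 1, so they can be read as relative shares.
shares = [x * x for x in v]
print(shares)

# Taking absolute values entry by entry gives [s, s], which belongs to
# the other eigenvalue (3), i.e. a different component.
abs_v = [abs(x) for x in v]
print(matvec(C, abs_v))    # equals 3 * abs_v
```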



> Would it be fair in saying that these are the coefficients which have
> been applied to the input imagery to attain the output components (in
> the same way the m.eigensystem works with r.mapcalc)?

Yes.
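
As a rough illustration of what "applying the coefficients" means, the sketch below takes the first row of the i.pca output quoted above and forms the weighted sum of the five inputs per pixel. The pixel values are hypothetical, and this is plain Python rather than an actual r.mapcalc expression.

```python
# First row of the i.pca output quoted in this thread.
pc1_coeffs = [-0.50, -0.53, -0.49, -0.47, -0.08]

# Hypothetical values of the five inputs at three pixels
# (red, green, blue, canopy height, lidar intensity).
pixels = [
    [120.0, 130.0, 110.0, 12.5, 40.0],
    [60.0, 80.0, 55.0, 0.3, 22.0],
    [200.0, 190.0, 180.0, 18.0, 65.0],
]

def component(coeffs, pixel):
    """Weighted sum of the input bands for one pixel."""
    return sum(c * b for c, b in zip(coeffs, pixel))

pc1 = [component(pc1_coeffs, p) for p in pixels]
print(pc1)
```

The same sum, evaluated for every cell of the input rasters, gives the component map.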



> Output from the m.eigensystem approach only gives one eigen value per
> component (see below).
> Are the above values from i.pca not the eigen vectors?

It should be the case with i.pca as well, since the eigen_VALUES_ (= the
variance of the original dimensions that is "kept" in each component)
are important for interpreting what exactly each of the components is.
But i.pca simply does not report the eigen_VALUES_.
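
Once the eigenvalues are available (for instance from m.eigensystem), the % variance explained follows directly: each eigenvalue divided by the sum of all eigenvalues. A minimal sketch in plain Python, with hypothetical eigenvalues since none are listed in this message:

```python
# Hypothetical eigenvalues for five components (illustration only;
# the real values would come from m.eigensystem).
eigenvalues = [3.10, 1.05, 0.55, 0.20, 0.10]

total = sum(eigenvalues)
percent = [100.0 * ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:5.1f} %  (cumulative {c:5.1f} %)")
```

With a correlation matrix the eigenvalues sum to the number of inputs (5 here), which is a quick sanity check on the output.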

At some point some C-expert needs to have a look at the code (i.pca) and
correct the "bug" which prevents the eigen_VALUES_ from being
printed.



>  If this is the case then both methods still differ significantly. Is
> this possible, and which should I use.

Please have a look at my comments/questions in link [2]. i.pca follows
the "SVD" method. You performed the non-standardised PCA using the
covariance matrix. Note that you can also use the standardised method by
using the correlation matrix.
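
The relation between the two matrices can be sketched as follows, in plain Python with a hypothetical 3x3 covariance matrix: the correlation matrix is the covariance matrix rescaled so that every input has unit variance, which is why inputs with a large numeric range no longer dominate the result.

```python
import math

# Hypothetical 3x3 covariance matrix (illustration only).
cov = [
    [4.0, 2.0, 0.6],
    [2.0, 9.0, 1.5],
    [0.6, 1.5, 1.0],
]

# Standard deviations are the square roots of the diagonal.
sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]

# Divide each covariance by the product of the two standard deviations.
corr = [
    [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
    for i in range(len(cov))
]

for row in corr:
    print([round(v, 3) for v in row])
```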



> Qualitatively, the 'by-hand-method' seems to isolate the crowns very
> nicely in PC1 while the automated (i.pca) approach isolates crowns in
> PC3?? I rescaled the output in the i.pca method, would this contribute
> to the differences seen?
> 
> I am going to run more tests on the rest of my data and will see if
> these issues arise again. In the meantime if anyone on the list can
> offer some insight into the two different pca analysis examples I
> would greatly appreciate it.

I would be happy to hear more. It's a tool I also need.
Kindest regards, Nikos

[...]


---
Links:

# in grass-user mailing list

[1] # In these posts I didn't know much about PCA #
http://n2.nabble.com/i.pca--vs.--r.covar-m.eigensystem-r.mapcalc-td1885820.html#a1885821

[2] # this is the one I have sent you already #
http://n2.nabble.com/Comparison-between-"i.pca"-and-R's-"prcomp()"%3A-explanations-and-questions-td2283997.html#a2284070


# in grass-trac

[3] http://trac.osgeo.org/grass/ticket/341

[4] http://trac.osgeo.org/grass/ticket/430


