[GRASS-dev] PCA question

Mon Jun 25 11:21:08 PDT 2012

Markus,

This is very helpful. See below for a couple additional questions.

Thanks much
Michael

On Jun 25, 2012, at 12:03 PM, Markus Metz wrote:

> On Mon, Jun 25, 2012 at 6:47 PM, Michael Barton <Michael.Barton at asu.edu> wrote:
>> Thanks Marcus,
>> 
>> Please see my post below, starting with "I think I've figured it out..."
> 
> Yes, I saw, but the formula below "I think I've figured it out..."
> looked strange to me.
>> 
>> Since I'm working in GRASS instead of R,
> [ should be "and", not "instead of" ;) ]
> 
> Here is an example based on on the landsat imagery in mapset landsat,
> North Carolina sample dataset:
> 
> i.pca without rescaling of the output:
> i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
> output_prefix=lsat7_2000_pc rescale=0,0
> 
> The eigenvectors are
> 0.4494, 0.5128, 0.7315
> 0.6726, 0.3447,-0.6548
> 0.5879,-0.7863, 0.1901
> 
> this is a square 3x3 matrix. R gives the inverse matrix as
> 
> 0.4493372, 0.6726549, 0.5879235
> 0.5128129, 0.3446144,-0.7862660
> 0.7315070,-0.6548316, 0.1899992
> 
> Test with band 10:
> the general formula is
> b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1)
> 
> The mean of lsat7_2000_b10 is 79.925.
> r.mapcalc "lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 +
> lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925"
> 
> Difference to original:
> r.mapcalc "diff = lsat7_2000_10 - lsat7_2000_10.pc"
> 
> r.univar diff -g
> n=250325
> null_cells=0
> cells=250325
> min=-0.00140021304849824
> max=0.00723340429107111
> range=0.00863361733956935
> mean=-1.8473599814612e-05
> 
> That's close enough, given that the eigenvectors are printed by i.pca
> with limited precision (only 4 decimal places).
> 
> Without using R, the inverse of a 3x3 matrix can be manually
> calculated as follows:
> 
> original matrix:
> a b c
> d e f
> g h k
> 
> inverse matrix:
> (ek - fh)   (ch - bk)   (bf - ce)
> (fg - dk)   (ak - cg)   (cd - af)
> (dh - eg)  (gb - ah)   (ae - bd)

Your example from R above *looks* like the inverse of matrix...

a b c
d e f
g h k

is (given rounding errors) ...

a d g
b e h
c f k

That is, transpose rows and columns.
Is this just a coincidence or a reasonable approximation?

> 
> each entry in this inverse matrix needs to be multiplied with
> 
> 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))

by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 of a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the matrix), right?

> 
> in order to get the transformation matrix to be applied to the pc scores.
> 
> The formula is more complex for matrices with more than 3 dimensions,
> e.g. for i.pca with 6 input bands.
> 
>> 
>> So my questions are:
>> 
>> 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS?
> 
> I don't think so. Using your formula and testing for the difference gives me
> min=-546.255208609669
> max=174.771853180844
> range=721.027061790513
> mean=79.9249815357317
> 
> a bit too much IMHO.

Right. Looks like I'll need to use r.univar to calculate the mean of each band to add back in. Thanks again. I will try this out manually and test before updating the pan sharpening routine.

> 
>> 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK.
> 
> The mean of each band is added back to each recalculated band. See
> above r.mapcalc formula for band 10.
> 
>> 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize?
> 
> Normalization applies to the input of i.pca and is by default not
> done. It needs to be done for heterogenous input data such as e.g.
> rainfall, temperature, NDVI, etc. Rescaling is automatically applied
> to the output of i.pca unless explicitly disabled with rescale=0,0.

Thanks. That is what I thought.

Michael

> 
> Markus M
> 
>> 
>> Thanks much
>> Michael
>> 
>> 
>> On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:
>> 
>>> An inverse PCA can be regarded as the inverse of a transformation
>>> using matrix notation. PC scores are calculated with
>>> 
>>> b = A a
>>> 
>>> with A being the transformation matrix composed of the Eigenvectors, a
>>> being the vector of the original values and b the PC scores. What you
>>> now need is inverse of A, A^-1. The original values can then be
>>> retrieved with
>>> 
>>> A^-1 b = a
>>> 
>>> A^-1 is the inverse of the transformation matrix A which you can get
>>> in R with solve(A).
>>> 
>>> For a PCA, the original values are usually shifted to the mean and
>>> optionally scaled to stddev before computing the Eigenvectors. The
>>> mean shift is always performed by i.pca, scaling is optional. That
>>> means that A^-1 b gives the original values shifted to the mean and
>>> maybe scaled, and the mean of each original band needs to be added to
>>> get the original values used as input to i.pca. With scaling applied,
>>> the shifted values need to be multiplied by the stddev for each
>>> original band.
>>> 
>>> HTH,
>>> 
>>> Markus M
>>> 
>>> 
>>> On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton <Michael.Barton at asu.edu> wrote:
>>>> The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result.
>>>> 
>>>> However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them.
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote:
>>>> 
>>>>> Dear all,
>>>>> first, sorry for the delay...
>>>>> Here are my 2 cents to be added to the discussion. I re-took in my
>>>>> hands the John Jensen book.
>>>>> Accordingly
>>>>> 
>>>>> new brightness values1,1,1 = a1,1*BV1,1,1  +a2,1*BV1,1,2..... + an,1*BV1,1,m
>>>>> 
>>>>> where a=eigenvector and BV=original brightness value.
>>>>> 
>>>>> I found no evidence for the mean term: "- ((b1+b2+b3)/3)"
>>>>> 
>>>>> Michael: do you have a proof/reference for that?
>>>>> 
>>>>> P.S. thanks for involving me in this discussion which is really stimulating!
>>>>> 
>>>>> Duccio
>>>>> 
>>>>> 2012/6/7 Michael Barton <michael.barton at asu.edu>:
>>>>>> 
>>>>>> I think I've figured it out.
>>>>>> 
>>>>>> If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal component for 3 imagery bands (b1, b2, b3), the corresponding factor scores of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as:
>>>>>> 
>>>>>> fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3)
>>>>>> fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3)
>>>>>> fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3)
>>>>>> 
>>>>>> So to do an inverse PCA on the same data you need to do the following:
>>>>>> 
>>>>>> b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1)
>>>>>> b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2)
>>>>>> b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3)
>>>>>> 
>>>>>> Adding the constant back on doesn't really seem to matter because you need to rescale b1' to b1, b2' to b2, and b3' to b3 anyway.
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>> On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote:
>>>>>> 
>>>>>>> Hi Duccio,
>>>>>>> 
>>>>>>> On Wed, Jun 6, 2012 at 11:39 PM, Michael Barton <michael.barton at asu.edu> wrote:
>>>>>>>> On Jun 6, 2012, at 2:20 PM, Markus Neteler wrote:
>>>>>>>>> On Wed, Jun 6, 2012 at 5:09 PM, Michael Barton <michael.barton at asu.edu> wrote:
>>>>>>> ...
>>>>>>>>>> I'm working on a pan sharpening script that will combine your i.fusion.brovey with options to do other pan sharpening methods. An IHS transformation is easy. I think that a PCA sharpening is doable too if I can find an equation to rotate the PCA channels back into unrotated space--since i.pca does provide the eigenvectors.
>>>>>>>>> 
>>>>>>>>> Maybe there is material in (see m.eigenvector)
>>>>>>>>> http://grass.osgeo.org/wiki/Principal_Components_Analysis
>>>>>>>> 
>>>>>>>> This has a lot of good information and ALMOST has what I need. Everything I read suggests that there is a straightforward way to get the original values from the factor scores if you have the eigenvectors. But no one I've yet found provides the equation or algorithm to do it.
>>>>>>> 
>>>>>>> @Duccio: any idea about this by chance?
>>>>>>> 
>>>>>>> thanks
>>>>>>> Markus
>>>>>> 
>>>>>> _____________________
>>>>>> C. Michael Barton
>>>>>> Visiting Scientist, Integrated Science Program
>>>>>> National Center for Atmospheric Research &
>>>>>> University Corporation for Atmospheric Research
>>>>>> 303-497-2889 (voice)
>>>>>> 
>>>>>> Director, Center for Social Dynamics & Complexity
>>>>>> Professor of Anthropology, School of Human Evolution & Social Change
>>>>>> Arizona State University
>>>>>> www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Duccio Rocchini, PhD
>>>>> 
>>>>> http://gis.cri.fmach.it/rocchini/
>>>>> 
>>>>> Fondazione Edmund Mach
>>>>> Research and Innovation Centre
>>>>> Department of Biodiversity and Molecular Ecology
>>>>> GIS and Remote Sensing Unit
>>>>> Via Mach 1, 38010 San Michele all'Adige (TN) - Italy
>>>>> Phone +39 0461 615 570
>>>>> ducciorocchini at gmail.com
>>>>> duccio.rocchini at fmach.it
>>>>> skype: duccio.rocchini
>>>> 
>>>> _____________________
>>>> C. Michael Barton
>>>> Visiting Scientist, Integrated Science Program
>>>> National Center for Atmospheric Research &
>>>> University Consortium for Atmospheric Research
>>>> 303-497-2889 (voice)
>>>> 
>>>> Director, Center for Social Dynamics & Complexity
>>>> Professor of Anthropology, School of Human Evolution & Social Change
>>>> Arizona State University
>>>> www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> grass-dev mailing list
>>>> grass-dev at lists.osgeo.org
>>>> http://lists.osgeo.org/mailman/listinfo/grass-dev
>> 
>> _____________________
>> C. Michael Barton
>> Visiting Scientist, Integrated Science Program
>> National Center for Atmospheric Research &
>> University Corporation for Atmospheric Research
>> 303-497-2889 (voice)
>> 
>> Director, Center for Social Dynamics & Complexity
>> Professor of Anthropology, School of Human Evolution & Social Change
>> Arizona State University
>> www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
>> 

_____________________
C. Michael Barton
Visiting Scientist, Integrated Science Program
National Center for Atmospheric Research &
University Corporation for Atmospheric Research
303-497-2889 (voice)

Director, Center for Social Dynamics & Complexity 
Professor of Anthropology, School of Human Evolution & Social Change
Arizona State University
www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu