[GRASS-dev] PCA question

Mon Jun 25 11:03:02 PDT 2012

On Mon, Jun 25, 2012 at 6:47 PM, Michael Barton <Michael.Barton at asu.edu> wrote:
> Thanks Marcus,
>
> Please see my post below, starting with "I think I've figured it out..."

Yes, I saw, but the formula below "I think I've figured it out..."
looked strange to me.
>
> Since I'm working in GRASS instead of R,
[ should be "and", not "instead of" ;) ]

Here is an example based on on the landsat imagery in mapset landsat,
North Carolina sample dataset:

i.pca without rescaling of the output:
i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
output_prefix=lsat7_2000_pc rescale=0,0

The eigenvectors are
0.4494, 0.5128, 0.7315
0.6726, 0.3447,-0.6548
0.5879,-0.7863, 0.1901

this is a square 3x3 matrix. R gives the inverse matrix as

0.4493372, 0.6726549, 0.5879235
0.5128129, 0.3446144,-0.7862660
0.7315070,-0.6548316, 0.1899992

Test with band 10:
the general formula is
b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1)

The mean of lsat7_2000_b10 is 79.925.
r.mapcalc "lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 +
lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925"

Difference to original:
r.mapcalc "diff = lsat7_2000_10 - lsat7_2000_10.pc"

r.univar diff -g
n=250325
null_cells=0
cells=250325
min=-0.00140021304849824
max=0.00723340429107111
range=0.00863361733956935
mean=-1.8473599814612e-05

That's close enough, given that the eigenvectors are printed by i.pca
with limited precision (only 4 decimal places).

Without using R, the inverse of a 3x3 matrix can be manually
calculated as follows:

original matrix:
a b c
d e f
g h k

inverse matrix:
(ek - fh)   (ch - bk)   (bf - ce)
(fg - dk)   (ak - cg)   (cd - af)
(dh - eg)  (gb - ah)   (ae - bd)

each entry in this inverse matrix needs to be multiplied with

1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))

in order to get the transformation matrix to be applied to the pc scores.

The formula is more complex for matrices with more than 3 dimensions,
e.g. for i.pca with 6 input bands.

>
> So my questions are:
>
> 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS?

I don't think so. Using your formula and testing for the difference gives me
min=-546.255208609669
max=174.771853180844
range=721.027061790513
mean=79.9249815357317

a bit too much IMHO.

> 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK.

The mean of each band is added back to each recalculated band. See
above r.mapcalc formula for band 10.

> 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize?

Normalization applies to the input of i.pca and is by default not
done. It needs to be done for heterogenous input data such as e.g.
rainfall, temperature, NDVI, etc. Rescaling is automatically applied
to the output of i.pca unless explicitly disabled with rescale=0,0.

Markus M

>
> Thanks much
> Michael
>
>
> On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:
>
>> An inverse PCA can be regarded as the inverse of a transformation
>> using matrix notation. PC scores are calculated with
>>
>> b = A a
>>
>> with A being the transformation matrix composed of the Eigenvectors, a
>> being the vector of the original values and b the PC scores. What you
>> now need is inverse of A, A^-1. The original values can then be
>> retrieved with
>>
>> A^-1 b = a
>>
>> A^-1 is the inverse of the transformation matrix A which you can get
>> in R with solve(A).
>>
>> For a PCA, the original values are usually shifted to the mean and
>> optionally scaled to stddev before computing the Eigenvectors. The
>> mean shift is always performed by i.pca, scaling is optional. That
>> means that A^-1 b gives the original values shifted to the mean and
>> maybe scaled, and the mean of each original band needs to be added to
>> get the original values used as input to i.pca. With scaling applied,
>> the shifted values need to be multiplied by the stddev for each
>> original band.
>>
>> HTH,
>>
>> Markus M
>>
>>
>> On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton <Michael.Barton at asu.edu> wrote:
>>> The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result.
>>>
>>> However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them.
>>>
>>> Michael
>>>
>>>
>>> On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote:
>>>
>>>> Dear all,
>>>> first, sorry for the delay...
>>>> Here are my 2 cents to be added to the discussion. I re-took in my
>>>> hands the John Jensen book.
>>>> Accordingly
>>>>
>>>> new brightness values1,1,1 = a1,1*BV1,1,1  +a2,1*BV1,1,2..... + an,1*BV1,1,m
>>>>
>>>> where a=eigenvector and BV=original brightness value.
>>>>
>>>> I found no evidence for the mean term: "- ((b1+b2+b3)/3)"
>>>>
>>>> Michael: do you have a proof/reference for that?
>>>>
>>>> P.S. thanks for involving me in this discussion which is really stimulating!
>>>>
>>>> Duccio
>>>>
>>>> 2012/6/7 Michael Barton <michael.barton at asu.edu>:
>>>>>
>>>>> I think I've figured it out.
>>>>>
>>>>> If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal component for 3 imagery bands (b1, b2, b3), the corresponding factor scores of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as:
>>>>>
>>>>> fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3)
>>>>> fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3)
>>>>> fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3)
>>>>>
>>>>> So to do an inverse PCA on the same data you need to do the following:
>>>>>
>>>>> b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1)
>>>>> b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2)
>>>>> b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3)
>>>>>
>>>>> Adding the constant back on doesn't really seem to matter because you need to rescale b1' to b1, b2' to b2, and b3' to b3 anyway.
>>>>>
>>>>> Michael
>>>>>
>>>>> On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote:
>>>>>
>>>>>> Hi Duccio,
>>>>>>
>>>>>> On Wed, Jun 6, 2012 at 11:39 PM, Michael Barton <michael.barton at asu.edu> wrote:
>>>>>>> On Jun 6, 2012, at 2:20 PM, Markus Neteler wrote:
>>>>>>>> On Wed, Jun 6, 2012 at 5:09 PM, Michael Barton <michael.barton at asu.edu> wrote:
>>>>>> ...
>>>>>>>>> I'm working on a pan sharpening script that will combine your i.fusion.brovey with options to do other pan sharpening methods. An IHS transformation is easy. I think that a PCA sharpening is doable too if I can find an equation to rotate the PCA channels back into unrotated space--since i.pca does provide the eigenvectors.
>>>>>>>>
>>>>>>>> Maybe there is material in (see m.eigenvector)
>>>>>>>> http://grass.osgeo.org/wiki/Principal_Components_Analysis
>>>>>>>
>>>>>>> This has a lot of good information and ALMOST has what I need. Everything I read suggests that there is a straightforward way to get the original values from the factor scores if you have the eigenvectors. But no one I've yet found provides the equation or algorithm to do it.
>>>>>>
>>>>>> @Duccio: any idea about this by chance?
>>>>>>
>>>>>> thanks
>>>>>> Markus
>>>>>
>>>>> _____________________
>>>>> C. Michael Barton
>>>>> Visiting Scientist, Integrated Science Program
>>>>> National Center for Atmospheric Research &
>>>>> University Corporation for Atmospheric Research
>>>>> 303-497-2889 (voice)
>>>>>
>>>>> Director, Center for Social Dynamics & Complexity
>>>>> Professor of Anthropology, School of Human Evolution & Social Change
>>>>> Arizona State University
>>>>> www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Duccio Rocchini, PhD
>>>>
>>>> http://gis.cri.fmach.it/rocchini/
>>>>
>>>> Fondazione Edmund Mach
>>>> Research and Innovation Centre
>>>> Department of Biodiversity and Molecular Ecology
>>>> GIS and Remote Sensing Unit
>>>> Via Mach 1, 38010 San Michele all'Adige (TN) - Italy
>>>> Phone +39 0461 615 570
>>>> ducciorocchini at gmail.com
>>>> duccio.rocchini at fmach.it
>>>> skype: duccio.rocchini
>>>
>>> _____________________
>>> C. Michael Barton
>>> Visiting Scientist, Integrated Science Program
>>> National Center for Atmospheric Research &
>>> University Consortium for Atmospheric Research
>>> 303-497-2889 (voice)
>>>
>>> Director, Center for Social Dynamics & Complexity
>>> Professor of Anthropology, School of Human Evolution & Social Change
>>> Arizona State University
>>> www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> grass-dev mailing list
>>> grass-dev at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/grass-dev
>
> _____________________
> C. Michael Barton
> Visiting Scientist, Integrated Science Program
> National Center for Atmospheric Research &
> University Corporation for Atmospheric Research
> 303-497-2889 (voice)
>
> Director, Center for Social Dynamics & Complexity
> Professor of Anthropology, School of Human Evolution & Social Change
> Arizona State University
> www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
>