[GRASS-dev] PCA (i.pca) in G7: filtering and rescaling

Sun Dec 8 12:36:32 PST 2013

On Thu, Dec 5, 2013 at 1:15 PM, Nikos Alexandris
<nik at nikosalexandris.net> wrote:
> Nikos Alexandris:
>> > ...we need those extra digits to make it easy to reject the last Principal
>> > Component(s) prior to the backward PCA. It might be one, two or more (?)
>> > depending on the dimensions.
>
> Markus M:
>> I think it rather depends on the amount of information encoded in each PC.
>
> It does. PCA works on global statistics, so one has to run it first, then study
> the visuals and numbers, then decide what to keep or how to treat it further.
>
> In my very simple example, I want to see whether to reject the last component
> or the last two. If the filtering option lets me do that, I am happy :-).
> To exemplify, currently I can't reject the last two components, which account for
> 0.06 and 0.21 of the original data variance. I tested this yesterday and the filter
> does not differentiate those subtle details, which might be of importance (for
> a subsequent classification of high-res images).

I don't think filtering makes sense in this case. If important
information is encoded in components explaining only a small part of
the variance, you don't want to filter them out. The idea of the
filtering option is to discard components that do not contribute much
in terms of observed variance.
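To make the filtering idea concrete, here is a minimal pure-Python sketch of a forward and backward PCA on a single pixel, where rejected components are simply left out of the back-transform. It assumes the band means and orthonormal eigenvectors are already known; the helper names `project` and `back_transform` and the toy numbers are hypothetical and do not describe how i.pca is implemented internally.

```python
import math

def project(pixel, mean, eigvecs):
    """Forward PCA: score of one pixel (band vector) on every component.

    eigvecs holds one orthonormal eigenvector per component, assumed to be
    sorted by decreasing eigenvalue (PC1 first).
    """
    centered = [x - m for x, m in zip(pixel, mean)]
    return [sum(c * v for c, v in zip(centered, vec)) for vec in eigvecs]

def back_transform(scores, mean, eigvecs, keep):
    """Backward PCA using only the component indices listed in keep.

    Components not in keep contribute nothing, i.e. they are filtered out.
    """
    out = list(mean)
    for k in keep:
        for i, v in enumerate(eigvecs[k]):
            out[i] += scores[k] * v
    return out

# Toy two-band example: covariance [[2, 1], [1, 2]] has eigenvalues 3 and 1
# with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1 / math.sqrt(2)
eigvecs = [[s, s], [s, -s]]
mean = [10.0, 20.0]
scores = project([12.0, 19.0], mean, eigvecs)
print(back_transform(scores, mean, eigvecs, keep=[0, 1]))  # all PCs: recovers the pixel
print(back_transform(scores, mean, eigvecs, keep=[0]))     # last (low-variance) PC rejected
```

Keeping every component reproduces the input exactly; dropping the low-variance PC2 replaces the pixel with its projection onto PC1 alone.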

It sounds like you rather want to identify those PCs that encode
information that is important in your case. In that case filtering
does not make sense; instead, the identified components could be used
for subsequent processing.

>> Alternatively, PC selection could also be based on the Eigenvalue,
>> typically all PCs with an Eigenvalue >= 1 (centered and scaled input)
>> would be used.
>
> It depends. Typically, the goal might simply be compressing data or reducing
> salt 'n' pepper noise. However, in change detection studies, where changes are
> likely to appear in higher-order components, it's not uncommon to have several
> components which account for <= 1 of the original variance and still are the
> ones you really need.

Why not skip filtering in these cases and use the components relevant for the study?
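The two selection strategies discussed in this thread can be contrasted in a short, hypothetical Python sketch: the eigenvalue >= 1 rule quoted above (often called the Kaiser criterion) versus simply keeping components chosen by index. The function names and eigenvalues are illustrative assumptions, not i.pca options.

```python
def explained_variance(eigenvalues):
    """Fraction of the total variance accounted for by each PC."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def kaiser_select(eigenvalues, threshold=1.0):
    """Indices of PCs whose eigenvalue is >= threshold.

    With centered and scaled input every band has variance 1, so the
    eigenvalues sum to the number of bands and >= 1 means "at least one
    band's worth of variance".
    """
    return [i for i, ev in enumerate(eigenvalues) if ev >= threshold]

def drop_last(n_components, n_drop):
    """Alternative: keep all but the last n_drop PCs, regardless of eigenvalue."""
    return list(range(n_components - n_drop))

# Hypothetical eigenvalues for four standardized bands (they sum to 4).
eigenvalues = [2.6, 0.9, 0.4, 0.1]
print(explained_variance(eigenvalues))
print(kaiser_select(eigenvalues))      # PC indices passing the eigenvalue >= 1 rule
print(drop_last(len(eigenvalues), 2))  # manual choice: reject only the last two PCs
```

Here the eigenvalue rule would keep only PC1 and discard PC2 to PC4, which is exactly why a fixed rule is a poor fit when the changes of interest sit in higher-order components.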

You can also filter manually using the factor loadings of the
components of interest.
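Reading the factor loadings is one way to do this manual selection. The hypothetical sketch below computes loadings from eigenvalues and eigenvectors, assuming PCA on centered and scaled bands (so a loading is the correlation between a band and a component), and lists which bands load strongly on each PC. The 0.5 cut-off and the helper names are illustrative assumptions.

```python
import math

def loadings(eigenvalues, eigvecs):
    """Factor loadings: loadings[k][i] = eigvecs[k][i] * sqrt(eigenvalues[k]).

    For PCA on standardized bands this is the correlation between band i and PC k.
    """
    return [[v * math.sqrt(ev) for v in vec] for ev, vec in zip(eigenvalues, eigvecs)]

def bands_loading_on(load_row, min_abs=0.5):
    """Indices of bands whose absolute loading on one PC is at least min_abs."""
    return [i for i, l in enumerate(load_row) if abs(l) >= min_abs]

# Toy correlation matrix [[1, 0.5], [0.5, 1]]: eigenvalues 1.5 and 0.5,
# eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1 / math.sqrt(2)
load = loadings([1.5, 0.5], [[s, s], [s, -s]])
for k, row in enumerate(load):
    print(f"PC{k + 1}", [round(x, 3) for x in row], bands_loading_on(row))
```

Components picked this way, by what they correlate with rather than by how much variance they explain, could then be kept or rejected in the back-transform.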

Markus M