[GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(),
m.eigensystem ~ princomp()
Markus Metz
markus.metz.giswork at googlemail.com
Thu Apr 2 03:49:56 EDT 2009
Edzer Pebesma wrote:
> Markus Metz wrote:
>
>> I think scale and normalize are two different things.
>>
> I believe that in statistics these two words don't have a generally
> accepted definition. They're useful as long as you explain what you mean
> by them.
>
At least in the statistics literature I use, these two methods are
defined differently. Scaling is like r.rescale, whereas normalization
converts the data to a mean of 0 and a stddev of 1, so that they have
the location and spread of a standard normal distribution. But usually
I wouldn't worry too much about terms as long as it is explained what
they mean.
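
To make the distinction concrete, here is a minimal NumPy sketch of the two
operations as I use the terms. The function names rescale() and
standardize() and the sample values are made up for illustration; rescale()
only mimics the idea behind r.rescale and is not its implementation.

```python
import numpy as np

def rescale(x, new_min=0.0, new_max=1.0):
    """Scaling: map the observed range of x linearly onto [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return new_min + (x - x.min()) * (new_max - new_min) / span

def standardize(x):
    """Normalization in the sense used above: mean 0, stddev 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical band values, for illustration only
band = np.array([12.0, 40.0, 55.0, 90.0, 200.0])

scaled = rescale(band, 0, 255)   # range becomes 0..255
z = standardize(band)            # mean 0, stddev 1

print(scaled.min(), scaled.max())
print(z.mean(), z.std())
```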
> Well, PCA only captures covariance or correlation, meaning linear
> relationships, and it may be the case that the most interesting features
> are non-linear.
So if a PCA does not capture non-linear relationships, I don't see how
it could help to use PCs that explain almost no variation in the
dataset. Instead, you could first apply e.g. a log transform, or whatever
else is appropriate, to convert the suspected type of non-linear relation
to a linear one, and then feed the transformed variables to a PCA.
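
A rough sketch of that idea, assuming a power-law relation between two
synthetic variables (the data, the exponent and the helper
pca_explained() are invented for illustration, not taken from any real
imagery). After taking logs the relation is linear, and the first PC of the
correlation matrix accounts for a larger share of the variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic variables with an assumed power-law relation y ~ x**2.5
x = rng.uniform(1.0, 100.0, size=500)
y = 3.0 * x**2.5 * rng.lognormal(0.0, 0.1, size=500)

def pca_explained(data):
    """Share of variance per PC, from the correlation matrix of the columns."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # sort descending
    return eigvals / eigvals.sum()

raw = pca_explained(np.column_stack([x, y]))
logged = pca_explained(np.column_stack([np.log(x), np.log(y)]))

print("first PC share, raw variables:", raw[0])
print("first PC share, log variables:", logged[0])
```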
> For instance, NDVI is the ratio of a sum over a
> difference (or reversed?), which cannot be expressed as a linear
> combination of bands.
Not directly, but being a normalized difference (it should really be
called standardised, not normalized), it can be approximated with linear
combinations, i.e. there is at least some correlation between the raw
bands and a normalized difference calculated from them.
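
As a quick check of this, one could fit NDVI as a linear combination of the
raw bands and look at how much of its variance the fit reproduces. The
sketch below uses synthetic red and near-infrared reflectances drawn from
assumed uniform ranges, so the resulting R^2 says nothing about any
particular sensor or scene:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic reflectances; the ranges are assumptions for illustration
red = rng.uniform(0.05, 0.30, size=1000)
nir = rng.uniform(0.20, 0.60, size=1000)
ndvi = (nir - red) / (nir + red)

# Least-squares fit: ndvi ~ a + b*red + c*nir
design = np.column_stack([np.ones_like(red), red, nir])
coef, *_ = np.linalg.lstsq(design, ndvi, rcond=None)
fitted = design @ coef

# Share of NDVI variance reproduced by the linear combination
r2 = 1.0 - np.sum((ndvi - fitted) ** 2) / np.sum((ndvi - ndvi.mean()) ** 2)
print(f"R^2 of the linear approximation: {r2:.3f}")
```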
> The first PCA(s?) usually express brightness, only
> later ones give more interesting features resulting from more complex
> interactions of bands (notably differences) -- loadings usually have the
> same sign for the first PC, and mixed signs for later PC's. John C.
> Davis in "statistics and data analysis for geologists" called this the
> "size and shape effect". The most interesting PC's may have a EV smaller
> than 1, when they come from correlation matrices. Geochemists don't shy
> away from interpreting 7 or more factors.
>
The question is not the number of factors, but what criteria to use to
select and interpret the resulting PCs. What makes a PC interesting can
be the amount of explained variance, but also the dominant variables in
it. BTW, some textbooks recommend using only rotated PCs if a rotation
can be performed. In a mathematical sense, the sign of the loadings is
arbitrary, because the absolute values of the loadings as well as the
result of a PCA stay the same after new_var = -old_var. The same sign
for all loadings of the first PC, and so on, is not generally valid; with
regard to imagery it probably only applies to surface reflectance or
radiation measured at the sensor, and I would guess it depends on the
number of bands and the wavelengths captured by each.
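
The sign argument can be checked numerically. This is only a sketch with
synthetic, positively correlated variables and a hand-rolled pca() helper
based on the correlation matrix (it is not what i.pca does internally):
negating one variable leaves the eigenvalues and the absolute loadings
unchanged, but the first PC no longer has loadings of one sign.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three synthetic variables sharing a common signal, with different noise
base = rng.normal(size=(300, 1))
noise_levels = (0.2, 0.5, 1.0)
data = np.hstack([base + s * rng.normal(size=(300, 1)) for s in noise_levels])

def pca(data):
    """Eigenvalues and eigenvectors of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

vals, vecs = pca(data)

# new_var = -old_var for the first variable
flipped = data.copy()
flipped[:, 0] *= -1
vals_f, vecs_f = pca(flipped)

print(np.allclose(vals, vals_f))                   # same eigenvalues
print(np.allclose(np.abs(vecs), np.abs(vecs_f)))   # same absolute loadings
print(vecs[:, 0], vecs_f[:, 0])                    # first-PC loadings before/after
```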
All this is however far from the i.pca eigenvalue problem, going towards
comments on the general use of PCAs for remote sensing and as such
probably only of interest to the grass-stats ml.