[GRASS-dev] [GRASS GIS] #576: i.pca fails to center data prior to analysis

GRASS GIS trac at osgeo.org
Thu Apr 30 06:50:10 EDT 2009


#576: i.pca fails to center data prior to analysis
--------------------------------------------------------------+-------------
 Reporter:  nikos                                             |       Owner:  grass-dev at lists.osgeo.org
     Type:  defect                                            |      Status:  new                      
 Priority:  normal                                            |   Milestone:  6.5.0                    
Component:  Raster                                            |     Version:  svn-develbranch6         
 Keywords:  i.pca, data centering, prcomp(), R, eigenvectors  |    Platform:  Unspecified              
      Cpu:  Unspecified                                       |  
--------------------------------------------------------------+-------------
 I have found a case where ''i.pca'' does not work as expected. I have
 a set of 3 MODIS surface reflectance bands. Running ''i.pca'' on them
 '''does not center''' the data before the analysis; that is, the mean of
 each dimension (band) is not subtracted from that dimension to give a
 zero-mean dataset, which is an integral step of PCA.
 [[BR]]
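 To illustrate what centering means here, the following is a minimal
 Python sketch (not the ''i.pca'' implementation). The pixel values are
 made up, and the helper names `column_means`, `center` and
 `cross_product_matrix` are hypothetical. It shows that skipping the mean
 subtraction replaces the covariance matrix with a raw second-moment
 matrix inflated by the band means, which would be consistent with the much
 larger first eigenvalue obtained on the raw bands.

```python
def column_means(rows):
    """Mean of each band (column) over all pixels (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center(rows):
    """Subtract each band's mean from that band."""
    means = column_means(rows)
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product_matrix(rows):
    """Return X^T X / (n - 1) for a list of pixel rows X."""
    n = len(rows)
    cols = list(zip(*rows))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


# Hypothetical values for 4 pixels x 3 bands; NOT the ticket's MODIS data.
pixels = [
    [1200.0, 2100.0, 1500.0],
    [1300.0, 2300.0, 1600.0],
    [1100.0, 2000.0, 1450.0],
    [1400.0, 2400.0, 1650.0],
]

centered = center(pixels)
covariance = cross_product_matrix(centered)   # what PCA should decompose
uncentered = cross_product_matrix(pixels)     # what results without centering

print("band means after centering:", column_means(centered))
print("band 1 variance:", covariance[0][0])
print("band 1 uncentered second moment:", uncentered[0][0])
```

 In this sketch the two matrices differ by n/(n-1) times the outer product
 of the band means, so the leading eigenvector of the uncentered matrix is
 pulled toward the mean vector rather than the direction of largest
 variance.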


 * ''i.pca'' on the _raw_ bands gives the following eigenvalues, eigenvectors and percentages of variance:
 {{{
 PC1 6307563.04 (-0.6353,-0.6485,-0.4192) [98.71%]
 PC2   78023.63 (-0.7124, 0.2828, 0.6422) [ 1.22%]
 PC3    4504.60 (-0.2979, 0.7067,-0.6417) [ 0.07%]
 }}}
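 The bracketed percentages can be reproduced from the eigenvalues alone.
 A small Python sketch, assuming each percentage is simply the eigenvalue
 divided by the sum of all eigenvalues (the function name
 `explained_variance` is hypothetical):

```python
def explained_variance(eigenvalues):
    """Share of total variance (in percent) carried by each component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


# Eigenvalues reported by i.pca for the raw bands (from the ticket).
raw_eigenvalues = [6307563.04, 78023.63, 4504.60]

for i, share in enumerate(explained_variance(raw_eigenvalues), start=1):
    print("PC%d: %.2f%%" % (i, share))
```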
 [[BR]]


 * Running the same data through ''prcomp(x, center=TRUE, scale=FALSE)''
 in R, which centers the data by default, gives different results:

 {{{
                PC1         PC2        PC3
 mod07_b2 0.4372107  0.83099407 -0.3439413
 mod07_b6 0.7210155 -0.09527873  0.6863371
 mod07_b7 0.5375718 -0.54806096 -0.6408165
 }}}

 '''Note:''' the output of ''prcomp()'' delivers the Principal Components
 column-wise, while ''i.pca'' delivers them row-wise.
 [[BR]]


 * Further checking reveals that centering the data manually in GRASS, e.g.
 using
 {{{
 # r.mapcalc has no whole-map mean(); take the mean from r.univar instead
 eval $(r.univar -g map=mod_band)
 r.mapcalc "mod_band_centered = mod_band - $mean"
 }}}
 gives (almost) the same results as ''prcomp()'' with ''center=TRUE''
 (example above), up to the sign of the eigenvectors. The numbers speak
 for themselves:

 {{{
 PC1 270343.07 (-0.4403,-0.7222,-0.5335) [79.11%]
 PC2  67140.50 (-0.8275, 0.0957, 0.5533) [19.65%]
 PC3   4258.14 ( 0.3485,-0.6851, 0.6397) [ 1.25%]
 }}}
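 Because ''prcomp()'' prints components column-wise, ''i.pca'' prints them
 row-wise, and an eigenvector is only defined up to its sign, the two
 outputs are easiest to compare after transposing one of them and ignoring
 the sign. A minimal Python sketch using the numbers quoted in this ticket
 (the helper name `abs_cosine` is hypothetical):

```python
# prcomp() loadings from the ticket: rows are bands, columns are PCs.
prcomp_loadings = [
    [0.4372107,  0.83099407, -0.3439413],
    [0.7210155, -0.09527873,  0.6863371],
    [0.5375718, -0.54806096, -0.6408165],
]

# i.pca eigenvectors on the manually centered bands: one row per PC.
ipca_centered = [
    [-0.4403, -0.7222, -0.5335],
    [-0.8275,  0.0957,  0.5533],
    [ 0.3485, -0.6851,  0.6397],
]

# i.pca first eigenvector on the raw (uncentered) bands.
ipca_raw_pc1 = [-0.6353, -0.6485, -0.4192]


def abs_cosine(u, v):
    """Absolute cosine of the angle between u and v (1.0 = same axis)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return abs(dot) / (norm_u * norm_v)


# Transpose the prcomp() matrix so that each row is one component.
prcomp_rows = [list(col) for col in zip(*prcomp_loadings)]

similarities = [abs_cosine(u, v) for u, v in zip(prcomp_rows, ipca_centered)]
raw_similarity = abs_cosine(prcomp_rows[0], ipca_raw_pc1)

print("centered i.pca vs prcomp():", similarities)
print("raw i.pca PC1 vs prcomp() PC1:", raw_similarity)
```

 Values close to 1 for the centered run indicate the same axes as
 ''prcomp()''; the raw PC1 scores noticeably lower.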

 The question is what causes ''i.pca'', in this specific case, not to
 center the dataset?[[BR]]
 Thanks, Nikos

 '''Sources:'''[[BR]]
 The data are available as a
 [https://git.berlios.de/cgi-bin/gitweb.cgi?p=gregis;a=blob_plain;f=peloponnesos/modis_peloponnese_postfire07.zip;hb=peloponnese GRASS location with MODIS bands]
 and as
 [https://git.berlios.de/cgi-bin/gitweb.cgi?p=gregis;a=blob_plain;f=peloponnesos/modis_peloponnese_postfire07_GeoTiff.zip;hb=peloponnese MODIS bands as GeoTIFF files].
 More details are in the mailing list archive:
 [http://n2.nabble.com/Testing-i.pca-~-prcomp()%2C-m.eigensystem-~-princomp()-tt2413700.html#none Testing i.pca (continued...)]
 and in the GRASS wiki:
 [http://grass.osgeo.org/wiki/Principal_Component_Analysis Principal Component Analysis]

-- 
Ticket URL: <http://trac.osgeo.org/grass/ticket/576>
GRASS GIS <http://grass.osgeo.org>

