i.cluster

Mon Aug 10 10:43:35 EDT 1992

The algorithm for i.cluster is basically  that  for  isodata,  as
follows:

(1) The user selects the number of classes that he/she thinks are
contained in the image. Call this number of classes C.

(2) The user also selects a subset of the pixels from  the  image
to  be  analyzed.  The  algorithm  needs  to have these pixels in
memory so a subset of all the pixels is chosen. This is  done  by
having  the  user  select the rows and columns (ie every 4th row,
every 6th column, etc).  Call this number of pixels N.
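
As a rough illustration of step (2), here is a minimal Python sketch
of row/column subsampling. The function name subsample and the
list-of-rows image layout are assumptions made for this example, not
how i.cluster actually stores or reads the image.

```python
def subsample(image, row_step, col_step):
    """Keep every row_step-th row and every col_step-th column.

    `image` is a list of rows; each row is a list of pixels, and each
    pixel is a tuple with one value per band (hypothetical layout).
    """
    return [pixel
            for row in image[::row_step]
            for pixel in row[::col_step]]


# A tiny 8 x 12 single-band "image" whose value encodes row and column.
image = [[(r * 100 + c,) for c in range(12)] for r in range(8)]

# Every 4th row and every 6th column: rows 0 and 4, columns 0 and 6.
pixels = subsample(image, row_step=4, col_step=6)
n = len(pixels)  # this is the N of the text
```

Larger steps give a smaller N, which is what keeps the sampled
pixels small enough to hold in memory.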

(3) i.cluster then forms C class  centroids  by  calculating  the
mean  in  each  band  of  the  N  pixels  to  be  analyzed,  then
distributing C class  centroids  1  or  2   standard   deviations
about these means.
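
Step (3) could be sketched as follows. The exact placement rule is
not spelled out above, so this sketch ASSUMES the C centroids are
spaced evenly from (mean - spread*sd) to (mean + spread*sd) in every
band; the name initial_centroids and the spread parameter are
hypothetical.

```python
import statistics


def initial_centroids(pixels, c, spread=1.0):
    """Return (band means, C starting centroids).

    Assumption: centroids are spaced evenly along the line from
    mean - spread*sd to mean + spread*sd in each band.
    """
    nbands = len(pixels[0])
    means = [statistics.fmean(p[b] for p in pixels) for b in range(nbands)]
    sds = [statistics.pstdev([p[b] for p in pixels]) for b in range(nbands)]
    centroids = []
    for k in range(c):
        # t runs from -1 to +1; a single class sits on the mean itself.
        t = 0.0 if c == 1 else 2.0 * k / (c - 1) - 1.0
        centroids.append(tuple(m + t * spread * s
                               for m, s in zip(means, sds)))
    return means, centroids


# Two bands, two sample pixels: means are (2, 20), sds are (1, 10).
sample = [(1.0, 10.0), (3.0, 30.0)]
means, centroids = initial_centroids(sample, c=3, spread=1.0)
```

With spread=1.0 the three centroids land one standard deviation
below the mean, on the mean, and one standard deviation above it.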

(4) Then i.cluster performs the following iteration:

    (a) each of the N pixels is assigned to the nearest class
    centroid (based on the Euclidean distance measure).

    (b) after all pixels have  been  assigned  to  a  class,  the
    centroid of the class is recomputed.

This iteration is performed until X% of the pixels do not  change
their  class  from  one  iteration  to the next, or the number of
iterations (I) is reached. Both X and I are chosen by the user.
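
The loop in step (4) can be sketched in a few lines of Python. This
is only an illustration of the assign/recompute cycle and the two
stopping rules; the names iterate, stable_pct and max_iter are made
up here, and details such as tie-breaking and empty classes are
handled in whatever way was simplest, not necessarily the way
i.cluster handles them.

```python
import math


def nearest(pixel, centroids):
    """Index of the centroid closest to pixel (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(pixel, centroids[k]))


def iterate(pixels, centroids, stable_pct, max_iter):
    """Run steps (a) and (b) until stable_pct % of the pixels keep
    their class, or until max_iter iterations have been done."""
    centroids = list(centroids)
    labels = [None] * len(pixels)
    history = []  # % of stable pixels after each iteration
    for _ in range(max_iter):
        # (a) assign every pixel to its nearest centroid
        new_labels = [nearest(p, centroids) for p in pixels]
        unchanged = sum(a == b for a, b in zip(labels, new_labels))
        pct = 100.0 * unchanged / len(pixels)
        labels = new_labels
        history.append(pct)
        # (b) recompute each centroid as the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:  # an empty class keeps its old centroid
                centroids[k] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
        if pct >= stable_pct:
            break
    return labels, centroids, history


pixels = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels, centroids, history = iterate(pixels, [(0.0,), (5.0,)],
                                     stable_pct=98.0, max_iter=10)
```

The history list is the "% stability per iteration" figure: if its
last entry is still below stable_pct when max_iter is used up, the
run did not converge.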

(5) There is currently a merge step at this  point.  If  the  two
closest   classes  are within a user selected distance threshold,
they are combined into one class and step (4) is repeated. I  say
currently because this merge phase will be removed from i.cluster
when GRASS4.1 is released. I have not been able to  discover  the
theoretical  basis  for  the  cluster  distance  measure  used by
i.cluster. I suspect it is an ad hoc measure that only works for
4 bands (ie MSS).
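
To show the shape of the merge step only, here is a sketch that
merges the two closest classes when they fall within a threshold.
NOTE: it substitutes plain Euclidean distance between centroids for
the real i.cluster distance measure, which is not reproduced here,
and it assumes the merged centroid is the pixel-count-weighted mean.
The name merge_closest is hypothetical.

```python
import math
from itertools import combinations


def merge_closest(centroids, counts, threshold):
    """Merge the two closest classes if they are within threshold.

    Stand-in measure: Euclidean distance between centroids (an
    assumption, NOT the measure i.cluster uses).
    Returns (centroids, counts, merged_flag).
    """
    if len(centroids) < 2:
        return centroids, counts, False
    i, j = min(combinations(range(len(centroids)), 2),
               key=lambda ij: math.dist(centroids[ij[0]],
                                        centroids[ij[1]]))
    if math.dist(centroids[i], centroids[j]) > threshold:
        return centroids, counts, False
    total = counts[i] + counts[j]
    # Weighted mean, so the larger class pulls the centroid harder.
    merged = tuple((a * counts[i] + b * counts[j]) / total
                   for a, b in zip(centroids[i], centroids[j]))
    kept = [k for k in range(len(centroids)) if k not in (i, j)]
    return ([centroids[k] for k in kept] + [merged],
            [counts[k] for k in kept] + [total],
            True)


cents = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
sizes = [3, 1, 5]
new_cents, new_sizes, merged = merge_closest(cents, sizes, threshold=2.0)
```

After a successful merge, step (4) would be run again with one
class fewer.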

The report generated by i.cluster lists all these values. It
reports the initial within-band means used to compute the initial
class centroids; after each iteration it reports the new
centroids, the number of pixels in each class, and the percentage
of pixels that remained stable (ie didn't change class from one
iteration to the next); it reports if/when clusters are merged;
and it displays the final class statistics, including an
inter-cluster distance matrix.

Probably the most important information is the % stability vs the
number of iterations. If the maximum number of iterations is
reached without achieving X% stability, then the algorithm didn't
converge. This means that you should either accept the stability
that was actually reached or rerun i.cluster and specify more
iterations.

I should point out that i.cluster generates a signature file that
contains both the means and the covariance matrix for each class.
These parameters are needed by i.maxlik. i.maxlik uses the
covariance as well as the means to decide if a pixel belongs to a
given class. However,  i.cluster  does  NOT  use  the  covariance
matrix  when  deciding  which  class a pixel should be assigned -
only distance to the centroid. (The  covariance  matrix  is  only
used in the merge phase). Anyone interested in commenting on this
discrepancy?   GRASS  isn't  the  only  system   to   use   these
algorithms.
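
To make the discrepancy concrete, here is a one-band toy example in
which the two rules disagree. It ASSUMES the maximum likelihood rule
is the usual Gaussian one (in one band the covariance matrix is just
a variance); the class names and numbers are invented for the
example and have nothing to do with real i.cluster output.

```python
import math


def nearest_mean(x, classes):
    """Minimum-distance rule: only the class means matter."""
    return min(classes, key=lambda name: abs(x - classes[name][0]))


def max_likelihood(x, classes):
    """Gaussian maximum-likelihood rule: mean AND variance matter."""
    def log_likelihood(name):
        mean, var = classes[name]
        return (-0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return max(classes, key=log_likelihood)


# name -> (mean, variance); "broad" has a much larger spread.
classes = {"tight": (0.0, 1.0), "broad": (10.0, 100.0)}
pixel = 4.0

by_distance = nearest_mean(pixel, classes)      # 4 away vs 6 away
by_likelihood = max_likelihood(pixel, classes)  # 4 sd away vs 0.6 sd
```

The pixel is closer to the "tight" mean in raw distance, but it is
four standard deviations from that class and well inside the spread
of "broad", so the two rules pick different classes.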

|I have been running i.group, i.cluster and i.maxlik to perform an
|unsupervised classification of avhrr images over Asia. the output
|from i.cluster is long and detailed but no one here seems to  know
|what  the  numbers  mean,  what  the  units  are,  etc.  and  the
|documentation does not give any explanation about it either. Does
|any one on the net knows what i.cluster produces and is there any
|documentation out there on how to use the result file?

Michael