i.cluster
Michael Shapiro
shapiro at zorro.cecer.army.mil
Mon Aug 10 10:43:35 EDT 1992
The algorithm for i.cluster is basically that for isodata, as
follows:
(1) The user selects the number of classes that he/she thinks is
contained in the image. Call this number of classes C.
(2) The user also selects a subset of the pixels from the image
to be analyzed. The algorithm needs to have these pixels in
memory so a subset of all the pixels is chosen. This is done by
having the user select the rows and columns (e.g. every 4th row,
every 6th column). Call this number of pixels N.
(3) i.cluster then forms C class centroids by calculating the
mean in each band of the N pixels to be analyzed, then
distributing C class centroids 1 or 2 standard deviations
about these means.
(4) Then i.cluster performs the following iteration:
(a) each of the N pixels is assigned to the nearest class
centroid (based on the Euclidean distance measure).
(b) after all pixels have been assigned to a class, the
centroid of the class is recomputed.
This iteration is performed until X% of the pixels do not change
their class from one iteration to the next, or the maximum number
of iterations (I) is reached. Both X and I are chosen by the user.
(5) There is currently a merge step at this point. If the two
closest classes are within a user-selected distance threshold,
they are combined into one class and step (4) is repeated. I say
currently because this merge phase will be removed from i.cluster
when GRASS4.1 is released. I have not been able to discover the
theoretical basis for the cluster distance measure used by
i.cluster. I suspect it is an ad hoc measure that only works for
4 bands (i.e. MSS).
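Steps (2) through (4) above can be sketched in a few lines of
Python. This is only an illustration, not the i.cluster source:
the function names (subsample, initial_centroids, nearest,
cluster) are made up for the example, the even spacing of the
initial centroids between mean - 1 sd and mean + 1 sd is an
assumption, and the merge step (5) is left out because its
distance measure is not documented here.

```python
import math

def subsample(image, row_step, col_step):
    """Step (2): keep every row_step-th row and col_step-th column.
    `image` is a list of rows; each pixel is a tuple of band values."""
    return [px for row in image[::row_step] for px in row[::col_step]]

def initial_centroids(pixels, c, spread=1.0):
    """Step (3): place c centroids from mean - spread*sd to
    mean + spread*sd in every band (even spacing is an assumption)."""
    n, bands = len(pixels), len(pixels[0])
    means = [sum(p[b] for p in pixels) / n for b in range(bands)]
    sds = [math.sqrt(sum((p[b] - means[b]) ** 2 for p in pixels) / n)
           for b in range(bands)]
    centroids = []
    for k in range(c):
        t = 0.0 if c == 1 else 2.0 * k / (c - 1) - 1.0  # runs from -1 to +1
        centroids.append([means[b] + t * spread * sds[b] for b in range(bands)])
    return centroids

def nearest(pixel, centroids):
    """Index of the centroid at the smallest (squared) Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((x - m) ** 2
                                 for x, m in zip(pixel, centroids[k])))

def cluster(pixels, c, max_iter, stability_pct):
    """Step (4): iterate until stability_pct percent of the pixels keep
    their class, or max_iter iterations have been run."""
    centroids = initial_centroids(pixels, c)
    labels = [-1] * len(pixels)
    for iteration in range(1, max_iter + 1):
        new_labels = [nearest(p, centroids) for p in pixels]      # (4a)
        stable = sum(a == b for a, b in zip(labels, new_labels))
        labels = new_labels
        for k in range(c):                                        # (4b)
            members = [p for p, l in zip(pixels, labels) if l == k]
            if members:  # an empty class keeps its previous centroid
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
        if 100.0 * stable / len(pixels) >= stability_pct:
            return centroids, labels, iteration, True
    return centroids, labels, max_iter, False  # did not converge
```

A call such as cluster(subsample(image, 4, 6), 5, 30, 98.0)
would correspond to C=5, I=30, X=98 with every 4th row and every
6th column sampled. The last returned value tells you whether the
X% stability was reached before the iteration limit.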
The report generated by i.cluster reports all these values. It
reports the initial within-band means used to compute the initial
class centroids; after each iteration it reports the new
centroids, the number of pixels in each class, and the percentage
of pixels that remained stable (i.e. didn't change class from one
iteration to the next); it reports if/when clusters are merged;
and it displays the final class statistics, including an
inter-cluster distance matrix.
Probably the most important information is the % stability vs the
number of iterations. If the maximum number of iterations is
reached without achieving X% stability, then the algorithm
didn't converge. This means that you should either accept the
resulting % stability or rerun i.cluster and specify more
iterations.
I should point out that i.cluster generates a signature file that
contains both the means and the covariance matrix for each class.
These parameters are needed by i.maxlik. i.maxlik uses the
covariance as well as the means to decide if a pixel belongs to a
given class. However, i.cluster does NOT use the covariance
matrix when deciding which class a pixel should be assigned to -
only the distance to the centroid. (The covariance matrix is only
used in the merge phase.) Anyone interested in commenting on this
discrepancy? GRASS isn't the only system to use these
algorithms.
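To make the discrepancy concrete, here is a small two-band
example in which the two rules disagree. It compares the
nearest-centroid rule with the textbook Gaussian maximum
likelihood discriminant; I am assuming that form for
illustration and am not claiming it is exactly what i.maxlik
computes. The class statistics and function names are invented.

```python
import math

def euclid2(x, mean):
    """Squared Euclidean distance from pixel x to a class mean."""
    return sum((a - m) ** 2 for a, m in zip(x, mean))

def gaussian_score(x, mean, cov):
    """-ln|S| - (x-m)^T S^-1 (x-m) for a 2x2 covariance S.
    A larger score means the class is more likely."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(det) - maha

# Hypothetical signatures: A is a tight class, B is a broad one.
classes = {
    "A": ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    "B": ((10.0, 0.0), ((100.0, 0.0), (0.0, 100.0))),
}
pixel = (4.0, 0.0)

by_distance = min(classes, key=lambda k: euclid2(pixel, classes[k][0]))
by_likelihood = max(classes, key=lambda k: gaussian_score(pixel, *classes[k]))
print(by_distance, by_likelihood)
```

The pixel is closer to the mean of A, so the centroid rule picks
A, but it lies 4 standard deviations from A and well under 1 from
B, so the covariance-based rule picks B.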
|I have been running i.group, i.cluster and i.maxlik to perform an
|unsupervised classification of AVHRR images over Asia. The output
|from i.cluster is long and detailed but no one here seems to know
|what the numbers mean, what the units are, etc. and the
|documentation does not give any explanation about it either. Does
|anyone on the net know what i.cluster produces and is there any
|documentation out there on how to use the result file?
Michael