[GRASS-user] Is "i.cluster" an implementation of the ISODATA algorithm?

Wed Oct 31 09:56:57 PDT 2012

On 31/10/12 00:48, Nikos Alexandris wrote:
> NikosA:
>
>>>>> I wonder why the term ISODATA [2] is not to be traced anywhere in the
>>>>> GRASS manuals, nor in the GRASS book (3rd ed.).  Can someone confirm
>>>>> that i.cluster is an(other) implementation of the ISODATA clustering
>>>>> algorithm?
>
> MarkusN:
>
>>>> I searched my inbox and found some earlier discussion with the
>>>> Subject: "Re: [GRASS-user] Re: algorithm used in i.cluster", see
>>>> below.
>
>>>> PS: Still we need a text snippet to improve the manual...
>
>>> Things to keep in mind from the archived discussions (below):
>
>>> - the ISODATA algorithm (Ball and Hall, 1967) is a common modification of
>>> the K-means algorithm
>
>>> - the algorithm implemented in the "i.cluster" module involves merging of
>>> classes (I_cluster_merge) though no splitting function seems to be
>>> implemented
>
> Moritz:
>
>> To add: i.cluster output is also not equivalent to ISODATA in so far as
>> it does not classify all pixels, but only creates signature files for
>> the classes. Classification is then done by i.maxlik which is not
>> equivalent to ISODATA. It might be an interesting addition to i.cluster
>> to work with all pixels (not only a subset) and to create an output
>> assigning each pixel to a given class which would be close (but not
>> equal) to ISODATA output.
>
> Moritz,
>
> please correct me if I am wrong.  I feel that the above sentences actually
> create unnecessary confusion.
>
> The first step is to cluster pixels according to their (similar, spectral)
> properties.  The second step is to classify the clusters, meaning labeling
> the resulting clusters.  So, I don't see where confusion might arise, apart
> from the question whether "i.cluster" is an exact implementation of the
> ISODATA algorithm or not.

In GRASS, you don't have exactly these steps. i.cluster does not cluster 
all pixels, but only a sample (see parameter 'sample'). The result of 
that clustering is not that all pixels are assigned to a given cluster, 
but only that you have signatures that are "representative" of a given 
cluster. If you run i.cluster on the same data asking for the same 
number of classes, but with different sample sizes, you will probably 
get slightly different signatures for each cluster at each run.
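
To illustrate the sampling point, here is a minimal Python sketch. It is
not the i.cluster code: it runs a plain 1-D k-means on a synthetic
single-band image, and the helpers make_image, grid_sample and
cluster_means are hypothetical names made up for this example. grid_sample
only mimics the idea of a row/column sampling interval. Note that the
result is a set of cluster means ("signatures") and nothing else; no pixel
gets a label.

```python
import random

def make_image(rows, cols, seed=0):
    """Synthetic one-band image: left half dark (~50), right half bright (~150)."""
    rng = random.Random(seed)
    return [[rng.gauss(50 if c < cols // 2 else 150, 10) for c in range(cols)]
            for _ in range(rows)]

def grid_sample(image, row_step, col_step):
    """Keep every row_step-th row and every col_step-th column (assumed
    stand-in for a row,col sampling interval)."""
    return [row[c] for row in image[::row_step]
            for c in range(0, len(row), col_step)]

def cluster_means(values, k, iterations=20):
    """Plain 1-D k-means; returns only the cluster means, no per-pixel labels."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial means evenly over the value range.
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - means[i]))
            groups[nearest].append(v)
        # Keep the old mean if a cluster happens to be empty.
        means = [sum(g) / len(g) if g else m for g, m in zip(groups, means)]
    return means

image = make_image(40, 40)
coarse = cluster_means(grid_sample(image, 8, 8), k=2)  # 25 sampled pixels
fine = cluster_means(grid_sample(image, 2, 2), k=2)    # 400 sampled pixels
print("signatures from sparse sample:", coarse)
print("signatures from dense sample: ", fine)
```

Both runs find a dark and a bright cluster, but the means differ slightly
because they are estimated from different subsets of the pixels.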

In the second step, you use i.maxlik to then assign each pixel to one of 
the clusters / classes created by i.cluster.

Labelling is actually a third step in that process.

So, i.cluster is used for creating signatures of representative classes, 
just as i.gensig is.

ISODATA, OTOH, clusters all pixels and thus already assigns each pixel 
to a given cluster / class, without going through the i.maxlik phase.
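
For contrast, a rough Python sketch of clustering that works on all pixels
and hands back a label per pixel directly. I use plain k-means as a
stand-in here; real ISODATA additionally splits and merges clusters, which
this sketch leaves out. cluster_all_pixels is a hypothetical name, and the
3x3 image is made-up data.

```python
def cluster_all_pixels(image, k, iterations=20):
    """1-D k-means over every pixel; returns (means, label_map)."""
    values = [v for row in image for v in row]
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial means evenly over the value range.
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            i = min(range(k), key=lambda j: abs(v - means[j]))
            sums[i] += v
            counts[i] += 1
        # Keep the old mean if a cluster happens to be empty.
        means = [sums[i] / counts[i] if counts[i] else means[i]
                 for i in range(k)]
    # The clustering itself yields the class of each pixel.
    label_map = [[min(range(k), key=lambda j: abs(v - means[j])) for v in row]
                 for row in image]
    return means, label_map

image = [[10, 12, 90],
         [11, 88, 92],
         [9, 91, 89]]
means, label_map = cluster_all_pixels(image, k=2)
print(means)      # [10.5, 90.0]
print(label_map)  # [[0, 0, 1], [0, 1, 1], [0, 1, 1]]
```

The point is only that the label map falls out of the clustering itself,
with no separate classification step.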

i.cluster with sample=1,1 (if you can get that to run without an alloc 
error on your machine) should use all pixels for the creation of the 
clusters, but it does not allow you to produce a raster layer indicating 
for each pixel the cluster it is assigned to. You have to run i.maxlik to 
do that.

However, i.maxlik does not use the same algorithm to assign pixels to 
clusters / classes as ISODATA. So the result is not exactly the same.
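
A small Python sketch of why the results can differ, under simplifying
assumptions: one band, two classes, and made-up signatures (mean and
standard deviation). I assume here that ISODATA-style clustering assigns a
pixel to the nearest cluster mean, while a maximum likelihood classifier
also takes the spread of each class into account. This is not the
i.maxlik code, and the function names are hypothetical.

```python
import math

# Hypothetical 1-D signatures: class name -> (mean, standard deviation).
signatures = {"A": (50.0, 2.0), "B": (70.0, 20.0)}

def nearest_mean(value, sigs):
    """Assign to the class whose mean is closest (distance only)."""
    return min(sigs, key=lambda name: abs(value - sigs[name][0]))

def gaussian_log_likelihood(value, mean, std):
    """Log of the normal density of value under N(mean, std^2)."""
    return (-math.log(std) - 0.5 * math.log(2 * math.pi)
            - 0.5 * ((value - mean) / std) ** 2)

def max_likelihood(value, sigs):
    """Assign to the class under which the value is most probable."""
    return max(sigs,
               key=lambda name: gaussian_log_likelihood(value, *sigs[name]))

pixel = 58.0
print(nearest_mean(pixel, signatures))    # A: 8 units from A, 12 from B
print(max_likelihood(pixel, signatures))  # B: 4 std devs from A, 0.6 from B
```

The same pixel ends up in class A by distance and in class B by
likelihood, because class A is narrow and class B is wide.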

> - ISODATA is a clustering algorithm, not a classification algorithm per se --
> skimming through Richards book (1999) [1], pages 182, 189, 225.  And the
> "i.cluster" module might not be identical to the ISODATA clustering algorithm,
> yet, it performs clustering. So, both do the same job, most likely in a more
> or less similar way.

Well, as explained above, i.cluster does not cluster the entire map, and 
does not show you the result of the clustering other than in the form of a 
signature file, so i.cluster does not perform clustering in the same 
sense as ISODATA.

> -  "i.maxlik" performs the classification of the clusters, which is not a
> clustering process. Thus, the module can/should not be identical to the
> ISODATA clustering algorithm.  I might have missed something (skimming through
> the manuals), but I didn't read anywhere that "i.maxlik" is performing
> clustering.

i.maxlik does not perform clustering but assignment of pixels to 
classes/clusters based on signatures of such classes that are either 
created by clustering or through training maps. ISODATA assigns pixels 
to classes through clustering.

I hope I'm being understandable...

Moritz