[GRASS-user] [GRASS-dev] What is the meaning of output from i.cluster

Nikos Alexandris nik at nikosalexandris.net
Tue Mar 26 06:11:49 PDT 2013


Michael Barton wrote:

> Thanks for the explanation Nikos.

And I re-direct this ("thanks") to the people who have done the real work -- I 
am still stealing stuff ;-)


> But see below.

Sure, i.cluster is a favourite recurring subject :-).
More below -- please have a look at a draft of a future Wiki page about
clustering:

<http://grasswiki.osgeo.org/wiki/User:NikosA/About_Clustering> -- this link is
repeated below, where I try to justify the need for an extra page besides
the actual manual.

 
Michael Barton wrote:
> >> i.cluster produces a text output file that looks like this (for the
> >> landsat 2000 images from the nc_spm_08 demo data set).
> >> 
> >> #produced by i.cluster
> >> #Class 1
> >> 247
> >> 69.1174 50.3603 41.3482 18.9514 17.4534 117.049 14.2105
> >> 10.5837
> >> 12.6608 22.8737
> >> 17.1663 29.6708 45.1628
> >> 7.36345 9.28993 16.6389 47.4367
> >> 9.14573 9.22619 16.9756 47.1115 70.631
> >> 7.75037 6.67348 10.1294 13.043 14.9941 17.5505
> >> 7.63372 6.74497 12.0565 27.4981 43.3025 12.0629 30.7522
> >> #Class 2
> >> 2059
> >> 70.0257 53.2035 46.205 61.085 61.5935 121.271 35.6688
> >> 9.28262
> >> 7.5725 9.99696
> >> 11.3514 11.8868 22.0678
> >> -6.02017 -0.713125 -9.81723 54.7357
> >> -1.33501 2.43359 2.91134 15.8135 40.1529
> >> 4.98865 4.49973 6.3419 -2.73927 -1.77851 11.3745
> >> 5.31077 6.34246 10.6505 -4.60692 19.1398 3.18893 19.0243
> >> 
> >> [more for the other classes]
> >> 
> >> So what does this mean? There is no clear explanation of this in the
> >> manual, and there are no variable names in the output.
> >> 
> >> I am guessing that
> >> line 1 is possibly the number of pixels in the class
 
> If so, the report should be: 
> Number of pixels in class: #####
[..]

I agree.

> >> line 2 is the cluster mean within each original raster input file
> >> (landsat maps bands in this case)

> And here, there should be a headings line and explanation something like
> this: 
>               map 1     map 2     map 3     map 4
> Mean values: ######### ######### ######### #########

I agree!

> > *Note,* however, as previously well-explained by Moritz Lennert
> > (http://lists.osgeo.org/pipermail/grass-user/2012-October/066046.html):
> > --%<---
> > i.cluster does not cluster all pixels, but only a sample (see parameter
> > 'sample'). The result of that clustering is not that all pixels are
> > assigned to a given cluster, but only that you have signatures that are
> > "representative" of a given cluster. If you run i.cluster on the same data
> > asking for the same number of classes, but with different sample sizes,
> > you will probably get slightly different signatures for each cluster at
> > each run.
> > --->%--

> This needs to be in the manual rather than lost in an email

Sure.
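
To make Moritz's point concrete, here is a minimal sketch. It is not what
i.cluster does internally: it only shows that statistics computed from a
regular sample depend on the sampling interval. The image is synthetic
(made-up values from a seeded random generator), and sample_mean() with its
row_step/col_step arguments is a hypothetical helper, assuming that the
'sample' parameter behaves like a row and column interval.

```python
import random

# Synthetic single-band "image" (assumption: made-up values, not real data).
rng = random.Random(42)
ROWS, COLS = 100, 100
image = [[rng.gauss(70.0, 3.0) for _ in range(COLS)] for _ in range(ROWS)]


def sample_mean(img, row_step, col_step):
    """Return (count, mean) of the pixels picked at regular row/col steps."""
    values = [
        img[r][c]
        for r in range(0, len(img), row_step)
        for c in range(0, len(img[0]), col_step)
    ]
    return len(values), sum(values) / len(values)


# Two different sampling intervals over the same image.
n_a, mean_a = sample_mean(image, 5, 5)
n_b, mean_b = sample_mean(image, 10, 10)

print(f"every 5th row/col:  n={n_a}, mean={mean_a:.4f}")
print(f"every 10th row/col: n={n_b}, mean={mean_b:.4f}")
```

Both means should land close to the value used to generate the image, but
they will not be identical, which is the same effect one sees in the
signatures when changing the sample size.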

> >> The remaining lines are some kind of matrix. If a correlation matrix, I
> >> don't understand why the diagonal is not 1--or at least the biggest
> >> number for the relevant input maps. Any explanation?

> > It is a variance-covariance matrix (variances on the diagonal,
> > covariances in the off-diagonal elements), as described in the manual and
> > mentioned elsewhere in past threads on the list.
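
For anyone expecting a correlation matrix: one can be derived from the
covariance matrix by dividing each element by the product of the two
standard deviations, i.e. r_ij = cov_ij / sqrt(var_i * var_j). A minimal
sketch follows, using the first three bands of Class 1 from the output quoted
above; covariance_to_correlation() is a hypothetical helper, not something
GRASS provides.

```python
import math

# First three bands of Class 1 from the quoted output, expanded from the
# lower triangle into a full symmetric matrix.
cov = [
    [10.5837, 12.6608, 17.1663],
    [12.6608, 22.8737, 29.6708],
    [17.1663, 29.6708, 45.1628],
]


def covariance_to_correlation(matrix):
    """Scale a covariance matrix so that its diagonal becomes 1."""
    n = len(matrix)
    std = [math.sqrt(matrix[i][i]) for i in range(n)]
    return [
        [matrix[i][j] / (std[i] * std[j]) for j in range(n)]
        for i in range(n)
    ]


corr = covariance_to_correlation(cov)
for row in corr:
    print(" ".join(f"{value:8.4f}" for value in row))
```

The diagonal of the result is 1 (up to rounding), which is what Michael
expected to see; the signature file itself stores the unscaled values.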
 
> So the diagonal values = the variance in the values for that band in a
> cluster. The off-diagonal values are the covariance between the band values
> for a cluster.

> Right? This is not in the manual but should be. IMHO, it should also be in
> the report output something like this.
> 
> Variance (diagonal) and covariance scores (off-diagonal)
> 
>         map 1     map 2     map 3     map 4
> map 1 #########
> map 2 ######### #########
> map 3 ######### ######### #########
> map 4 ######### ######### ######### #########

Agreed!
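
Until the report itself changes, something like the following could be used
to read the existing output and print it with the headings proposed above.
This is only a sketch: it assumes the text layout is exactly as quoted in
this thread (a "#Class" line, the number of pixels, one line of means, then
the lower triangle of the variance-covariance matrix). parse_signatures()
and format_report() are hypothetical helpers, and the "map N" labels are
placeholders for the real band names.

```python
SIGNATURE_TEXT = """\
#produced by i.cluster
#Class 1
247
69.1174 50.3603 41.3482 18.9514 17.4534 117.049 14.2105
10.5837
12.6608 22.8737
17.1663 29.6708 45.1628
7.36345 9.28993 16.6389 47.4367
9.14573 9.22619 16.9756 47.1115 70.631
7.75037 6.67348 10.1294 13.043 14.9941 17.5505
7.63372 6.74497 12.0565 27.4981 43.3025 12.0629 30.7522
"""


def parse_signatures(text):
    """Parse the layout quoted in this thread into a list of dicts."""
    classes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only "#Class ..." lines start a new signature; other comment
            # lines (e.g. "#produced by i.cluster") are skipped.
            if line.lower().startswith("#class"):
                current = {"label": line[1:].strip(), "npoints": None,
                           "mean": None, "cov_rows": []}
                classes.append(current)
            continue
        if current is None:
            continue
        values = line.split()
        if current["npoints"] is None:
            current["npoints"] = int(values[0])
        elif current["mean"] is None:
            current["mean"] = [float(v) for v in values]
        else:
            # Each further line is one row of the lower triangle.
            current["cov_rows"].append([float(v) for v in values])
    return classes


def format_report(sig, names, width=10):
    """Build a labelled, human-readable report for one signature."""
    header = " " * 12 + "".join(f"{name:>{width}}" for name in names)
    out = [
        sig["label"],
        f"Number of pixels in class: {sig['npoints']}",
        "",
        header,
        f"{'Mean values':<12}"
        + "".join(f"{m:>{width}.4f}" for m in sig["mean"]),
        "",
        "Variance (diagonal) and covariance (off-diagonal)",
        header,
    ]
    for name, row in zip(names, sig["cov_rows"]):
        out.append(f"{name:<12}" + "".join(f"{v:>{width}.4f}" for v in row))
    return "\n".join(out)


signatures = parse_signatures(SIGNATURE_TEXT)
band_names = [f"map {i + 1}" for i in range(len(signatures[0]["mean"]))]
report = format_report(signatures[0], band_names)
print(report)
```

Keeping the parsing separate from the formatting leaves the signature file
untouched, so it can still serve as input for the classification while the
user gets a readable view of it.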

> > As previously noted by Hamish Bowman, let's have a look at
> > (http://lists.osgeo.org/pipermail/grass-user/2008-June/045108.html):

> I'm not sure what the following means. Sorry. If it is necessary to delve
> into the source code to find out what is being reported in the report, the
> report needs to be changed so that is not necessary.

Absolutely.  Note that we may need to distinguish between what the user report
looks like and what the actual input for the classification (e.g. for
i.maxlik) is; the latter, I guess, is the current structure of the signature
files (?).

> It would also be useful to know how much each map contributes to each
> cluster.

(shrug)

I need to "think" a lot about that, meaning take time to understand the code 
and propose...

Anyhow, I have started drafting a wiki page on the similarities and
differences between i.cluster and other well-known clustering algorithms such
as k-means and ISODATA.  I consider it necessary after a) having read many
related threads, b) going line-by-line through the manual and c) working
recently on a project which involved "simple clustering" of NDVI maps.

I had already taken several notes (copy-pasted from threads and the manual,
plus my own wording) towards an explanatory/comparative overview (text), but
never took the time to start the page...

I have started sharing (slowly) these notes at:
<http://grasswiki.osgeo.org/wiki/User:NikosA/About_Clustering>.

This is a draft version which, hopefully, will end up being a normal GRASS
Wiki page -- ideally mentioned in the manual of i.cluster as well.

I am not sure who has the time and the will to go through the really hard
work, i.e. adjusting the code so as to make its output more informative for
the user.  I will try to support the documentation efforts.

Thank you, Nikos


> > --%<---
> > I_fopen_signature_file_new() found in lib/imagery/sigfile.c
> > --->%--
> > but I can't find/understand if it helps.

> > What about looking at
> > 
> > a) /geo/osgeo/src/grass_trunk/lib/python/ctypes/OBJ.x86_64-unknown-linux-gnu/imagery.py
> > 
> > or
> > 
> > b) /geo/osgeo/src/grass_trunk/lib/python/ctypes/OBJ.x86_64-unknown-linux-gnu/cluster.py:
> > 
> > --%<--- a) lines 756-781 / b) lines 579-600 --%<---
> > # /geo/osgeo/src/grass_trunk/dist.x86_64-unknown-linux-gnu/include/grass/imagery.h: 51
> > 
> > class struct_One_Sig(Structure):
> >    pass
> > 
> > struct_One_Sig.__slots__ = [
> > 
> >    'desc',
> >    'npoints',
> >    'mean',
> >    'var',
> >    'status',
> >    'r',
> >    'g',
> >    'b',
> >    'have_color',
> > 
> > ]
> > struct_One_Sig._fields_ = [
> > 
> >    ('desc', c_char * 100),
> >    ('npoints', c_int),
> >    ('mean', POINTER(c_double)),
> >    ('var', POINTER(POINTER(c_double))),
> >    ('status', c_int),
> >    ('r', c_float),
> >    ('g', c_float),
> >    ('b', c_float),
> >    ('have_color', c_int),
> > 
> > ]
> > --->%--
> > 
> > ?
> > 
> > And then, of course, at
> > /geo/osgeo/src/grass_trunk/dist.x86_64-unknown-linux-gnu/include/grass/imagery.h:
> > 
> > --%<---
> > struct One_Sig
> > {
> > 
> >    char desc[100];
> >    int npoints;
> >    double *mean;		/* one mean for each band */
> >    double **var;		/* covariance band-band   */
> >    int status;		/* may be used to 'delete' a signature */
> >    float r, g, b;		/* color */
> >    int have_color;
> > 
> > };
> > -->%---
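
To see how the fields of the struct quoted above relate to the text output,
here is a minimal, standalone sketch that rebuilds the same layout with plain
ctypes and fills it with the first two bands of Class 1. It does not load
the GRASS library: OneSig is a hand-written mirror of struct One_Sig, and
make_sig() is a hypothetical helper. Storing the full symmetric matrix in
'var' is an assumption made here for illustration.

```python
from ctypes import POINTER, Structure, c_char, c_double, c_float, c_int, cast


class OneSig(Structure):
    """Hand-written mirror of struct One_Sig (not the generated wrapper)."""
    _fields_ = [
        ("desc", c_char * 100),
        ("npoints", c_int),
        ("mean", POINTER(c_double)),          # one mean for each band
        ("var", POINTER(POINTER(c_double))),  # covariance band-band
        ("status", c_int),
        ("r", c_float),
        ("g", c_float),
        ("b", c_float),
        ("have_color", c_int),
    ]


def make_sig(desc, npoints, mean, cov):
    """Fill a OneSig; also return the buffers so they stay alive."""
    nbands = len(mean)
    mean_buf = (c_double * nbands)(*mean)
    row_bufs = [(c_double * nbands)(*row) for row in cov]
    row_ptrs = (POINTER(c_double) * nbands)(
        *[cast(row, POINTER(c_double)) for row in row_bufs]
    )
    sig = OneSig()
    sig.desc = desc.encode("ascii")
    sig.npoints = npoints
    sig.mean = cast(mean_buf, POINTER(c_double))
    sig.var = cast(row_ptrs, POINTER(POINTER(c_double)))
    return sig, (mean_buf, row_bufs, row_ptrs)


# First two bands of Class 1 from the output quoted earlier in the thread.
sig, _buffers = make_sig(
    "Class 1",
    247,
    [69.1174, 50.3603],
    [[10.5837, 12.6608],
     [12.6608, 22.8737]],
)

print(sig.desc.decode("ascii"), sig.npoints)
print("means:", sig.mean[0], sig.mean[1])
print("variance of band 1:", sig.var[0][0])
print("covariance of bands 1 and 2:", sig.var[1][0])
```

In other words, 'npoints' corresponds to the first line of each class in the
text output, 'mean' to the second, and 'var' to the triangular matrix that
follows.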