<div dir="ltr"><div>Hi Nikos and Moritz, <br></div><div><br></div><div>Thanks for your replies and for the recipe :)</div><br><div>Cheers,</div><div>Vero<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 31 Oct 2018 at 22:09, Moritz Lennert (<<a href="mailto:mlennert@club.worldonline.be" target="_blank">mlennert@club.worldonline.be</a>>) wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 31/10/18 12:19, Nikos Alexandris wrote:<br>

> * Veronica Andreo <<a href="mailto:veroandreo@gmail.com" target="_blank">veroandreo@gmail.com</a>> [2018-10-31 00:23:57 +0100]:<br>

> <br>

>> Hi devs,<br>

>><br>

> <br>

> Hi Vero,<br>

> <br>

> (not a real dev, but I'll share what I think)<br>

> <br>

>> I'm writing to ask how does one determine the best number of classes/clusters<br>

>> in a set of unsupervised classifications with different k in GRASS?<br>

> <br>

> You already know better than me I guess, but I'd like to refresh my mind<br>

> on all this a bit.<br>

> <br>

> I guess the only way to tell if the number of classes is "best" is to<br>

> judge for yourself by inspecting the "quality" of the clusters returned.<br>

> <br>

> One way to tell would be to compute the "error of clusters" which would<br>

> be the overall distance between the points that are assigned to a<br>

> cluster and its center.  I guess comparing the overall errors between<br>

> different clustering settings (or even algorithms?), would give an idea<br>

> about how close points are around the centers of clusters.<br>

> Maybe we could implement something like this.<br>

> <br>

> (All this I practiced during a generic Algorithmic Thinking course.  I<br>

> guess it's applicable in our "domain" too.)<br>

> <br>

> <br>

>> I use i.cluster with different numbers of classes and then i.maxlik that uses a<br>

>> modified version of k-means according to the manual page. Now, I would like<br>

>> to know which unsup classif is the best within the set.<br>

> <br>

> Sorry, I guess I have to read up:  what is "unsup classif"?<br>

> <br>

>> I checked the i.cluster reports (looking for separability) and then explored the<br>

>> rejection maps. But none of those seems to work as a crisp and clear<br>

>> indicator. BTW, does anyone know which separability index does i.cluster<br>

>> use?<br>

> <br>

> <br>

> I am interested to learn about the distance measure too.  I am looking<br>

> at the source code of `i.cluster`.  And then, searching around, I think<br>

> it's this file:<br>

> <br>

> grasstrunk/lib/cluster/c_sep.c<br>

> <br>

> and I/we just need to identify which distance it measures.<br>

<br>

i.cluster uses a simple k-means approach based on the spectral Euclidean <br>

distance between pixels, or between pixels and existing clusters. Because <br>

it also takes a minimum cluster size and a minimum cluster separation <br>

parameter, the total number of clusters might change, which differs from <br>

a classical k-means.<br>
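<br>

To make the distance concrete, here is a minimal pure-Python sketch of the <br>
assignment step of a classical k-means: each pixel goes to the cluster whose <br>
mean is nearest in spectral space. It only illustrates the general idea and is <br>
not the i.cluster code; the function names and sample values are made up.<br>

<pre>
```python
import math


def spectral_distance(pixel, center):
    """Euclidean distance between two spectral vectors (one value per band)."""
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel, center)))


def nearest_cluster(pixel, centers):
    """Index of the cluster center closest to the pixel in spectral space."""
    distances = [spectral_distance(pixel, c) for c in centers]
    return distances.index(min(distances))


# Hypothetical two-band example: two cluster means and three pixels.
centers = [(10.0, 20.0), (50.0, 60.0)]
pixels = [(12.0, 18.0), (48.0, 61.0), (30.0, 41.0)]
labels = [nearest_cluster(p, centers) for p in pixels]
print(labels)
```
</pre>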

<br>

i.cluster also works on a sample of the image pixels to define the <br>

clusters, so there is no guarantee that the clusters it identifies are <br>

those one would find using all pixels, but AFAIK the result is generally <br>

close enough that the gain in speed justifies the trade-off.<br>

<br>

i.maxlik does not interfere in the clustering part. It uses the <br>

signatures of classes provided as input (possibly the signatures of the <br>

clusters if the input is the output of i.cluster) to then assign each <br>

pixel to one of the classes. The reject map of i.maxlik allows you to see <br>

the probability of a pixel's membership in the chosen class. It does not <br>

really allow you to measure cluster "quality", nor the ideal number of <br>

clusters (well, you could try many different cluster numbers and <br>

then choose the one where the reject map values are the lowest on average).<br>

<br>

If you want to use a very simple approach to Nikos' suggestion of <br>

calculating the error, you could use something like this:<br>

<br>

- For each i.cluster + i.maxlik result:<br>

        - For each original band<br>

                - Create new pseudo band with mean values of the<br>

                        original band per cluster (r.stats.zonal)<br>

        - Calculate squared euclidean distance in spectral space of each<br>

                        pixel to its cluster mean (r.mapcalc):<br>

<br>

                (pixel_value_band1 - r.stats.zonal result on band 1)^2 +<br>

                (pixel_value_band2 - r.stats.zonal result on band 2)^2 +<br>

                etc<br>

<br>

        - Calculate the mean (or median, or whatever you are looking<br>

                for) of the resulting distances (r.univar)<br>

<br>

- Identify the i.cluster + i.maxlik result that reaches the best score<br>
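<br>

A rough Python sketch of the recipe above, assuming it runs inside a GRASS <br>
session where grass.script is available. The map names, the helper names and <br>
the "_zmean" suffix are placeholders, and band names are assumed to be plain <br>
map names without @mapset. The score is the mean of the summed squared <br>
differences from the r.mapcalc formula above, so lower means tighter clusters.<br>

<pre>
```python
def distance_expression(bands, suffix="_zmean"):
    """Build the r.mapcalc sum of squared differences over all bands."""
    terms = ["({b} - {b}{s})^2".format(b=band, s=suffix) for band in bands]
    return " + ".join(terms)


def cluster_error(classified, bands, suffix="_zmean", tmp="tmp_sqdist"):
    """Mean squared spectral distance of pixels to their class mean."""
    import grass.script as gs  # only importable inside a GRASS session

    for band in bands:
        # Pseudo band: mean of the original band within each class.
        gs.run_command("r.stats.zonal", base=classified, cover=band,
                       method="average", output=band + suffix,
                       overwrite=True)
    # Squared distance of each pixel to its class mean, summed over bands.
    gs.mapcalc("{} = {}".format(tmp, distance_expression(bands, suffix)),
               overwrite=True)
    # r.univar -g prints key=value pairs, which parse_command turns into a dict.
    stats = gs.parse_command("r.univar", map=tmp, flags="g")
    return float(stats["mean"])


def best_result(scores):
    """Name of the classification with the lowest error score."""
    return min(scores, key=scores.get)


# Hypothetical usage inside GRASS, one i.maxlik output per number of classes:
# maps = ["class_k5", "class_k10"]
# scores = {m: cluster_error(m, ["b1", "b2", "b3"]) for m in maps}
# print(best_result(scores))
```
</pre>

Keep in mind that this kind of error tends to shrink as the number of classes <br>
grows, so it may be more informative to look at where the decrease levels off <br>
than at the absolute minimum.<br>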

<br>

<br>

Moritz<br>

<br>

</blockquote></div>