<div dir="ltr"><div>Hi Nikos and Moritz, <br></div><div><br></div><div>Thanks for your replies and for the recipe :)</div><br><div>Cheers,</div><div>Vero<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 31 Oct 2018 at 22:09, Moritz Lennert (<<a href="mailto:mlennert@club.worldonline.be" target="_blank">mlennert@club.worldonline.be</a>>) wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 31/10/18 12:19, Nikos Alexandris wrote:<br>
> * Veronica Andreo <<a href="mailto:veroandreo@gmail.com" target="_blank">veroandreo@gmail.com</a>> [2018-10-31 00:23:57 +0100]:<br>
> <br>
>> Hi devs,<br>
>><br>
> <br>
> Hi Vero,<br>
> <br>
> (not a real dev, but I'll share what I think)<br>
> <br>
>> I'm writing to ask how one determines the best number of classes/clusters<br>
>> in a set of unsupervised classifications with different k in GRASS.<br>
> <br>
> You already know better than me I guess, but I'd like to refresh my mind<br>
> on all this a bit.<br>
> <br>
> I guess the only way to tell if the number of classes is "best", is to<br>
> judge yourself by inspecting the "quality" of the clusters returned.<br>
> <br>
> One way to tell would be to compute the "error of clusters" which would<br>
> be the overall distance between the points that are assigned to a<br>
> cluster and its center. I guess comparing the overall errors between<br>
> different clustering settings (or even algorithms?), would give an idea<br>
> about how close points are around the centers of clusters.<br>
> Maybe we could implement something like this.<br>
> <br>
> (All this I practiced during a generic Algorithmic Thinking course. I<br>
> guess it's applicable in our "domain" too.)<br>
> <br>
> <br>
>> I use i.cluster with different number of classes and then i.maxlik that uses a<br>
>> modified version of k-means according to the manual page. Now, I would like<br>
>> to know which unsup classif is the best within the set.<br>
> <br>
> Sorry, I guess I have to read up: what is "unsup classif"?<br>
> <br>
>> I check the i.cluster reports (looking for separability) and then explored the<br>
>> rejection maps. But none of those seems to work as a crisp and clear<br>
>> indicator. BTW, does anyone know which separability index does i.cluster<br>
>> use?<br>
> <br>
> <br>
> I am interested to learn about the distance measure too. I am looking<br>
> at the source code of `i.cluster`. And then, searching around, I think<br>
> it's this file:<br>
> <br>
> grasstrunk/lib/cluster/c_sep.c<br>
> <br>
> and I/we just need to identify which distance it measures.<br>
<br>
i.cluster uses a simple k-means approach based on the spectral Euclidean<br>
distance between pixels or between pixels and existing clusters. Because<br>
it includes a min cluster size and a min cluster separation parameter,<br>
the total number of clusters might change, which is different from<br>
classical k-means.<br>
<br>
i.cluster also works on a sample of the image pixels to define the<br>
clusters, so there is no guarantee that the clusters it identifies would<br>
be those one would find if using all pixels, but AFAIK the result is<br>
generally reasonably close, and the greater speed justifies the trade-off.<br>
<br>
i.maxlik does not interfere in the clustering part. It uses the <br>
signatures of classes provided as input (possibly the signatures of the <br>
clusters if the input is the output of i.cluster) to then assign each <br>
pixel to one of the classes. The reject map of i.maxlik allows you to see<br>
the probability of a pixel's membership in the chosen class. It does not<br>
really allow you to measure cluster "quality", nor the ideal number of<br>
clusters (well, you could try with many different cluster numbers and<br>
then choose the one where the reject map values are the lowest on average).<br>
<br>
If you want to use a very simple approach to Nikos' suggestion of <br>
calculating the error, you could use something like this:<br>
<br>
- For each i.cluster + i.maxlik result:<br>
- For each original band<br>
- Create new pseudo band with mean values of the<br>
original band per cluster (r.stats.zonal)<br>
- Calculate Euclidean distance in spectral space of each pixel<br>
to its cluster mean (r.mapcalc):<br>
<br>
sqrt((pixel_value_band1 - r.stats.zonal result on band 1)^2 +<br>
(pixel_value_band2 - r.stats.zonal result on band 2)^2 +<br>
etc)<br>
<br>
- Calculate the mean Euclidean distance (or median, or whatever<br>
you are looking for) on the result (r.univar)<br>
<br>
- Identify the i.cluster + i.maxlik result that reaches the best score<br>
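<br>
To make the arithmetic of the recipe above concrete, here is a minimal sketch in plain Python on toy data.<br>
It is not GRASS code: the function name mean_cluster_distance, the flat-list layout of the bands and the two<br>
candidate label lists are all assumptions made for illustration. The comments note which GRASS module plays<br>
the same role in the recipe.<br>

```python
import math
from collections import defaultdict


def mean_cluster_distance(bands, labels):
    """Mean Euclidean distance of each pixel to its cluster's mean spectrum.

    bands  -- list of bands, each a flat list of pixel values (hypothetical layout)
    labels -- flat list of cluster ids, one per pixel
    """
    n_bands = len(bands)

    # Per-cluster band means (the role r.stats.zonal plays in the recipe).
    sums = defaultdict(lambda: [0.0] * n_bands)
    counts = defaultdict(int)
    for i, lab in enumerate(labels):
        counts[lab] += 1
        for b in range(n_bands):
            sums[lab][b] += bands[b][i]
    means = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

    # Per-pixel distance to its cluster mean (the r.mapcalc step).
    dists = [
        math.sqrt(sum((bands[b][i] - means[lab][b]) ** 2 for b in range(n_bands)))
        for i, lab in enumerate(labels)
    ]

    # Average over all pixels (the r.univar step).
    return sum(dists) / len(dists)


# Toy data: two bands, four pixels, two candidate classifications.
bands = [[1.0, 2.0, 9.0, 10.0], [1.0, 2.0, 9.0, 10.0]]
candidates = {
    "k2_good": [1, 1, 2, 2],  # groups spectrally similar pixels
    "k2_bad": [1, 2, 1, 2],   # mixes dissimilar pixels
}
scores = {name: mean_cluster_distance(bands, lab) for name, lab in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Note that this kind of error generally shrinks as the number of clusters grows, so when comparing results<br>
with different k it is probably more informative to look at where the improvement levels off than to take<br>
the plain minimum.<br>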
<br>
<br>
Moritz<br>
<br>
</blockquote></div>