[GRASS-dev] how to determine best k in a set of unsupervised classifications?

Tue Oct 30 16:23:57 PDT 2018

Hi devs,

I'm writing to ask how one can determine the best number of classes/clusters
in a set of unsupervised classifications with different k in GRASS. I run
i.cluster with different numbers of classes and then i.maxlik, which uses a
modified version of k-means according to the manual page. Now I would like
to know which unsupervised classification is the best within the set. I
checked the i.cluster reports (looking at separability) and then explored
the rejection maps, but neither seems to work as a crisp and clear
indicator. BTW, does anyone know which separability index i.cluster uses?

In any case, I have seen some indices elsewhere (mainly R and Python) that
are used to choose the best clustering results (coming from the same or
different clustering methods). Examples of those indices are Silhouette,
Dunn, etc. Some are called internal as they do not require test data and
just characterize the compactness of clusters. On the other hand, the ones
requiring test data are called external. I have seen them in the dtwclust R
package [0] (the package is oriented to time series clustering, but its
validation indices are more general) and in scikit-learn in Python [1].
Do any of you have something already implemented in this direction? Or
how do you assess your unsupervised classification (clustering) results?
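
To make concrete what an internal index such as Silhouette measures, here
is a minimal sketch in plain Python. It is only an illustration under
stated assumptions: the points and the two label lists are made-up toy
data, the function name silhouette is hypothetical, and getting sampled
pixel values and class labels out of GRASS is not shown. For real data,
scikit-learn's sklearn.metrics.silhouette_score computes the same quantity.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from point i to each cluster's members (itself excluded).
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q)
                 for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[own]  # cohesion: mean distance within own cluster
        if a is None:
            # Singleton cluster: the silhouette is conventionally set to 0.
            scores.append(0.0)
            continue
        # Separation: mean distance to the nearest other cluster.
        b = min(v for c, v in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical labelings standing in for classifications with k=2 and k=3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

A higher mean silhouette means more compact and better separated clusters,
so the k with the highest value would be the candidate to prefer. The
pairwise loop is quadratic in the number of points, so on a full raster
one would work on a random sample of pixels rather than on every cell.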

Any ideas or suggestions within GRASS?

Thanks much in advance!
Vero

[0] https://rdrr.io/cran/dtwclust/man/cvi.html
[1] http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation