[GRASS-SVN] r68576 - grass-addons/grass7/vector/v.class.mlR
svn_grass at osgeo.org
Thu Jun 2 08:07:49 PDT 2016
Author: mlennert
Date: 2016-06-02 08:07:49 -0700 (Thu, 02 Jun 2016)
New Revision: 68576
Modified:
grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
Log:
v.class.mlR: added choice of weighting metric and improved documentation
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html 2016-06-02 10:40:29 UTC (rev 68575)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html 2016-06-02 15:07:49 UTC (rev 68576)
@@ -5,37 +5,55 @@
for machine learning in R to classify features using training features
by supervised learning.
-<p>The user can provide input either as vector maps, or as csv files, or
-a combination of both. Output can consist of either additional columns in
-the vector input map of features, a text file or reclassed raster maps.
+<p>The user can provide input either as vector maps (<em>segments_map</em>
+and <em>training_map</em>), as CSV files (<em>segments_file</em> and
+<em>training_file</em>), or as a combination of both. CSV files must be
+formatted like the default output of
+<a href="v.db.select.html">v.db.select</a>, i.e. with a header line and the
+pipe character (|) as field separator. Output can consist of
+additional columns in the vector input map of features, a text file
+(<em>classification_results</em>), or reclassed raster maps
+(<em>classified_map</em>).
+<p>The user has to provide the name of the column in the training data
+that contains the class values (<em>train_class_column</em>), the prefix
+of the columns that will contain the final class after classification
+(<em>output_class_column</em>) as well as the prefix of the columns that
+will contain the probability values linked to these classifications
+(<em>output_prob_column</em> - see below).
+
<p>Different classifiers are proposed: k-nearest neighbor (knn and knn1
for k=1), support vector machine with a radial kernel (svmRadial), random
forest (rf) and recursive partitioning (rpart). Each of these classifiers
-is tuned automatically throught repeated cross-validation. See the
-<a href="https://topepo.github.io/caret/index.html">caret webpage</a> for
+is tuned automatically through repeated cross-validation, with caret
+choosing a reasonable set of candidate values for tuning. See the
+<a href="http://topepo.github.io/caret/modelList.html">caret webpage</a> for
more information about the tuning parameters for each classifier, and
more generally for the information about how caret works.
<p>The user can choose to include the individual classifiers' results in
the output using the <em>i</em> flag, but by default the output will be
the result of a voting scheme merging the results of the different
-classifiers. The voting schemes available are: simple majority vote without
-weighting (smv), simple weighted majority vote (swv), best-worst weighted
-vote (bwwv) and quadratic best-worst weighted vote (qbwwv). For more details
-about these voting schemes see [TODO: include reference].
+classifiers. Votes can be weighted according to one or more user-selected
+modes (<em>weighting_modes</em>): simple majority vote without weighting,
+i.e. all weights are equal (smv), simple weighted majority vote (swv),
+best-worst weighted vote (bwwv) and quadratic best-worst weighted vote
+(qbwwv). For more details about these voting modes see [TODO: include
+reference]. By default, the weights are calculated from each classifier's
+mean accuracy, but the user can choose the mean kappa value instead
+(<em>weighting_metric</em>).
<p>In the output (as attribute columns or text file) each weighting scheme's
-result is provided accompanied by an estimation of the probability of the
-classification, based on the equation used in [TODO: include reference].
+result is accompanied by a value that can be considered an
+estimate of the probability of the classification after the weighted vote,
+based on the equation used in [TODO: include reference].
<p>Optional outputs of the module include a box-and-whisker plot indicating
-the variance of the cross-validation results for each classifier
+the resampling variance based on the cross-validation for each classifier
(<em>bw_plot_file</em>) and a csv file containing accuracy measures (overall
accuracy and kappa) for each classifier (<em>accuracy_file</em>). The user
can also choose to write the R script constructed and used internally to a text
file for study or further modification.
-
<h2>NOTES</h2>
<p>
@@ -50,7 +68,12 @@
<h2>TODO</h2>
-Add automagic installation of missing R packages.
+<ul>
+ <li>Add automagic installation of missing R packages.</li>
+ <li>Add output with confusion matrix</li>
+ <li>Add option to manually define grid of tuning parameters</li>
+</ul>
+-
<h2>EXAMPLE</h2>
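A small Python sketch of the four weighting modes described in the documentation above may help readers. The weight formulas follow the R expressions in the v.class.mlR.py diff below. Everything else is a hypothetical illustration, not the module's R voting function or the equation the TODO refers to. That includes the function names `weights` and `weighted_vote`, the example scores and class labels, and the use of the winning class's share of the total weight as a "probability".

```python
from collections import defaultdict


def weights(base, mode):
    """Return one weight per classifier from its mean score (accuracy or kappa).

    Formulas mirror the R expressions of v.class.mlR; this helper is hypothetical.
    """
    n = len(base)
    lo, hi = min(base), max(base)
    if mode == "smv":  # simple majority vote: all classifiers count equally
        return [1.0 / n] * n
    if mode == "swv":  # simple weighted vote: proportional to the score
        total = sum(base)
        return [b / total for b in base]
    if mode == "bwwv":  # best-worst: worst classifier -> 0, best -> 1
        return [1 - (hi - b) / (hi - lo) for b in base]
    if mode == "qbwwv":  # quadratic best-worst: the bwwv weight squared
        return [((lo - b) / (hi - lo)) ** 2 for b in base]
    raise ValueError(f"unknown weighting mode: {mode}")


def weighted_vote(predictions, w):
    """Return (winning class, its share of the total weight).

    The share is only an illustrative stand-in for the module's probability
    value; ties go to the class encountered first.
    """
    score = defaultdict(float)
    for cls, weight in zip(predictions, w):
        score[cls] += weight
    winner = max(score, key=score.get)
    return winner, score[winner] / sum(w)


if __name__ == "__main__":
    acc = [0.80, 0.90, 0.70]  # hypothetical mean accuracies of three classifiers
    votes = ["forest", "water", "forest"]  # hypothetical per-classifier predictions
    for mode in ("smv", "swv", "bwwv", "qbwwv"):
        print(mode, weighted_vote(votes, weights(acc, mode)))
```

With these example scores, smv lets the two "forest" votes win. Under bwwv the worst classifier gets weight 0, so the best classifier's "water" vote wins. As in the R expressions, bwwv and qbwwv are undefined when all classifiers have the same score because the denominator is zero. Python raises ZeroDivisionError in that case, while R would produce NaN.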
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py 2016-06-02 10:40:29 UTC (rev 68575)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py 2016-06-02 15:07:49 UTC (rev 68576)
@@ -100,6 +100,14 @@
#% options: smv,swv,bwwv,qbwwv
#% answer: smv
#%end
+#%option
+#% key: weighting_metric
+#% type: string
+#% description: Metric to use for weighting
+#% required: yes
+#% options: accuracy,kappa
+#% answer: accuracy
+#%end
#%option G_OPT_F_OUTPUT
#% key: classification_results
#% description: File for saving results of all classifiers
@@ -185,10 +193,10 @@
voting_function += "return(list(maj_class=maj_class, prob=prob))\n}"
weighting_functions = {}
- weighting_functions['smv'] = "weights <- rep(1/length(accuracy_means), length(accuracy_means))"
- weighting_functions['swv'] = "weights <- accuracy_means/sum(accuracy_means)"
- weighting_functions['bwwv'] = "weights <- 1-(max(accuracy_means) - accuracy_means)/(max(accuracy_means) - min(accuracy_means))"
- weighting_functions['qbwwv'] = "weights <- ((min(accuracy_means) - accuracy_means)/(max(accuracy_means) - min(accuracy_means)))**2"
+ weighting_functions['smv'] = "weights <- rep(1/length(weighting_base), length(weighting_base))"
+ weighting_functions['swv'] = "weights <- weighting_base/sum(weighting_base)"
+ weighting_functions['bwwv'] = "weights <- 1-(max(weighting_base) - weighting_base)/(max(weighting_base) - min(weighting_base))"
+ weighting_functions['qbwwv'] = "weights <- ((min(weighting_base) - weighting_base)/(max(weighting_base) - min(weighting_base)))**2"
if options['segments_map']:
allfeatures = options['segments_map']
@@ -211,6 +219,7 @@
output_probcol = options['output_prob_column']
classifiers = options['classifiers'].split(',')
weighting_modes = options['weighting_modes'].split(',')
+ weighting_metric = options['weighting_metric']
classification_results = None
if options['classification_results']:
@@ -320,6 +329,11 @@
r_file.write(voting_function)
r_file.write("\n")
+ if weighting_metric == 'kappa':
+ r_file.write("weighting_base <- kappa_means")
+ else:
+ r_file.write("weighting_base <- accuracy_means")
+ r_file.write("\n")
for weighting_mode in weighting_modes:
r_file.write(weighting_functions[weighting_mode])
r_file.write("\n")
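The Python changes above only alter the text of the generated R script. The weighting expressions now refer to a generic `weighting_base`, which is bound to either `kappa_means` or `accuracy_means` depending on `weighting_metric`. The following is a minimal sketch of that generation step. `write_weighting_block` is a hypothetical helper, and an `io.StringIO` buffer stands in for the module's `r_file`. The diff is cut off after the loop over weighting modes, so any R code the module writes afterwards is not reproduced here.

```python
import io

# R expressions copied from the weighting_functions dict in the diff above.
WEIGHTING_FUNCTIONS = {
    'smv': "weights <- rep(1/length(weighting_base), length(weighting_base))",
    'swv': "weights <- weighting_base/sum(weighting_base)",
    'bwwv': "weights <- 1-(max(weighting_base) - weighting_base)"
            "/(max(weighting_base) - min(weighting_base))",
    'qbwwv': "weights <- ((min(weighting_base) - weighting_base)"
             "/(max(weighting_base) - min(weighting_base)))**2",
}


def write_weighting_block(r_file, weighting_metric, weighting_modes):
    """Hypothetical helper: write the weighting part of the R script.

    Binds weighting_base to kappa_means or accuracy_means, then writes one
    weights expression per requested mode, as the committed code does.
    """
    if weighting_metric == 'kappa':
        r_file.write("weighting_base <- kappa_means")
    else:  # the option only allows 'accuracy' or 'kappa'
        r_file.write("weighting_base <- accuracy_means")
    r_file.write("\n")
    for mode in weighting_modes:
        r_file.write(WEIGHTING_FUNCTIONS[mode])
        r_file.write("\n")


if __name__ == "__main__":
    buf = io.StringIO()
    # Same parsing as options['weighting_modes'].split(',') in the module.
    write_weighting_block(buf, 'kappa', 'smv,bwwv'.split(','))
    print(buf.getvalue())
```

Introducing one intermediate R variable keeps all four weighting formulas metric-agnostic, so adding a further metric would only require another branch binding `weighting_base`. R accepts `**` as a synonym for `^`, so the qbwwv expression squares the normalized difference.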