[GRASS-SVN] r70086 - grass-addons/grass7/vector/v.class.mlR
svn_grass at osgeo.org
Sat Dec 17 11:21:45 PST 2016
Author: mlennert
Date: 2016-12-17 11:21:45 -0800 (Sat, 17 Dec 2016)
New Revision: 70086
Modified:
grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
Log:
v.class.mlR: added option to customize tunegrids and automatic installation of e1071 and doParallel R-packages
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html 2016-12-17 14:13:10 UTC (rev 70085)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html 2016-12-17 19:21:45 UTC (rev 70086)
@@ -36,22 +36,29 @@
using, repectively, the <em>partitions</em>, <em>folds</em> and
<em>tunelength</em> parameters.
+<p>The user can define a customized tunegrid for each classifier, using
+the <em>tunegrids</em> parameter. Any customized tunegrid has to be defined
+as a Python dictionary, with the classifiers as keys, and tunegrid data frames
+as content as defined
+<a href="http://topepo.github.io/caret/model-training-and-tuning.html#alternate-tuning-grids">
+ in the caret documentation</a>.
+
<p>The module can run the model tuning using parallel processing. In order
for this to work, the R-package <em>doParallel</em> has to be installed. The
<em>processes</em> parameter allows to chose the number of processes to
run.
<p>The user can chose to include the individual classifiers results in
-the output using the <em>i</em> flag, but by default the output will be
-the result of a voting scheme merging the results of the different
-classifiers. Votes can be weighted according to a user-defined mode
-(<em>weighting_mode</em>): simple majority vote without weighting, i.e.
-all weights are equal (smv), simple weighted majority vote (swv),
-best-worst weighted vote (bwwv) and quadratic best-worst weighted vote
-(qbwwv). For more details about these voting modes see Moreno-Seco et al
-(2006). By default, the weights are calculated based on the accuracy
-metric, but the user can chose the kappa value as an alternative
-(<em>weighting_metric</em>).
+the output (the attributes and/or the raster maps) using the <em>i</em>
+flag, but by default the output will be the result of a voting scheme
+merging the results of the different classifiers. Votes can be weighted
+according to a user-defined mode (<em>weighting_mode</em>): simple majority
+vote without weighting, i.e. all weights are equal (smv), simple weighted
+majority vote (swv), best-worst weighted vote (bwwv) and quadratic
+best-worst weighted vote (qbwwv). For more details about these voting
+modes see Moreno-Seco et al (2006). By default, the weights are calculated
+based on the accuracy metric, but the user can chose the kappa value as an
+alternative (<em>weighting_metric</em>).
<p>In the output (as attribute columns or text file) each weighting schemes
result is provided accompanied by a value that can be considered as an
@@ -89,15 +96,14 @@
<h2>DEPENDENCIES</h2>
-<p>This modules uses R. The following R-packages have to be installed to be
-able to use this module: 'caret', 'kernlab', 'randomForest', 'rpart',
-'ggplot2', 'lattice', 'doParallel' (if parallel processing is desired).
+<p>This module uses R. It tries to install necessary R packages automatically
+if necessary. These include : 'caret', 'kernlab', 'e1071', 'randomForest', and 'rpart'.
+Other packages can be necessary such as 'ggplot2', 'lattice' (for the plots),
+and 'doParallel' (if parallel processing is desired).
<h2>TODO</h2>
<ul>
- <li>Add automagic installation of missing R packages.</li>
- <li>Add option to manually define grid of tuning parameters</li>
<li>Check for existing file created by R as no overwrite check is
done in R</li>
<li>Use class probabilities determined by individual classifiers
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py 2016-12-17 14:13:10 UTC (rev 70085)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py 2016-12-17 19:21:45 UTC (rev 70086)
@@ -112,8 +112,8 @@
#% description: Classifiers to use
#% required: yes
#% multiple: yes
-#% options: svmRadial,rf,rpart,knn,knn1
-#% answer: svmRadial,rf,rpart,knn,knn1
+#% options: svmRadial,rf,rpart,C5.0,knn,knn1
+#% answer: svmRadial,rf,rpart,C5.0,knn,knn1
#%end
#%option
#% key: folds
@@ -140,6 +140,13 @@
#% guisection: Cross-validation and voting
#%end
#%option
+#% key: tunegrids
+#% type: string
+#% description: Python dictionary of customized tunegrids
+#% required: no
+#% guisection: Cross-validation and voting
+#%end
+#%option
#% key: weighting_modes
#% type: string
#% description: Type of weighting to use
@@ -217,6 +224,7 @@
import atexit
import subprocess
import os, shutil
+from ast import literal_eval
import grass.script as gscript
def cleanup():
@@ -253,7 +261,6 @@
model_output = model_output_desc = temptable = r_commands = None
reclass_files = None
- packages = {'svmRadial': 'kernlab', 'rf': 'randomForest', 'rpart': 'rpart'}
voting_function = "voting <- function (x, w) {\n"
voting_function += "res <- tapply(w, x, sum, simplify = TRUE)\n"
voting_function += "maj_class <- as.numeric(names(res)[which.max(res)])\n"
@@ -266,13 +273,15 @@
weighting_functions['bwwv'] = "weights <- 1-(max(weighting_base) - weighting_base)/(max(weighting_base) - min(weighting_base))"
weighting_functions['qbwwv'] = "weights <- ((min(weighting_base) - weighting_base)/(max(weighting_base) - min(weighting_base)))**2"
+ packages = {'svmRadial': 'kernlab', 'rf': 'randomForest', 'rpart': 'rpart', 'C5.0': 'C50'}
+
install_package = "if(!is.element('%s', installed.packages()[,1])){\n"
install_package += "cat('\\n\\nInstalling %s package from CRAN\n')\n"
install_package += "if(!file.exists(Sys.getenv('R_LIBS_USER'))){\n"
install_package += "dir.create(Sys.getenv('R_LIBS_USER'), recursive=TRUE)\n"
install_package += ".libPaths(Sys.getenv('R_LIBS_USER'))}\n"
install_package += "chooseCRANmirror(ind=1)\n"
- install_package += "install.packages('%s')}"
+ install_package += "install.packages('%s', dependencies=TRUE)}"
if options['segments_map']:
allfeatures = options['segments_map']
@@ -299,10 +308,17 @@
weighting_modes = options['weighting_modes'].split(',')
weighting_metric = options['weighting_metric']
processes = int(options['processes'])
+ if processes > 1:
+ install = install_package % ('doParallel', 'doParallel', 'doParallel')
+ r_file.write(install)
+ r_file.write("\n")
+
+
folds = options['folds']
partitions = options['partitions']
tunelength = options['tunelength']
separator = gscript.separator(options['separator'])
+ tunegrids = literal_eval(options['tunegrids']) if options['tunegrids'] else {}
classification_results = None
if options['classification_results']:
@@ -363,6 +379,9 @@
install = install_package % ('caret', 'caret', 'caret')
r_file.write(install)
r_file.write("\n")
+ install = install_package % ('e1071', 'e1071', 'e1071')
+ r_file.write(install)
+ r_file.write("\n")
for classifier in classifiers:
# knn is included in caret
if classifier == "knn" or classifier == "knn1":
@@ -403,8 +422,14 @@
r_file.write("models.cv$knn1 <- knn1Model.cv")
r_file.write("\n")
else:
- r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv,tuneLength=%s)" % (classifier,
- classifier, tunelength))
+ if classifier in tunegrids:
+ r_file.write("Grid <- expand.grid(%s)" % tunegrids[classifier])
+ r_file.write("\n")
+ r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv, tuneGrid=Grid)" % (classifier,
+ classifier))
+ else:
+ r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv, tuneLength=%s)" % (classifier,
+ classifier, tunelength))
r_file.write("\n")
r_file.write("models.cv$%s <- %sModel.cv" % (classifier, classifier))
r_file.write("\n")
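
The tunegrids handling added in this revision can be sketched outside the module as follows. This is a minimal illustration, not the module's code: the helper name build_train_commands and the example grid strings are assumptions made for the sketch, and the real script writes these lines to r_file instead of returning them. The sketch assumes that each dictionary value is a string of arguments to R's expand.grid(), which is how the diff above uses it.

```python
from ast import literal_eval


def build_train_commands(classifier, tunegrids_option, tunelength):
    """Return the R lines for one classifier (hypothetical helper, not in the module)."""
    # Same parsing as in the commit: an empty option means no custom grids.
    tunegrids = literal_eval(tunegrids_option) if tunegrids_option else {}
    lines = []
    if classifier in tunegrids:
        # A custom grid replaces the tuneLength-based search for this classifier.
        lines.append("Grid <- expand.grid(%s)" % tunegrids[classifier])
        lines.append("%sModel.cv <- train(fmla,training,method='%s', "
                     "trControl=MyControl.cv, tuneGrid=Grid)" % (classifier, classifier))
    else:
        lines.append("%sModel.cv <- train(fmla,training,method='%s', "
                     "trControl=MyControl.cv, tuneLength=%s)" % (classifier, classifier, tunelength))
    return lines


# Assumed example value for the tunegrids option (grid contents are illustrative).
example = "{'rf': 'mtry=c(2,4,6)', 'svmRadial': 'sigma=c(0.01,0.1), C=c(1,10)'}"

for name in ("rf", "rpart"):
    print("\n".join(build_train_commands(name, example, "10")))
```

Since literal_eval only accepts Python literals, a malformed dictionary string raises ValueError or SyntaxError before any R code is written.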