[GRASS-SVN] r70086 - grass-addons/grass7/vector/v.class.mlR

Sat Dec 17 11:21:45 PST 2016

Author: mlennert
Date: 2016-12-17 11:21:45 -0800 (Sat, 17 Dec 2016)
New Revision: 70086

Modified:
   grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
   grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
Log:
v.class.mlR: added option to customize tunegrids and automatic installation of e1071 and doParallel R-packages
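To make the "automatic installation" part of this log message concrete, here is a minimal Python sketch of how the R installation commands could be assembled for one run. It is modelled on the install_package template and the classifier-to-package mapping visible in the diff below; the helper names required_packages() and r_install_commands() are hypothetical, and it assumes (from the "knn is included in caret" comment) that knn and knn1 need no extra package and that the module installs the mapped package of each selected classifier.

```python
# Hypothetical helpers, NOT part of v.class.mlR: they only mimic how the
# module writes R code that installs missing packages before training.

# Template modelled on install_package in v.class.mlR.py (r70086).
INSTALL_TEMPLATE = (
    "if(!is.element('%(pkg)s', installed.packages()[,1])){\n"
    "cat('\\n\\nInstalling %(pkg)s package from CRAN\\n')\n"
    "if(!file.exists(Sys.getenv('R_LIBS_USER'))){\n"
    "dir.create(Sys.getenv('R_LIBS_USER'), recursive=TRUE)\n"
    ".libPaths(Sys.getenv('R_LIBS_USER'))}\n"
    "chooseCRANmirror(ind=1)\n"
    "install.packages('%(pkg)s', dependencies=TRUE)}\n"
)

# Classifier -> R package, as in the packages dict of the module.
# knn and knn1 are assumed to be handled by caret itself, so no entry.
CLASSIFIER_PACKAGES = {'svmRadial': 'kernlab', 'rf': 'randomForest',
                       'rpart': 'rpart', 'C5.0': 'C50'}


def required_packages(classifiers, processes):
    """Return the R packages to check for, in the order they are written."""
    pkgs = []
    if processes > 1:
        pkgs.append('doParallel')  # only needed for parallel tuning
    pkgs += ['caret', 'e1071']     # always checked in this revision
    for classifier in classifiers:
        pkg = CLASSIFIER_PACKAGES.get(classifier)
        if pkg and pkg not in pkgs:
            pkgs.append(pkg)
    return pkgs


def r_install_commands(classifiers, processes):
    """Concatenate one install block per required package."""
    return "".join(INSTALL_TEMPLATE % {'pkg': pkg}
                   for pkg in required_packages(classifiers, processes))
```

For example, r_install_commands(['rf', 'knn'], 4) yields install blocks for doParallel, caret, e1071 and randomForest, each guarded by an is.element() check so already installed packages are skipped.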

Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
===================================================================

--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html	2016-12-17 14:13:10 UTC (rev 70085)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html	2016-12-17 19:21:45 UTC (rev 70086)
@@ -36,22 +36,29 @@
 using, respectively, the <em>partitions</em>, <em>folds</em> and
 <em>tunelength</em> parameters.
 
+<p>The user can define a customized tunegrid for each classifier using
+the <em>tunegrids</em> parameter. The tunegrids have to be given as a
+Python dictionary with classifier names as keys and, as values, the
+arguments passed to R's <em>expand.grid()</em> to build the grid, as described
+<a href="http://topepo.github.io/caret/model-training-and-tuning.html#alternate-tuning-grids">
+	in the caret documentation</a>.
+
 <p>The module can run the model tuning using parallel processing. In order
 for this to work, the R-package <em>doParallel</em> has to be installed. The
 <em>processes</em> parameter sets the number of processes to
 run.
 
 <p>The user can choose to include the individual classifiers' results in
-the output using the <em>i</em> flag, but by default the output will be
-the result of a voting scheme merging the results of the different 
-classifiers. Votes can be weighted according to a user-defined mode 
-(<em>weighting_mode</em>): simple majority vote without weighting, i.e. 
-all weights are equal (smv), simple weighted majority vote (swv), 
-best-worst weighted vote (bwwv) and quadratic best-worst weighted vote 
-(qbwwv). For more details about these voting modes see Moreno-Seco et al 
-(2006). By default, the weights are calculated based on the accuracy 
-metric, but the user can chose the kappa value as an alternative 
-(<em>weighting_metric</em>).
+the output (the attributes and/or the raster maps) using the <em>i</em>
+flag, but by default the output will be the result of a voting scheme
+merging the results of the different classifiers. Votes can be weighted
+according to a user-defined mode (<em>weighting_modes</em>): simple majority
+vote without weighting, i.e. all weights are equal (smv), simple weighted
+majority vote (swv), best-worst weighted vote (bwwv) and quadratic
+best-worst weighted vote (qbwwv). For more details about these voting
+modes see Moreno-Seco et al. (2006). By default, the weights are calculated
+based on the accuracy metric, but the user can choose the kappa value as an
+alternative (<em>weighting_metric</em>).
 
 <p>In the output (as attribute columns or text file) each weighting scheme's
 result is provided accompanied by a value that can be considered as an
@@ -89,15 +96,14 @@
 
 <h2>DEPENDENCIES</h2>
 
-<p>This modules uses R. The following R-packages have to be installed to be 
-able to use this module: 'caret', 'kernlab', 'randomForest', 'rpart', 
-'ggplot2', 'lattice', 'doParallel' (if parallel processing is desired).
+<p>This module uses R. It tries to install missing R packages automatically.
+These include 'caret', 'kernlab', 'e1071', 'randomForest', 'rpart', 'C50'
+(for the C5.0 classifier) and 'doParallel' (if parallel processing is desired).
+Other packages, such as 'ggplot2' and 'lattice' (for the plots), may also be needed.
 
 <h2>TODO</h2>
 
 <ul>
-	<li>Add automagic installation of missing R packages.</li>
-	<li>Add option to manually define grid of tuning parameters</li>
 	<li>Check for existing file created by R as no overwrite check is 
 		done in R</li>
 	<li>Use class probabilities determined by individual classifiers

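The four weighting modes described in the manual page can be sketched in Python to show how they change the final vote. The bwwv and qbwwv formulas mirror the R expressions in weighting_functions in the Python diff, and vote() mirrors the R voting() function (sum the weights per predicted class, the largest total wins). The swv formula (weight equal to the metric itself) and the fallback to equal weights when all classifiers score the same are assumptions, since neither is shown in this commit; the function names are hypothetical.

```python
from collections import defaultdict


def compute_weights(metric, mode):
    """metric: classifier -> accuracy (or kappa); returns classifier -> weight."""
    lo, hi = min(metric.values()), max(metric.values())
    if mode == 'smv':
        return {c: 1.0 for c in metric}  # all votes count the same
    if mode == 'swv':
        return {c: float(m) for c, m in metric.items()}  # assumed: weight = metric
    if hi == lo:
        return {c: 1.0 for c in metric}  # assumed fallback: avoid division by zero
    if mode == 'bwwv':
        # best classifier -> 1, worst -> 0, linear in between
        return {c: 1 - (hi - m) / (hi - lo) for c, m in metric.items()}
    if mode == 'qbwwv':
        # same scaling as bwwv, but squared
        return {c: ((lo - m) / (hi - lo)) ** 2 for c, m in metric.items()}
    raise ValueError("unknown weighting mode: %s" % mode)


def vote(predictions, weights):
    """predictions: classifier -> predicted class; returns the winning class."""
    totals = defaultdict(float)
    for classifier, label in predictions.items():
        totals[label] += weights[classifier]
    # Like which.max() over tapply() output, ties go to the lowest class value.
    return max(sorted(totals), key=totals.get)
```

With accuracies rf 0.9, svmRadial 0.8 and rpart 0.7, and predictions rf -> 1, svmRadial -> 2, rpart -> 2, smv and swv choose class 2, while bwwv and qbwwv choose class 1 because the worst classifier receives a weight of zero.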
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py	2016-12-17 14:13:10 UTC (rev 70085)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py	2016-12-17 19:21:45 UTC (rev 70086)
@@ -112,8 +112,8 @@
 #% description: Classifiers to use
 #% required: yes
 #% multiple: yes
-#% options: svmRadial,rf,rpart,knn,knn1
-#% answer: svmRadial,rf,rpart,knn,knn1
+#% options: svmRadial,rf,rpart,C5.0,knn,knn1
+#% answer: svmRadial,rf,rpart,C5.0,knn,knn1
 #%end
 #%option
 #% key: folds
@@ -140,6 +140,13 @@
 #% guisection: Cross-validation and voting
 #%end
 #%option
+#% key: tunegrids
+#% type: string
+#% description: Python dictionary of customized tunegrids
+#% required: no
+#% guisection: Cross-validation and voting
+#%end
+#%option
 #% key: weighting_modes
 #% type: string
 #% description: Type of weighting to use
@@ -217,6 +224,7 @@
 import atexit
 import subprocess
 import os, shutil
+from ast import literal_eval
 import grass.script as gscript
 
 def cleanup():
@@ -253,7 +261,6 @@
     model_output = model_output_desc  =  temptable  =  r_commands = None
     reclass_files = None
 
-    packages = {'svmRadial': 'kernlab', 'rf': 'randomForest', 'rpart': 'rpart'}
     voting_function = "voting <- function (x, w) {\n"
     voting_function += "res <- tapply(w, x, sum, simplify = TRUE)\n"
     voting_function += "maj_class <- as.numeric(names(res)[which.max(res)])\n"
@@ -266,13 +273,15 @@
     weighting_functions['bwwv'] = "weights <- 1-(max(weighting_base) - weighting_base)/(max(weighting_base) - min(weighting_base))"
     weighting_functions['qbwwv'] = "weights <- ((min(weighting_base) - weighting_base)/(max(weighting_base) - min(weighting_base)))**2"
 
+    packages = {'svmRadial': 'kernlab', 'rf': 'randomForest', 'rpart': 'rpart', 'C5.0': 'C50'}
+
     install_package = "if(!is.element('%s', installed.packages()[,1])){\n"
     install_package += "cat('\\n\\nInstalling %s package from CRAN\n')\n"
     install_package += "if(!file.exists(Sys.getenv('R_LIBS_USER'))){\n"
     install_package += "dir.create(Sys.getenv('R_LIBS_USER'), recursive=TRUE)\n"
     install_package += ".libPaths(Sys.getenv('R_LIBS_USER'))}\n"
     install_package += "chooseCRANmirror(ind=1)\n"
-    install_package += "install.packages('%s')}"
+    install_package += "install.packages('%s', dependencies=TRUE)}"
 
     if options['segments_map']:
         allfeatures = options['segments_map']
@@ -299,10 +308,17 @@
     weighting_modes = options['weighting_modes'].split(',')
     weighting_metric = options['weighting_metric']
     processes = int(options['processes'])
+    if processes > 1:
+        install = install_package % ('doParallel', 'doParallel', 'doParallel')
+        r_file.write(install)
+        r_file.write("\n")
+
+
     folds = options['folds']
     partitions = options['partitions']
     tunelength = options['tunelength']
     separator = gscript.separator(options['separator'])
+    tunegrids = literal_eval(options['tunegrids']) if options['tunegrids'] else {}
 
     classification_results = None
     if options['classification_results']:
@@ -363,6 +379,9 @@
     install = install_package % ('caret', 'caret', 'caret')
     r_file.write(install)
     r_file.write("\n")
+    install = install_package % ('e1071', 'e1071', 'e1071')
+    r_file.write(install)
+    r_file.write("\n")
     for classifier in classifiers:
         # knn is included in caret
 	if classifier == "knn" or classifier == "knn1":
@@ -403,8 +422,14 @@
             r_file.write("models.cv$knn1 <- knn1Model.cv")
             r_file.write("\n")
         else:
-            r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv,tuneLength=%s)" % (classifier,
-                        classifier, tunelength))
+            if classifier in tunegrids:
+                r_file.write("Grid <- expand.grid(%s)" % tunegrids[classifier])
+                r_file.write("\n")
+                r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv, tuneGrid=Grid)" % (classifier,
+                            classifier))
+            else:
+                r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv, tuneLength=%s)" % (classifier,
+                            classifier, tunelength))
             r_file.write("\n")
             r_file.write("models.cv$%s <- %sModel.cv" % (classifier, classifier))
             r_file.write("\n")
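As a rough illustration of the new tunegrids option, the Python sketch below mirrors the logic added above: the option string is parsed with ast.literal_eval, and for every classifier that has an entry, its value is pasted verbatim into R's expand.grid() and passed to caret's train() as tuneGrid, while other classifiers keep tuneLength. The function names and the dictionary type check are additions for the example, not module code, and the grid values used below (mtry for rf, sigma and C for svmRadial) are only illustrative.

```python
from ast import literal_eval


def parse_tunegrids(option):
    """Turn the tunegrids option string into a dict (empty option -> {})."""
    grids = literal_eval(option) if option else {}
    if not isinstance(grids, dict):  # extra check, not in the module
        raise ValueError("tunegrids must be a Python dictionary")
    return grids


def train_commands(classifier, tunegrids, tunelength):
    """Return the R lines written for one (non-knn) classifier."""
    if classifier in tunegrids:
        return ["Grid <- expand.grid(%s)" % tunegrids[classifier],
                "%sModel.cv <- train(fmla,training,method='%s', "
                "trControl=MyControl.cv, tuneGrid=Grid)" % (classifier, classifier)]
    return ["%sModel.cv <- train(fmla,training,method='%s', "
            "trControl=MyControl.cv, tuneLength=%s)" % (classifier, classifier, tunelength)]


if __name__ == "__main__":
    grids = parse_tunegrids("{'rf': 'mtry=c(2,4,6)', "
                            "'svmRadial': 'sigma=c(0.01,0.1), C=c(1,10)'}")
    for name in ['rf', 'svmRadial', 'rpart']:
        print("\n".join(train_commands(name, grids, 10)))
```

Because literal_eval only accepts Python literals, each grid definition has to be a quoted string inside the dictionary; it cannot execute arbitrary Python, but the string itself is passed to R without any checking.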