[GRASS-SVN] r68629 - grass-addons/grass7/vector/v.class.mlR

svn_grass at osgeo.org
Tue Jun 7 06:06:56 PDT 2016


Author: mlennert
Date: 2016-06-07 06:06:56 -0700 (Tue, 07 Jun 2016)
New Revision: 68629

Modified:
   grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
   grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
Log:
v.class.mlR: added customization of cross-validation and tuning, plus parallel processing of tuning


Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html	2016-06-07 10:40:44 UTC (rev 68628)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html	2016-06-07 13:06:56 UTC (rev 68629)
@@ -30,8 +30,17 @@
 a reasonable set of values for tuning. See the 
 <a href="http://topepo.github.io/caret/modelList.html">caret webpage</a> 
 for more information about the tuning parameters for each classifier, and
-more generally for the information about how caret works.
+more generally for information about how caret works. By default, the
+module creates 10 different 5-fold partitions for cross-validation and tests
+10 values for each tuning parameter. These values can be changed
+using, respectively, the <em>partitions</em>, <em>folds</em> and
+<em>tunelength</em> parameters.
 
+<p>The module can run the model tuning using parallel processing. For
+this to work, the R package <em>doParallel</em> has to be installed. The
+<em>processes</em> parameter sets the number of processes to run in
+parallel.
+
 <p>The user can choose to include the individual classifiers' results in
 the output using the <em>i</em> flag, but by default the output will be
 the result of a voting scheme merging the results of the different 
@@ -69,10 +78,16 @@
 and the addon <em>i.segment.stats</em> for object-based classification of 
 satellite imagery.
 
+<p><em>WARNING:</em> The optional output files are created by R, and the
+module currently does not check whether files with the same names already
+exist. Existing files are silently overwritten, regardless of whether the
+GRASS GIS <em>--o</em> flag is set.
+
 <h2>DEPENDENCIES</h2>
 
-<p>This modules uses R. The following R-packages have to be installed to be able to use this
-module: 'caret', 'kernlab', 'randomForest', 'rpart', 'ggplot2', 'lattice'.
+<p>This module uses R. The following R packages must be installed to use
+this module: 'caret', 'kernlab', 'randomForest', 'rpart', 'ggplot2',
+'lattice', and 'doParallel' (only if parallel processing is desired).
 
 <h2>TODO</h2>
 

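The HTML change above warns that R silently overwrites existing optional output files, even without --o. A minimal sketch of a pre-check the Python side could run before handing the paths to R is shown below. The helper name `check_output_files` is hypothetical, and the sketch assumes the outputs are plain filesystem paths; the overwrite decision is passed in as a boolean so the sketch runs without GRASS.

```python
import os


def check_output_files(paths, overwrite=False):
    """Hypothetical helper: refuse to clobber existing output files.

    Returns the list of requested files that already exist. Raises
    FileExistsError if any exist and overwriting is not allowed.
    Empty or None entries (optional outputs the user did not set)
    are skipped.
    """
    existing = [p for p in paths if p and os.path.exists(p)]
    if existing and not overwrite:
        raise FileExistsError(
            "Output file(s) already exist, use --o to overwrite: %s"
            % ", ".join(existing))
    return existing
```

Inside the module this might be called as `check_output_files([classification_results], overwrite=grass.overwrite())`, since `grass.script.overwrite()` reports whether the overwrite flag is active. The call is illustrative, not part of r68629.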
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py	2016-06-07 10:40:44 UTC (rev 68628)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py	2016-06-07 13:06:56 UTC (rev 68629)
@@ -98,6 +98,30 @@
 #% answer: svmRadial,rf,rpart,knn,knn1
 #%end
 #%option
+#% key: folds
+#% type: integer
+#% description: Number of folds to use for cross-validation
+#% required: yes
+#% answer: 5
+#% guisection: Cross-validation and voting
+#%end
+#%option
+#% key: partitions
+#% type: integer
+#% description: Number of different partitions to use for cross-validation
+#% required: yes
+#% answer: 10
+#% guisection: Cross-validation and voting
+#%end
+#%option
+#% key: tunelength
+#% type: integer
+#% description: Number of levels to test for each tuning parameter
+#% required: yes
+#% answer: 10
+#% guisection: Cross-validation and voting
+#%end
+#%option
 #% key: weighting_modes
 #% type: string
 #% description: Type of weighting to use
@@ -105,6 +129,7 @@
 #% multiple: yes
 #% options: smv,swv,bwwv,qbwwv
 #% answer: smv
+#% guisection: Cross-validation and voting
 #%end
 #%option
 #% key: weighting_metric
@@ -113,6 +138,7 @@
 #% required: yes
 #% options: accuracy,kappa
 #% answer: accuracy
+#% guisection: Cross-validation and voting
 #%end
 #%option G_OPT_F_OUTPUT
 #% key: classification_results
@@ -144,9 +170,16 @@
 #% required: no
 #% guisection: Optional output
 #%end
+#%option
+#% key: processes
+#% type: integer
+#% description: Number of processes to run in parallel
+#% answer: 1
+#%end
 #%flag
 #% key: f
 #% description: Only write results to text file, do not update vector map
+#% guisection: Optional output
 #%end
 #%flag
 #% key: i
@@ -233,6 +266,10 @@
     classifiers = options['classifiers'].split(',')
     weighting_modes = options['weighting_modes'].split(',')
     weighting_metric = options['weighting_metric']
+    processes = int(options['processes'])
+    folds = options['folds']
+    partitions = options['partitions']
+    tunelength = options['tunelength']
 
     classification_results = None
     if options['classification_results']:
@@ -306,7 +343,13 @@
     r_file.write("\n")
     r_file.write("training$%s <- as.factor(training$%s)" % (classcol, classcol))
     r_file.write("\n")
-    r_file.write("MyFolds.cv <- createMultiFolds(training$%s, k=5, times=10)" % classcol)
+    if processes > 1:
+        r_file.write("library(doParallel)")
+        r_file.write("\n")
+        r_file.write("registerDoParallel(cores = %d)" % processes)
+        r_file.write("\n")
+    r_file.write("MyFolds.cv <- createMultiFolds(training$%s, k=%s, times=%s)" %
+            (classcol, folds, partitions))
     r_file.write("\n")
     r_file.write("MyControl.cv <- trainControl(method='repeatedCV', index=MyFolds.cv)")
     r_file.write("\n")
@@ -323,7 +366,8 @@
             r_file.write("models.cv$knn1 <- knn1Model.cv")
             r_file.write("\n")
         else:
-            r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv,tuneLength=10)" % (classifier, classifier))
+            r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv,tuneLength=%s)"
+                         % (classifier, classifier, tunelength))
             r_file.write("\n")
             r_file.write("models.cv$%s <- %sModel.cv" % (classifier, classifier))
             r_file.write("\n")
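The Python changes above splice the new folds, partitions, tunelength and processes options into the generated R script with one `r_file.write()` call per line. The sketch below gathers the same R lines into a list so they can be inspected or tested before writing. The function name `cv_tuning_lines` and the integer sanity checks are assumptions added for illustration and are not part of the module. The sketch covers only the generic `train()` branch; the module's special handling of knn1 is omitted here.

```python
def cv_tuning_lines(classcol, classifiers, folds=5, partitions=10,
                    tunelength=10, processes=1):
    """Hypothetical helper: build the R lines for cross-validated tuning.

    Mirrors the strings written by v.class.mlR r68629 for the
    generic (non-knn1) classifiers.
    """
    # GRASS passes option values as strings; convert them explicitly.
    folds, partitions = int(folds), int(partitions)
    tunelength, processes = int(tunelength), int(processes)
    # Sanity checks added for this sketch only; the module does not do them.
    if folds < 2:
        raise ValueError("folds must be at least 2")
    if min(partitions, tunelength, processes) < 1:
        raise ValueError("partitions, tunelength and processes must be >= 1")

    lines = []
    if processes > 1:
        # Register a doParallel backend; caret's train() runs its
        # resampling through foreach and uses the registered backend.
        lines.append("library(doParallel)")
        lines.append("registerDoParallel(cores = %d)" % processes)
    # 'partitions' repetitions of a 'folds'-fold split of the training data
    lines.append("MyFolds.cv <- createMultiFolds(training$%s, k=%d, times=%d)"
                 % (classcol, folds, partitions))
    lines.append("MyControl.cv <- trainControl(method='repeatedCV', "
                 "index=MyFolds.cv)")
    for classifier in classifiers:
        lines.append("%sModel.cv <- train(fmla,training,method='%s', "
                     "trControl=MyControl.cv,tuneLength=%d)"
                     % (classifier, classifier, tunelength))
        lines.append("models.cv$%s <- %sModel.cv" % (classifier, classifier))
    return lines
```

With such a helper, the repeated write/newline pairs could collapse into a single `r_file.write("\n".join(lines) + "\n")`, and the integer conversion happens in one place instead of relying on `%s` formatting of raw option strings.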
