[GRASS-SVN] r68629 - grass-addons/grass7/vector/v.class.mlR
svn_grass at osgeo.org
Tue Jun 7 06:06:56 PDT 2016
Author: mlennert
Date: 2016-06-07 06:06:56 -0700 (Tue, 07 Jun 2016)
New Revision: 68629
Modified:
grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
Log:
v.class.mlR: added customization of cross-validation and tuning, plus parallel processing of tuning
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html 2016-06-07 10:40:44 UTC (rev 68628)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.html 2016-06-07 13:06:56 UTC (rev 68629)
@@ -30,8 +30,17 @@
a reasonable set of values for tuning. See the
<a href="http://topepo.github.io/caret/modelList.html">caret webpage</a>
for more information about the tuning parameters for each classifier, and
-more generally for the information about how caret works.
+more generally for the information about how caret works. By default, the
+module creates 10 different 5-fold partitions for cross-validation and tests 10
+possible values of the tuning parameters. These values can be changed
+using, respectively, the <em>partitions</em>, <em>folds</em> and
+<em>tunelength</em> parameters.
+<p>The module can run the model tuning using parallel processing. For
+this to work, the R package <em>doParallel</em> has to be installed. The
+<em>processes</em> parameter sets the number of processes to
+run.
+
<p>The user can choose to include the individual classifiers' results in
the output using the <em>i</em> flag, but by default the output will be
the result of a voting scheme merging the results of the different
@@ -69,10 +78,16 @@
and the addon <em>i.segment.stats</em> for object-based classification of
satellite imagery.
+<p><em>WARNING:</em> The optional output files are created by R, and the
+module currently does not check whether files with the same names already
+exist. Existing files are silently overwritten, regardless of whether the
+GRASS GIS <em>--o</em> flag is set or not.
+
<h2>DEPENDENCIES</h2>
-<p>This modules uses R. The following R-packages have to be installed to be able to use this
-module: 'caret', 'kernlab', 'randomForest', 'rpart', 'ggplot2', 'lattice'.
+<p>This module uses R. The following R packages have to be installed to
+use this module: 'caret', 'kernlab', 'randomForest', 'rpart',
+'ggplot2', 'lattice', 'doParallel' (if parallel processing is desired).
<h2>TODO</h2>
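The documentation above describes the default resampling as 10 partitions of 5 folds each. As a rough illustration of what that produces, the following stdlib-only Python sketch generates repeated, class-stratified k-fold training sets in the spirit of caret's createMultiFolds(). It is not caret's actual algorithm; the function name create_multi_folds, its seed parameter and the Fold/Rep labels are hypothetical. With folds=5 and partitions=10 it yields 5 × 10 = 50 training sets, so every candidate tuning value is evaluated 50 times. If a classifier's tuning grid holds 10 candidates (tunelength=10), that amounts to roughly 500 model fits before the final model is trained, which is why running the tuning in parallel can pay off.

```python
import random
from collections import defaultdict


def create_multi_folds(labels, k=5, times=10, seed=1):
    """Return k * times training-index lists (repeated stratified k-fold).

    Hypothetical helper loosely modelled on caret's createMultiFolds(): in
    each repetition the indices of every class are shuffled and dealt
    round-robin into k folds, so class proportions stay similar across folds.
    Each fold is held out once; the remaining indices form a training set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    resamples = {}
    for rep in range(1, times + 1):
        fold_of = [0] * len(labels)
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                fold_of[idx] = pos % k
        for fold in range(k):
            # Illustrative label format for each training set.
            name = "Fold%d.Rep%02d" % (fold + 1, rep)
            resamples[name] = [i for i in range(len(labels)) if fold_of[i] != fold]
    return resamples


if __name__ == "__main__":
    labels = ["forest"] * 20 + ["water"] * 10
    resamples = create_multi_folds(labels, k=5, times=10)
    print(len(resamples))                  # 50 training sets
    print(len(resamples["Fold1.Rep01"]))   # 24 of the 30 objects per set
```

Stratifying by class keeps rare classes represented in every fold, which matters for the small, unbalanced training sets typical of object-based image classification.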
Modified: grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py
===================================================================
--- grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py 2016-06-07 10:40:44 UTC (rev 68628)
+++ grass-addons/grass7/vector/v.class.mlR/v.class.mlR.py 2016-06-07 13:06:56 UTC (rev 68629)
@@ -98,6 +98,30 @@
#% answer: svmRadial,rf,rpart,knn,knn1
#%end
#%option
+#% key: folds
+#% type: integer
+#% description: Number of folds to use for cross-validation
+#% required: yes
+#% answer: 5
+#% guisection: Cross-validation and voting
+#%end
+#%option
+#% key: partitions
+#% type: integer
+#% description: Number of different partitions to use for cross-validation
+#% required: yes
+#% answer: 10
+#% guisection: Cross-validation and voting
+#%end
+#%option
+#% key: tunelength
+#% type: integer
+#% description: Number of levels to test for each tuning parameter
+#% required: yes
+#% answer: 10
+#% guisection: Cross-validation and voting
+#%end
+#%option
#% key: weighting_modes
#% type: string
#% description: Type of weighting to use
@@ -105,6 +129,7 @@
#% multiple: yes
#% options: smv,swv,bwwv,qbwwv
#% answer: smv
+#% guisection: Cross-validation and voting
#%end
#%option
#% key: weighting_metric
@@ -113,6 +138,7 @@
#% required: yes
#% options: accuracy,kappa
#% answer: accuracy
+#% guisection: Cross-validation and voting
#%end
#%option G_OPT_F_OUTPUT
#% key: classification_results
@@ -144,9 +170,16 @@
#% required: no
#% guisection: Optional output
#%end
+#%option
+#% key: processes
+#% type: integer
+#% description: Number of processes to run in parallel
+#% answer: 1
+#%end
#%flag
#% key: f
#% description: Only write results to text file, do not update vector map
+#% guisection: Optional output
#%end
#%flag
#% key: i
@@ -233,6 +266,10 @@
classifiers = options['classifiers'].split(',')
weighting_modes = options['weighting_modes'].split(',')
weighting_metric = options['weighting_metric']
+ processes = int(options['processes'])
+ folds = options['folds']
+ partitions = options['partitions']
+ tunelength = options['tunelength']
classification_results = None
if options['classification_results']:
@@ -306,7 +343,13 @@
r_file.write("\n")
r_file.write("training$%s <- as.factor(training$%s)" % (classcol, classcol))
r_file.write("\n")
- r_file.write("MyFolds.cv <- createMultiFolds(training$%s, k=5, times=10)" % classcol)
+ if processes > 1:
+ r_file.write("library(doParallel)")
+ r_file.write("\n")
+ r_file.write("registerDoParallel(cores = %d)" % processes)
+ r_file.write("\n")
+ r_file.write("MyFolds.cv <- createMultiFolds(training$%s, k=%s, times=%s)" %
+ (classcol, folds, partitions))
r_file.write("\n")
r_file.write("MyControl.cv <- trainControl(method='repeatedCV', index=MyFolds.cv)")
r_file.write("\n")
@@ -323,7 +366,8 @@
r_file.write("models.cv$knn1 <- knn1Model.cv")
r_file.write("\n")
else:
- r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv,tuneLength=10)" % (classifier, classifier))
+ r_file.write("%sModel.cv <- train(fmla,training,method='%s', trControl=MyControl.cv,tuneLength=%s)" % (classifier,
+ classifier, tunelength))
r_file.write("\n")
r_file.write("models.cv$%s <- %sModel.cv" % (classifier, classifier))
r_file.write("\n")
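The Python changes above interpolate the new options into the R script that the module writes. The sketch below condenses that logic into a single hypothetical helper, cv_script_lines(), which returns the R statements as a list instead of writing them to r_file. Unlike the committed code, it converts folds, partitions and tunelength to integers and rejects out-of-range values before they reach R, and it uses the lowercase 'repeatedcv' method name documented for caret's trainControl(); the special handling of knn1 is omitted. The doParallel lines are emitted only when more than one process is requested, since caret runs its resampling loop through foreach and uses whichever parallel backend has been registered.

```python
def cv_script_lines(classcol, folds=5, partitions=10, tunelength=10,
                    processes=1, classifiers=("svmRadial", "rf", "rpart")):
    """Return the R statements for cross-validated tuning (hypothetical helper)."""
    # GRASS hands option values over as strings; convert them to int so that
    # malformed input fails here rather than inside R.
    values = {"folds": int(folds), "partitions": int(partitions),
              "tunelength": int(tunelength), "processes": int(processes)}
    # k-fold cross-validation needs at least 2 folds; the rest must be >= 1.
    minimums = {"folds": 2, "partitions": 1, "tunelength": 1, "processes": 1}
    for name, value in values.items():
        if value < minimums[name]:
            raise ValueError("%s must be >= %d, got %d"
                             % (name, minimums[name], value))

    lines = []
    if values["processes"] > 1:
        # Register a parallel backend; caret's foreach-based resampling uses it.
        lines.append("library(doParallel)")
        lines.append("registerDoParallel(cores = %d)" % values["processes"])
    lines.append("MyFolds.cv <- createMultiFolds(training$%s, k=%d, times=%d)"
                 % (classcol, values["folds"], values["partitions"]))
    # trainControl() documents the lowercase method name 'repeatedcv'.
    lines.append("MyControl.cv <- trainControl(method='repeatedcv', "
                 "index=MyFolds.cv)")
    for clf in classifiers:
        lines.append("%sModel.cv <- train(fmla, training, method='%s', "
                     "trControl=MyControl.cv, tuneLength=%d)"
                     % (clf, clf, values["tunelength"]))
        lines.append("models.cv$%s <- %sModel.cv" % (clf, clf))
    return lines


if __name__ == "__main__":
    for line in cv_script_lines("class", folds="5", partitions="10",
                                tunelength="10", processes="4",
                                classifiers=("rf",)):
        print(line)
```

Keeping the statements in a list makes the generated R script easy to inspect or unit-test before R is ever invoked.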