[GRASS-SVN] r70254 - grass-addons/grass7/raster/r.learn.ml

svn_grass at osgeo.org
Wed Jan 4 16:16:14 PST 2017


Author: spawley
Date: 2017-01-04 16:16:14 -0800 (Wed, 04 Jan 2017)
New Revision: 70254

Modified:
   grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html
   grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py
Log:
'minor tweaks to parameter tuning for earth classifier'

Modified: grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html
===================================================================
--- grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html	2017-01-04 23:11:03 UTC (rev 70253)
+++ grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html	2017-01-05 00:16:14 UTC (rev 70254)
@@ -29,11 +29,11 @@
 	
 	<li><em>max_features</em> controls the number of variables available to choose from at each node split in the tree-based models, and can be considered to control the degree of correlation between the trees in ensemble tree methods. Tuning varies the number of available features from 1 up to all of the features for random forests and gradient boosting. Single decision trees are not tuned on this parameter.</li>
 	
-	<li><em>min_samples_split</em> and <em>min_samples_leaf</em> control the minimum number of samples required to split a node or to form a leaf node, respectively. Tuning varies these parameters by allowing up to 20% of the samples to be required to form a node split or leaf node.</li>
+	<li><em>min_samples_split</em> and <em>min_samples_leaf</em> control the minimum number of samples required to split a node or to form a leaf node, respectively. Tuning varies these parameters by allowing up to 2% of the samples to be required to form a node split or leaf node.</li>
 	
-	<li>The <em>learning_rate</em> and <em>subsample</em> parameters apply only to Gradient Boosting. <em>learning_rate</em> shrinks the contribution of each tree, and <em>subsample</em> is the fraction of randomly selected samples used to fit each tree. <em>learning_rate</em> is tuned over 0.001-0.1, and <em>subsample</em> is tuned over 0-1.0.</li>
+	<li>The <em>learning_rate</em> and <em>subsample</em> parameters apply only to Gradient Boosting. <em>learning_rate</em> shrinks the contribution of each tree, and <em>subsample</em> is the fraction of randomly selected samples used to fit each tree. <em>learning_rate</em> is tuned over 0.01-0.1, and <em>subsample</em> is tuned over 0-1.0.</li>
 	
-	<li>Parameters relating to the Earth classifier are: <em>max_degree</em>, the maximum degree of terms generated by the forward pass; <em>penalty</em>, a smoothing parameter; and <em>minspan_alpha</em>, a probability between 0 and 1 that controls the number of data points between knots. These are tuned over 1-5 for <em>max_degree</em>, 0.5-2 for <em>penalty</em>, and 0-1 for <em>minspan_alpha.</em></li>
+	<li>Parameters relating to the Earth classifier are: <em>max_degree</em>, the maximum degree of terms generated by the forward pass; <em>penalty</em>, a smoothing parameter; and <em>minspan_alpha</em>, a probability between 0 and 1 that controls the number of data points between knots. These are tuned over 1-5 for <em>max_degree</em>, 0.5-2.0 for <em>penalty</em>, and 0.05-1.0 for <em>minspan_alpha</em>. Note that the Earth classifier is slow when <em>max_degree</em> &gt; 1, although performance generally improves with <em>max_degree</em> between 2 and 3.</li>
 </ul>
 
 <p>In addition to model fitting and prediction, feature selection can be performed using the <em>-f</em> flag. The feature selection method is a custom permutation-based approach that can be applied to all of the classifiers as part of a cross-validation. The method consists of: (1) determining a performance metric on a test partition of the data; (2) permuting each variable and assessing the difference in performance between the original and permuted data; (3) repeating step 2 <em>n_permutations</em> times; (4) averaging the results. Steps 1-4 are repeated on each of the k partitions. The feature importances represent the average decrease in performance of each variable when permuted. For binary classifications, the AUC is used as the metric. Multiclass classifications use accuracy, and regressions use R<sup>2</sup>.</p>
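The cross-validated permutation procedure described in the manual page can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the r.learn.ml code: the `fit`/`predict` callables, the helper names (`permutation_importance`, `kfold_indices`, `cv_permutation_importance`), the unshuffled contiguous folds and the use of accuracy as the metric are all hypothetical choices made for illustration.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence

Row = Sequence[float]
Predict = Callable[[Sequence[Row]], List[float]]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of exact matches (the metric the text uses for multiclass)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Predict, metric: Metric,
                           X_test: Sequence[Row], y_test: Sequence[float],
                           n_permutations: int = 5, seed: int = 0) -> List[float]:
    """Steps (1)-(4) on one test partition: mean metric drop per feature."""
    rng = random.Random(seed)
    baseline = metric(y_test, predict(X_test))          # (1) unpermuted score
    importances = []
    for j in range(len(X_test[0])):
        drops = []
        for _ in range(n_permutations):                 # (3) repeat step 2
            column = [row[j] for row in X_test]
            rng.shuffle(column)                         # (2) permute feature j only
            X_perm = []
            for row, value in zip(X_test, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - metric(y_test, predict(X_perm)))
        importances.append(statistics.mean(drops))     # (4) average the drops
    return importances


def kfold_indices(n: int, k: int) -> Iterator[List[int]]:
    """Contiguous, unshuffled folds (a simplification for this sketch)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        yield list(range(start, start + size))
        start += size


def cv_permutation_importance(fit: Callable[[Sequence[Row], Sequence[float]], Predict],
                              metric: Metric, X: Sequence[Row], y: Sequence[float],
                              k: int = 3, n_permutations: int = 5,
                              seed: int = 0) -> List[float]:
    """Repeat steps (1)-(4) on each of the k test partitions and average."""
    per_fold = []
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        per_fold.append(permutation_importance(
            predict, metric, [X[i] for i in test_idx], [y[i] for i in test_idx],
            n_permutations, seed))
    return [statistics.mean(values) for values in zip(*per_fold)]


if __name__ == "__main__":
    # Toy data: feature 0 determines the label, feature 1 is noise.
    X = [[float(i % 2), float((i * 7) % 3)] for i in range(30)]
    y = [float(i % 2) for i in range(30)]

    def fit(X_train, y_train):
        # Hypothetical "model" that always predicts from feature 0.
        return lambda rows: [row[0] for row in rows]

    print(cv_permutation_importance(fit, accuracy, X, y))
```

In the toy run, permuting the noise feature never changes the predictions, so its importance is exactly 0.0, while permuting feature 0 lowers accuracy and yields a positive importance. For binary or regression problems the `metric` argument would be swapped for AUC or R<sup>2</sup>, as the text describes.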

Modified: grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py
===================================================================
--- grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py	2017-01-04 23:11:03 UTC (rev 70253)
+++ grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py	2017-01-05 00:16:14 UTC (rev 70254)
@@ -915,22 +915,22 @@
 
     LogisticRegressionOpts = {'C': randint(1, 1000)}
     DecisionTreeOpts = {'max_depth': randint(2, 20),
-                        'min_samples_split': uniform(0, 0.2)}
+                        'min_samples_split': uniform(0, 0.02)}
     RandomForestOpts = {'max_features': uniform()}
-    GradientBoostingOpts = {'learning_rate': uniform(0.001, 0.1),
+    GradientBoostingOpts = {'learning_rate': uniform(0.01, 0.1),
                             'max_depth': randint(3, 10),
                             'max_features': uniform(),
                             'n_estimators': randint(50, 500),
-                            'min_samples_split': uniform(0, 0.2),
-                            'min_samples_leaf': uniform(0, 0.2),
+                            'min_samples_split': uniform(0, 0.02),
+                            'min_samples_leaf': uniform(0, 0.02),
                             'subsample': uniform()}
     SVCOpts = {'C': randint(1, 100), 'shrinking': [True, False]}
-    EarthOpts = {'max_degree': randint(1,10),
-                 'penalty': uniform(0.5, 5),
+    EarthOpts = {'max_degree': randint(1,5),
+                 'penalty': uniform(0.5, 2),
                  'minspan_alpha': uniform(0.05, 1.0)}
     EarthClassifierOpts = {'Earth__max_degree': randint(1,5),
-                         'Earth__penalty': uniform(0.5, 5),
-                         'Earth__minspan_alpha': uniform()}
+                           'Earth__penalty': uniform(0.5, 2),
+                           'Earth__minspan_alpha': uniform(0.05, 1.0)}
 
     param_grids = {
         'SVC': SVCOpts,
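The search spaces in this hunk appear to be `scipy.stats` frozen distributions (the import lies outside the hunk, so this is an assumption). Their arguments are easy to misread: `uniform(loc, scale)` samples from [loc, loc + scale] rather than [low, high], and `randint(low, high)` excludes `high`. The sketch below samples the new distributions to show the ranges actually searched. For example, `max_degree` covers 1-4 rather than 1-5, and `minspan_alpha` can exceed 1.0. The `fraction_to_count` helper is a hypothetical name illustrating the `ceil(fraction * n_samples)` rule that scikit-learn documents for float `min_samples_split` and `min_samples_leaf`.

```python
import math

from scipy.stats import randint, uniform

# Frozen distributions copied from the new (+) side of the diff above.
# scipy's uniform(loc, scale) covers [loc, loc + scale];
# randint(low, high) covers low, low + 1, ..., high - 1.
new_ranges = {
    "learning_rate": uniform(0.01, 0.1),         # [0.01, 0.11]
    "min_samples_split": uniform(0, 0.02),       # [0, 0.02], a fraction of samples
    "Earth__max_degree": randint(1, 5),          # {1, 2, 3, 4}
    "Earth__penalty": uniform(0.5, 2),           # [0.5, 2.5]
    "Earth__minspan_alpha": uniform(0.05, 1.0),  # [0.05, 1.05]
}


def sample_range(dist, n=10_000, seed=0):
    """Empirical (min, max) over n draws from a frozen scipy distribution."""
    draws = dist.rvs(size=n, random_state=seed)
    return float(draws.min()), float(draws.max())


def fraction_to_count(fraction, n_samples):
    """ceil(fraction * n_samples): the documented scikit-learn conversion
    of a float min_samples_split / min_samples_leaf into a sample count."""
    return math.ceil(fraction * n_samples)


if __name__ == "__main__":
    for name, dist in new_ranges.items():
        print(name, sample_range(dist))
    # With 5000 training samples, the 2% upper bound requires at most 100.
    print(fraction_to_count(0.02, 5000))
```

If the intent is for `minspan_alpha` to stay within 0-1 and `max_degree` to reach 5, the arguments would need to be `uniform(0.05, 0.95)` and `randint(1, 6)` under scipy's conventions.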