[GRASS-SVN] r72637 - grass-addons/grass7/raster/r.learn.ml

Fri Apr 20 05:39:07 PDT 2018

Author: neteler
Date: 2018-04-20 05:39:07 -0700 (Fri, 20 Apr 2018)
New Revision: 72637

Modified:
   grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html
   grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py
Log:
r.learn.ml addon: explain model file compression option

Modified: grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html
===================================================================

--- grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html	2018-04-18 13:05:40 UTC (rev 72636)
+++ grass-addons/grass7/raster/r.learn.ml/r.learn.ml.html	2018-04-20 12:39:07 UTC (rev 72637)
@@ -24,7 +24,7 @@
 <p>Cross validation can be performed by setting the <em>cv</em> parameters to &gt 1. Cross-validation is performed using stratified kfolds, and multiple global and per-class accuracy measures are produced depending on whether the response variable is binary or multiclass, or the classifier is for regression or classification. The <em>cvtype</em> parameter can also be changed from 'non-spatial' to either 'clumped' or 'kmeans' to perform spatial cross-validation. Clumped spatial cross-validation is used if the training pixels represent polygons, and then cross-validation will be effectively performed on a polygon basis. Kmeans spatial cross-validation will partition the training pixels into <em>n_partitions</em> by kmeans clustering of the pixel coordinates. These partitions will then be used for cross-validation, which should provide more realistic performance measures if the data are spatially correlated. If these partioning schemes are not sufficient then a raster containing the gr
 oup_ids of the partitions can be supplied using the <em>group_raster</em> option.</p>
 <p>Although tree-based classifiers are insensitive to the scaling of the input data, other classifiers such as linear models may not perform optimally if some predictors have variances that are orders of magnitude larger than others. The <em>-s</em> flag adds a standardization preprocessing step to the classification and prediction to reduce this effect. Additionally, most of the classifiers do not perform well if there is a large class imbalance in the training data. Using the <em>-b</em> flag balances the training data by weighting of the minority classes relative to the majority class. This does not apply to the Naive Bayes or LinearDiscriminantAnalysis classifiers.</p> 
 <p>Non-ordinal, categorical predictors are also not specifically recognized by scikit-learn. Some classifiers are not very sensitive to this (i.e. decision trees) but generally, categorical predictors need to be converted to a suite of binary using onehot encoding (i.e. where each value in a categorical raster is parsed into a separate binary grid). Entering the indices (comma-separated) of the categorical rasters as they are listed in the imagery group as 0...n in the <em>categorymaps</em> option will cause onehot encoding to be performed on the fly during training and prediction. The feature importances are returned as per the original imagery group and represent the sum of the feature importances of the onehot-encoded variables. Note: it is important that the training samples all of the categories in the rasters, otherwise the onehot-encoding will fail when it comes to the prediction.</p>
-<p>The module also offers the ability to save and load a classification or regression model. Saving and loading a model allows a model to be fitted on one imagery group, with the prediction applied to additional imagery groups. This approach is commonly employed in species distribution or landslide susceptibility modelling whereby a classification or regression model is built with one set of predictors (e.g. present-day climatic variables) and then predictions can be performed on other imagery groups containing forecasted climatic variables.</p>
+<p>The module also offers the ability to save and load a classification or regression model (<b>save_model</b>=<em>name[.gz]</em>). Note that the model file size can become quite large; when using a supported filename extensions (incl. '.gz', '.bz2', '.xz' or '.lzma') the model file will be automatically compressed. Saving and loading a model allows a model to be fitted on one imagery group, with the prediction applied to additional imagery groups. This approach is commonly employed in species distribution or landslide susceptibility modelling whereby a classification or regression model is built with one set of predictors (e.g. present-day climatic variables) and then predictions can be performed on other imagery groups containing forecasted climatic variables.</p>
 <p>For convenience when performing repeated classifications using different classifiers or parameters, the training data can be saved to a csv file using the <em>save_training</em> option. This data can then be loaded into subsequent classification runs, saving time by avoiding the need to repeatedly query the predictors.</p>
 <h2>NOTES</h2>
 <p><em>r.learn.ml</em> uses the "scikit-learn" machine learning python package along with the "pandas" package. These packages need to be installed within your GRASS GIS Python environment. For Linux users, these packages should be available through the linux package manager. For MS-Windows users using a 64 bit GRASS, the easiest way of installing the packages is by using the precompiled binaries from <a href="http://www.lfd.uci.edu/~gohlke/pythonlibs/">Christoph Gohlke</a> and by using the <a href="https://grass.osgeo.org/download/software/ms-windows/">OSGeo4W</a> installation method of GRASS, where the python setuptools can also be installed. You can then use 'easy_install pip' to install the pip package manager. Then, you can download the NumPy+MKL and scikit-learn .whl files and install them using 'pip install packagename.whl'. For MS-Windows with a 32 bit GRASS, scikit-learn is available in the OSGeo4W installer.</p>
@@ -63,4 +63,4 @@
 <p>Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.</p>
 <h2>AUTHOR</h2>
 Steven Pawley
-<p><em>Last changed: $Date$</em></p>
\ No newline at end of file
+</p><em>Last changed: $Date$</em></p>

Modified: grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py
===================================================================
--- grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py	2018-04-18 13:05:40 UTC (rev 72636)
+++ grass-addons/grass7/raster/r.learn.ml/r.learn.ml.py	2018-04-20 12:39:07 UTC (rev 72637)
@@ -381,7 +381,7 @@
 
 #%option G_OPT_F_OUTPUT
 #% key: save_model
-#% label: Save model to file
+#% label: Save model to file (for compression use e.g. '.gz' extension)
 #% required: no
 #% guisection: Optional
 #%end