[GRASS-dev] RandomForest classifier for imagery groups add-on
Paulo van Breugel
p.vanbreugel at gmail.com
Sun Mar 27 11:34:47 PDT 2016
On 27-03-16 16:58, Steven Pawley wrote:
> Hello Paulo,
> Many thanks for this. I updated the module last night to include the
> ability to force regression mode, as well as some more error
> checking for valid combinations of input parameters. Classification
> mode also checks that the input labelled pixels are CELL type. I'm not
> yet outputting all of the appropriate uncertainty measures, such as RSQ,
> for regression mode, but I'll add those in.
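Here is a minimal sketch, in plain Python, of the kind of input check described above. The helper `choose_mode` and its rules are hypothetical illustrations, not r.randomforest's actual code. Classification is accepted only for integer labels, which is the analogue of a CELL training raster. Regression can be forced explicitly.

```python
from numbers import Integral


def choose_mode(labels, force_regression=False):
    """Pick 'classification' or 'regression' for a set of training labels.

    Hypothetical helper: classification requires integer (CELL-like)
    class codes, and regression can be forced explicitly.
    """
    if not labels:
        raise ValueError("no labelled pixels supplied")
    if force_regression:
        return "regression"
    # Integer labels are treated as class codes, analogous to
    # requiring a CELL raster for the training map.
    if all(isinstance(v, Integral) for v in labels):
        return "classification"
    raise ValueError(
        "classification mode needs integer (CELL) labels; "
        "use regression mode for floating-point responses"
    )


print(choose_mode([0, 1, 1, 0]))                          # classification
print(choose_mode([0, 1, 1, 0], force_regression=True))   # regression
```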
Great, I'll check it out.
> It is interesting that you had better performance when using
> regression. I will have to check that for my application using
> scikit-learn. In R, using the randomForest package, the results were
> pretty much identical, but my classes were already balanced, which I
> think is one factor that can lead to significant differences between
> binary classification probabilities and regression.
It was a study by somebody else; I can't remember which one right now,
but it will come back to me. But yes, the fact that in species
distribution modeling the sampling is often highly unbalanced (with a
large number of pseudo-absences) is likely to play a role.
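The imbalance can be made concrete. According to the scikit-learn documentation, the `class_weight="balanced"` option gives each class a weight of n_samples / (n_classes * class_count). The plain-Python sketch below computes those weights for a hypothetical presence/pseudo-absence sample. It does not show whether reweighting closes the gap between regression and classification.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency.

    Uses the formula scikit-learn documents for class_weight="balanced":
    n_samples / (n_classes * count_of_class).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical SDM training set: 100 presences, 900 pseudo-absences
labels = [1] * 100 + [0] * 900
weights = balanced_class_weights(labels)
print(weights)  # {1: 5.0, 0: 0.5555555555555556}
```

With these weights, both classes contribute the same total weight (100 * 5.0 = 900 * 5/9 = 500).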
> Yes, I will definitely use this as a template to include other methods. I
> only recently switched my work from R to Python, but am just submitting
> a paper based on R which uses a range of classifiers like
> randomForest, GLM, GAM, and MARS which it was useful to evaluate the
It sometimes seems there are almost as many different conclusions about
the best method as there are publications (OK, I might be exaggerating a bit
here), so comparing different models is very useful. So I'm very glad you
are doing this (as I said, I have looked at scipy before and how it
could be implemented in GRASS, but my Python skills are just not up to it).
> From: Paulo van Breugel <p.vanbreugel at gmail.com>
> Sent: Sunday, March 27, 2016 3:11 AM
> Subject: Re: [GRASS-dev] RandomForest classifier for imagery groups add-on
> To: Vaclav Petras <wenzeslaus at gmail.com>, Steven Pawley
> <dr.stevenpawley at gmail.com>
> Cc: <grass-dev at lists.osgeo.org>
> Hi Steve
> Yes, your user case will not differ methodologically from species
> modeling based on presence/absence. One reason I was asking for the
> regression randomForest is that in one article (can't remember the
> title, will look it up) it was found that the regression approach
> yielded better results, even though the response variable is binary.
> On your help page, you write that r.randomforest performs random
> forest classification and regression, and that the regression mode can
> be used by setting the mode to the regression option. But I can't
> find that option.
> Great that you are planning other methods as well. Given model
> uncertainties (quite an issue in species distribution modeling),
> having multiple methods is really a plus, especially as it allows one
> to build consensus models and combine them to create uncertainty
> Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K., &
> Thuiller, W. 2009. Evaluation of consensus methods in predictive
> species distribution modelling. /Diversity and Distributions/ 15: 59–69.
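A simple way to combine several models is to take a weighted per-pixel mean as the consensus map and the per-pixel standard deviation across models as an uncertainty layer. The sketch below applies that idea to hypothetical predictions. Marmion et al. (2009) compare several consensus methods, and this code does not claim to reproduce any of them exactly. The optional weights, for example each model's AUC, are an assumption.

```python
from statistics import pstdev


def consensus(predictions, weights=None):
    """Combine per-pixel predictions from several models.

    predictions: dict mapping model name -> list of per-pixel values (0-1).
    weights: optional dict mapping model name -> weight (for example an
    accuracy score); equal weights are used if omitted.
    Returns (weighted mean, between-model standard deviation) per pixel.
    """
    names = list(predictions)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_pixels = len(predictions[names[0]])
    mean, spread = [], []
    for i in range(n_pixels):
        values = [predictions[name][i] for name in names]
        mean.append(sum(weights[name] * predictions[name][i] for name in names) / total)
        # Disagreement between models is used as a simple uncertainty measure
        spread.append(pstdev(values))
    return mean, spread


# Hypothetical outputs of three models for four pixels
preds = {
    "randomforest": [0.9, 0.2, 0.6, 0.1],
    "glm": [0.8, 0.3, 0.2, 0.1],
    "gam": [0.7, 0.1, 0.7, 0.1],
}
mean, spread = consensus(preds)
print([round(m, 2) for m in mean])    # [0.8, 0.2, 0.5, 0.1]
print([round(s, 2) for s in spread])
```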
> On 27-03-16 00:47, Steven Pawley wrote:
> Hi Vaclav and Paulo,
> Thanks for those pointers regarding the lazy import technique and
> documentation. I have a RandomForest diagram to explain the process,
> as well as some examples, so I'll update the documentation next week.
> Paulo, thanks for running a few tests. It looks like there is an error
> with the class_weight parameter; I'll look into that.
> In terms of species distribution modelling, I have been using the
> tool for landslide susceptibility modelling, which I believe is
> methodologically similar to SDM in terms of having a binary
> response variable. I have been doing this for the area of Alberta,
> using an 8000 x 14000 pixel, 17-band stack of predictors. In
> the case of a binary response variable, the usual approach is to
> run random forest in classification mode, i.e. with fully grown
> trees, but use the class probabilities to represent the 'species'
> or 'landslide' index.
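Here is a minimal scikit-learn sketch of this approach. Synthetic data stand in for the Alberta predictor stack, and the array sizes and response rule are invented for illustration. r.randomforest's own code may differ. The forest runs in classification mode with fully grown trees, and the predicted probability of class 1 serves as the index. For comparison with the regression approach discussed earlier in the thread, a regressor is also fitted to the same 0/1 response.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for training pixels drawn from a 17-band predictor stack
n_train, n_bands = 500, 17
X_train = rng.normal(size=(n_train, n_bands))
# Binary response (1 = landslide/presence, 0 = absence), loosely tied to band 0
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=n_train) > 1.0).astype(int)

# Classification mode: fully grown trees (max_depth=None is the default)
clf = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)
clf.fit(X_train, y_train)

# Regression mode on the same 0/1 response, for comparison
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)

X_new = rng.normal(size=(10, n_bands))  # pixels to predict

# Probability of class 1, used as the susceptibility / suitability index
pos = list(clf.classes_).index(1)
index_clf = clf.predict_proba(X_new)[:, pos]

# Regression prediction: average of leaf means of the 0/1 response
index_reg = reg.predict(X_new)

print(np.round(index_clf, 2))
print(np.round(index_reg, 2))
```

Both outputs fall between 0 and 1 here. `predict_proba` averages the per-tree class probabilities, while the regressor averages the leaf means of the 0/1 response, so the two indices can be compared directly.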
> I am planning to implement other methods from the scikit-learn
> package, which is a trivial change to the module once the
> bugs are ironed out. I will probably look to create modules for
> SVM and logistic regression, and maybe nearest neighbours
> classification. Certainly open to any suggestions.
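The "trivial change" mentioned above follows from scikit-learn's shared estimator interface. Each classifier exposes `fit` and `predict_proba`, so only the estimator object changes. The sketch below is illustrative, and the model settings and synthetic data are assumptions. SVM, logistic regression and nearest neighbours are sensitive to predictor scale, so those models get a scaling step. `SVC` also needs `probability=True` to provide `predict_proba`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Candidate classifiers sharing scikit-learn's fit / predict_proba interface.
# Scaling is added for the margin- and distance-based methods.
models = {
    "randomforest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

# Synthetic training data with a binary response
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_new = rng.normal(size=(5, 5))

results = {}
for name, model in models.items():
    # Only the estimator changes; the training and prediction code is identical
    model.fit(X, y)
    results[name] = model.predict_proba(X_new)[:, 1]  # probability of class 1
    print(name, np.round(results[name], 2))
```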
> From: Vaclav Petras <wenzeslaus at gmail.com>
> Sent: Saturday, March 26, 2016 11:21 AM
> Subject: Re: [GRASS-dev] RandomForest classifier for imagery groups add-on
> To: Steven Pawley <dr.stevenpawley at gmail.com>
> Cc: <grass-dev at lists.osgeo.org>
> On Sat, Mar 26, 2016 at 12:40 PM, Steven Pawley
> <dr.stevenpawley at gmail.com> wrote:
> I would like to draw your attention to a new GRASS add-on,
> r.randomforest, which uses the scikit-learn and pandas Python
> packages to classify GRASS rasters.
> Thanks, this looks good. Please consider adding an image to the
> documentation to better promote the module, and also an example
> which would work with the NC SPM dataset. For the addon to
> generate documentation on the server and to work well in a few other
> special situations, it is advantageous to use the lazy import
> technique for the non-standard dependencies; see, for example,
> v.class.ml and v.class.mlpy.
>  https://trac.osgeo.org/grass/wiki/Submitting/Docs#Images
>  https://grass.osgeo.org/download/sample-data/
>  https://trac.osgeo.org/grass/changeset/66482/
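Here is a minimal sketch of the lazy import idea in plain Python. It assumes the common pattern of importing non-standard packages inside `main()` rather than at module level. The helper `load_dependency` is hypothetical, and v.class.ml and v.class.mlpy show how existing add-ons actually handle this. In a GRASS script, `main()` would be called only after the standard option parser has run, so loading the file to build the manual page never touches scikit-learn or pandas.

```python
import importlib
import sys


def load_dependency(name):
    """Import a non-standard package only when it is actually needed.

    Hypothetical helper: exits with a readable message instead of a
    traceback if the package is missing.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        sys.exit("This module requires the Python package '{}'; "
                 "please install it.".format(name))


def main():
    # Hypothetical module body. Heavy dependencies are imported here rather
    # than at the top of the file, so merely loading the script (for example
    # to generate documentation) works even where they are not installed.
    ensemble = load_dependency("sklearn.ensemble")
    pd = load_dependency("pandas")
    return ensemble.RandomForestClassifier, pd.DataFrame
```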