<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body smarttemplateinserted="true" text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 27-03-16 16:58, Steven Pawley wrote:<br>
</div>
<blockquote
cite="mid:5B30AE9B1DA9F412.28C3AF84-E344-4225-BC58-ABB5C28FCEDA@mail.outlook.com"
type="cite">
<div id="compose" style="padding-left: 16px; padding-right: 16px;
padding-bottom: 8px;" contenteditable="true">
<div>Hello Paulo,</div>
<div><br>
</div>
<div>Many thanks for this. I updated the module last night to
include the ability to force regression mode, as well as
including some more error checking for valid combinations of
input parameters. Classification mode also checks that the
input labelled pixels are CELL type. I'm not yet outputting all
of the appropriate uncertainty measures, like RSQ, for
regression mode, but I'll add those in.</div>
</div>
</blockquote>
Great, I'll check it out.
<blockquote
cite="mid:5B30AE9B1DA9F412.28C3AF84-E344-4225-BC58-ABB5C28FCEDA@mail.outlook.com"
type="cite">
<div id="compose" style="padding-left: 16px; padding-right: 16px;
padding-bottom: 8px;" contenteditable="true">
<div><br>
</div>
<div>It is interesting that you had better performance when
using regression. I will have to check that for my application
using scikit-learn. In R, using the randomForest package, the
results were pretty much identical, but my classes were
already balanced, and I think class imbalance is one factor that
can lead to significant differences between binary classification
probabilities and regression.<br>
</div>
</div>
</blockquote>
It was a study by somebody else. I can't remember which one right
now, but it will come back to me. But yes, the fact that sampling
for species distribution modeling is often highly unbalanced (with
a large number of pseudo-absences) is likely to play a role. <br>
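<br>
As a rough illustration of how strongly pseudo-absences can dominate a
sample, the sketch below computes the "balanced" class weights that
scikit-learn documents for class_weight="balanced", that is
n_samples / (n_classes * count of each class). It is plain Python with
made-up counts (10 presences, 90 pseudo-absences), and the helper name
balanced_class_weights is hypothetical, not part of r.randomforest. <br>
<pre>
```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a dict mapping each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up sample: 10 presences (1) and 90 pseudo-absences (0).
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)
print(weights)
```
</pre>
The rarer class receives the larger weight, which is one common way to
keep the abundant pseudo-absences from dominating the fit. <br>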
<blockquote
cite="mid:5B30AE9B1DA9F412.28C3AF84-E344-4225-BC58-ABB5C28FCEDA@mail.outlook.com"
type="cite">
<div id="compose" style="padding-left: 16px; padding-right: 16px;
padding-bottom: 8px;" contenteditable="true">
<div><br>
Yes, I will definitely use this as a template to include other
methods. I only recently switched my work from R to Python, but
I am just submitting a paper based on R that uses a range of
classifiers such as random forest, GLM, GAM, and MARS, so it was
useful to evaluate the differences between them.</div>
</div>
</blockquote>
It sometimes seems there are almost as many different conclusions
about the best method as there are publications (OK, I might
exaggerate a bit here), so comparing different models is very
useful. So I'm very glad you are doing this (as I said, I have looked
at scipy before and how it could be implemented in GRASS, but my Python
skills are just not up to it). <br>
<blockquote
cite="mid:5B30AE9B1DA9F412.28C3AF84-E344-4225-BC58-ABB5C28FCEDA@mail.outlook.com"
type="cite">
<div id="compose" style="padding-left: 16px; padding-right: 16px;
padding-bottom: 8px;" contenteditable="true">
<div><br>
</div>
<div>Steve<br>
<br>
</div>
</div>
<div class="gmail_quote">_____________________________<br>
From: Paulo van Breugel <<a moz-do-not-send="true" dir="ltr"
href="mailto:p.vanbreugel@gmail.com"
x-apple-data-detectors="true"
x-apple-data-detectors-type="link"
x-apple-data-detectors-result="0">p.vanbreugel@gmail.com</a>><br>
Sent: Sunday, March 27, 2016 3:11 AM<br>
Subject: Re: [GRASS-dev] RandomForest classifier for imagery
groups add-on<br>
To: Vaclav Petras <<a moz-do-not-send="true" dir="ltr"
href="mailto:wenzeslaus@gmail.com"
x-apple-data-detectors="true"
x-apple-data-detectors-type="link"
x-apple-data-detectors-result="2">wenzeslaus@gmail.com</a>>,
Steven Pawley <<a moz-do-not-send="true" dir="ltr"
href="mailto:dr.stevenpawley@gmail.com"
x-apple-data-detectors="true"
x-apple-data-detectors-type="link"
x-apple-data-detectors-result="3">dr.stevenpawley@gmail.com</a>><br>
Cc: <<a moz-do-not-send="true" dir="ltr"
href="mailto:grass-dev@lists.osgeo.org"
x-apple-data-detectors="true"
x-apple-data-detectors-type="link"
x-apple-data-detectors-result="4">grass-dev@lists.osgeo.org</a>><br>
<br>
<br>
Hi Steve <br>
<br>
Yes, your use case will not differ methodologically from
species modeling based on presence/absence. One reason I was
asking for the regression randomForest is that one article
(can't remember the title, will look it up) found that
the regression approach yielded better results, even though the
response variable is binary. On your help page, you write that
r.randomforest performs random forest classification and
regression, and that the regression mode can be used by setting the
mode to the regression option. But I am not seeing that option?
<br>
<br>
Great that you are planning other methods as well. Given model
uncertainties (quite an issue in species distribution modeling),
having multiple methods is really a plus, especially as it
allows one to build consensus models [1] and combine them to
create uncertainty maps. <br>
<br>
Cheers, <br>
<br>
Paulo <br>
<br>
[1] Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K.,
&amp; Thuiller, W. 2009. Evaluation of consensus methods in
predictive species distribution modelling. <i>Diversity and
Distributions</i> 15: 59–69. <br>
<br>
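To make the idea of a consensus model and an uncertainty map concrete,
here is a minimal sketch in plain Python. It assumes an unweighted mean
across models as the consensus rule and the between-model standard
deviation as the uncertainty measure; both are just one possible choice,
and the model names and probability values are made up for
illustration. <br>
<pre>
```python
from statistics import mean, pstdev

# Hypothetical predicted probabilities for the same four raster cells
# from three different models (values are made up for illustration).
predictions = {
    "random_forest": [0.90, 0.20, 0.60, 0.10],
    "glm":           [0.80, 0.30, 0.40, 0.10],
    "mars":          [0.70, 0.10, 0.80, 0.10],
}

# Regroup so that each tuple holds all model predictions for one cell.
per_cell = list(zip(*predictions.values()))

# Consensus: unweighted mean across models for each cell.
consensus = [mean(cell) for cell in per_cell]

# Uncertainty: population standard deviation across models for each cell.
uncertainty = [pstdev(cell) for cell in per_cell]

for i, (c, u) in enumerate(zip(consensus, uncertainty)):
    print(f"cell {i}: consensus={c:.2f}, between-model sd={u:.2f}")
```
</pre>
Cells where the models disagree get a larger standard deviation, so
mapping that value per cell gives a simple uncertainty map. <br>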
<br>
<div class="moz-cite-prefix"> On 27-03-16 00:47, Steven Pawley
wrote: <br>
</div>
<blockquote>
<div id="compose" style="padding-left: 20px; padding-right:
20px; padding-bottom: 8px;">
<div> Hi Vaclav and Paulo, </div>
<div> <br>
</div>
<div> Thanks for those pointers re. the lazy import technique and
documentation. I have a RandomForest diagram to explain
the process, as well as some examples, so I'll update the
documentation next week. </div>
<div> <br>
</div>
<div> Paulo, thanks for running a few tests. It looks like there
is an error with the class_weight parameter; I'll look
into that. </div>
<div> <br>
</div>
<div> In terms of species distribution modelling, I have
been using the tool for landslide susceptibility
modelling, which I believe is methodologically similar to
SDM in terms of having a binary response variable. I have
been doing this for the area of Alberta, using an 8000 x
14000 pixel and 17 band stack of predictors. In the case
of a binary response variable, the usual approach is to
run random forest in classification mode, i.e. with fully
grown trees, but use the class probabilities to represent
the 'species' or 'landslide' index. </div>
<div> <br>
</div>
<div> I am planning to implement other methods from the
scikit-learn package, which represents a trivial change to the
module once the bugs are ironed out. I will probably look
to create modules for SVM and logistic regression, and
maybe nearest neighbours classification. Certainly open
to any suggestions. </div>
<div> <br>
</div>
<div> Steve </div>
</div>
<div class="gmail_quote"> _____________________________ <br>
From: Vaclav Petras < <a moz-do-not-send="true"
dir="ltr" href="mailto:wenzeslaus@gmail.com">wenzeslaus@gmail.com</a>>
<br>
Sent: Saturday, March 26, 2016 11:21 AM <br>
Subject: Re: [GRASS-dev] RandomForest classifier for imagery
groups add-on <br>
To: Steven Pawley < <a moz-do-not-send="true" dir="ltr"
href="mailto:dr.stevenpawley@gmail.com">dr.stevenpawley@gmail.com</a>>
<br>
Cc: < <a moz-do-not-send="true" dir="ltr"
href="mailto:grass-dev@lists.osgeo.org">grass-dev@lists.osgeo.org</a>>
<br>
<br>
<br>
<div dir="ltr">
<div class="gmail_extra"> <br>
<div class="gmail_quote"> On Sat, Mar 26, 2016 at 12:40
PM, Steven Pawley <span dir="ltr"><<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:dr.stevenpawley@gmail.com">dr.stevenpawley@gmail.com</a>></span>
wrote: <br>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"> I would like to
draw your attention to a new GRASS add-on,
r.randomforest, which uses the scikit-learn and
pandas Python packages to classify GRASS rasters. </blockquote>
</div>
<br>
</div>
<div class="gmail_extra"> Thanks, this looks good. Please
consider adding an image to the documentation to better
promote the module [1], and also an example that
works with the NC SPM dataset [2]. For the addon to
generate documentation on the server and to work well on
a few other special occasions, it is advantageous to
employ the lazy import technique for the non-standard
dependencies; see for example v.class.ml and
v.class.mlpy [3]. <br>
<br>
</div>
<div class="gmail_extra"> Vaclav <br>
</div>
<div class="gmail_extra"> <br>
[1] <a moz-do-not-send="true"
href="https://trac.osgeo.org/grass/wiki/Submitting/Docs#Images">https://trac.osgeo.org/grass/wiki/Submitting/Docs#Images</a>
<br>
[2] <a moz-do-not-send="true"
href="https://grass.osgeo.org/download/sample-data/">https://grass.osgeo.org/download/sample-data/</a>
<br>
[3] <a moz-do-not-send="true"
href="https://trac.osgeo.org/grass/changeset/66482/">https://trac.osgeo.org/grass/changeset/66482/</a>
<br>
</div>
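<div class="gmail_extra"> The lazy import technique mentioned above can
be sketched roughly as follows. This is a generic, standalone
illustration and not the code used in v.class.ml or v.class.mlpy: the
helper lazy_import, the main function, and the --help handling are all
hypothetical. The point is only that the non-standard dependency is
imported when real work is requested, so help or documentation output
does not need it. <br>
<pre>
```python
import importlib
import sys


def lazy_import(module_name):
    """Import a dependency only when it is actually needed.

    Returns the module, or None if it is not installed, so the caller
    can report a clear error instead of failing at script start-up.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def main(argv):
    # Producing the help text must not require the optional dependency.
    if "--help" in argv:
        return "usage: example_module [--help]"

    # Only now, when real work is requested, try the non-standard import.
    sklearn = lazy_import("sklearn")
    if sklearn is None:
        return "error: scikit-learn is not installed"
    return "scikit-learn version " + sklearn.__version__


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```
</pre>
</div>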
</div>
<br>
<br>
</div>
</blockquote>
<br>
<br>
<br>
</div>
</blockquote>
<br>
</body>
</html>