[GRASS-dev] RandomForest classifier for imagery groups add-on
Paulo van Breugel
p.vanbreugel at gmail.com
Sat Mar 26 10:42:47 PDT 2016
Hi Steve
Great news! I gave it a quick try (on Ubuntu 14.04, GRASS 7 master).
Size input raster layers: rows: 1578, columns: 1436
*1st try - input full map, classes 1/0*
I had to stop it because it took too long. Stopping the module did not stop
the Python processes, however; I had to kill those myself.
*2nd try - input random sample of 100 points, 1 (12) and 0 (88), with b flag*
r.randomforest -b igroup=predictors@SampleSize roi=test2
output=test2_output ntrees=500 mfeatures=-1 minsplit=2 randst=1 lines=100
Group <predictors> references the following raster maps:
Traceback (most recent call last):
  File "/home/paulo/.grass7/addons/scripts/r.randomforest", line 335, in <module>
    main()
  File "/home/paulo/.grass7/addons/scripts/r.randomforest", line 243, in main
    class_weight = "balanced", max_features = mfeatures, min_samples_split = minsplit, random_state = randst)
TypeError: __init__() got an unexpected keyword argument 'class_weight'
Removing raster <tmp_jNyNcqZa>
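The TypeError suggests that the scikit-learn installed here is older than the release that added class_weight to RandomForestClassifier. One possible workaround, sketched below, is to pass only the keyword arguments that the installed constructor accepts. This is a rough illustration and not the add-on's code: supported_kwargs and OldForest are hypothetical names, and OldForest is only a stand-in for an older classifier so that the example runs without scikit-learn.

```python
import inspect


def supported_kwargs(constructor, **kwargs):
    """Return only the keyword arguments that `constructor` accepts."""
    params = inspect.signature(constructor).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


class OldForest(object):
    """Hypothetical stand-in for a classifier without class_weight."""

    def __init__(self, n_estimators=10, min_samples_split=2, random_state=None):
        self.n_estimators = n_estimators
        self.min_samples_split = min_samples_split
        self.random_state = random_state


wanted = dict(n_estimators=500, class_weight="balanced", random_state=1)
accepted = supported_kwargs(OldForest, **wanted)
dropped = sorted(set(wanted) - set(accepted))

clf = OldForest(**accepted)  # no TypeError: class_weight was filtered out
print("dropped:", dropped)
```

Silently dropping class_weight changes the model, so in practice the module should probably warn that the b flag has no effect, or ask for a newer scikit-learn.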
*3rd try - input random sample of 100 points, 1 (#12) and 0 (#88), without
b flag*
r.randomforest igroup=predictors@SampleSize roi=test2
output=test2_output ntrees=500 mfeatures=-1 minsplit=2 randst=1 lines=100
Group <predictors> references the following raster maps:
Our OOB prediction of accuracy is: 89.0%
   Raster                    Importance
0  bio1_wc30s@SampleSize     0.183670
1  bio2_wc30s@SampleSize     0.139914
2  bio3_wc30s@SampleSize     0.105035
3  bio4_wc30s@SampleSize     0.106413
4  bio13_wc30s@SampleSize    0.087399
5  bio14_wc30s@SampleSize    0.146495
6  dm_wc30s@SampleSize       0.104575
7  llds_wc30s@SampleSize     0.126499
Removing raster <tmp_RhTllKlA>
*Questions*
* I am using it for species distribution modeling (presence/absence
input map), but I would prefer to use the regression mode. Is there a way
to force it to use the regression mode?
* Are you planning to implement other classification methods? If this
works, it seems it shouldn't be too hard to replace the random forest
method with any of the other methods in scikit-learn. I have for some time
been thinking about using it, but my programming skills are not up to
standard. Perhaps it would be easier to use your add-on as a template?
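To make both questions concrete, here is a minimal sketch of how a module could choose the estimator by name. It is an assumption about how this might be done, not how r.randomforest works: the mode names and the functions resolve and make_estimator are hypothetical, while the scikit-learn classes listed do exist. The import happens only when an estimator is created, so the lookup itself runs without scikit-learn.

```python
import importlib

# Hypothetical registry: mode name -> (module, class name).
# The mode names are made up for this sketch.
ESTIMATORS = {
    "classification": ("sklearn.ensemble", "RandomForestClassifier"),
    "regression": ("sklearn.ensemble", "RandomForestRegressor"),
    "logistic": ("sklearn.linear_model", "LogisticRegression"),
}


def resolve(mode):
    """Return the (module, class name) pair registered for `mode`."""
    if mode not in ESTIMATORS:
        raise ValueError(
            "unknown mode %r, expected one of %s" % (mode, sorted(ESTIMATORS))
        )
    return ESTIMATORS[mode]


def make_estimator(mode, **params):
    """Import the estimator class lazily and instantiate it."""
    module_name, class_name = resolve(mode)
    module = importlib.import_module(module_name)  # needs scikit-learn
    return getattr(module, class_name)(**params)


print(resolve("regression"))
```

With scikit-learn installed, make_estimator("regression", n_estimators=500, min_samples_split=2, random_state=1) would give a regressor that can be fitted on the 1/0 training values; its predictions are then continuous scores between 0 and 1 rather than class labels.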
Cheers,
Paulo
On Sat, Mar 26, 2016 at 5:40 PM, Steven Pawley
<dr.stevenpawley at gmail.com> wrote:
Hello developers,
I would like to draw your attention to a new GRASS add-on,
r.randomforest, which uses the scikit-learn and pandas Python
packages to classify GRASS rasters. Similar to existing GRASS
classification methods, it uses an imagery group and a raster of
labelled pixels as the inputs for the classification. It also reads
the rasters row-by-row, and then passes these rows to the classifier
in bundles, based on a user-specified row increment. This keeps memory
requirements low while still allowing efficient classification,
because the scikit-learn implementation is multithreaded by default
and strictly row-by-row processing results in too much stop-start
behaviour. The feature
importance scores and out-of-bag error are displayed in the command
window.
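The row bundling described above can be sketched roughly as follows. This is only an illustration of the idea, not the add-on's code: row_chunks and its arguments are hypothetical names, and the numbers are the 1578-row raster and lines=100 from the test earlier in this thread.

```python
def row_chunks(n_rows, increment):
    """Yield (start, stop) row ranges covering n_rows in blocks of `increment`."""
    if increment < 1:
        raise ValueError("increment must be at least 1")
    for start in range(0, n_rows, increment):
        # The last block is shorter when n_rows is not a multiple of increment.
        yield start, min(start + increment, n_rows)


chunks = list(row_chunks(1578, 100))
print(len(chunks), chunks[0], chunks[-1])  # 16 (0, 100) (1500, 1578)
```

Each block would be read, predicted in a single call, and written out before the next one is loaded, so only increment x columns x bands values need to be in memory at any time.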
I would appreciate testing. You need to have scikit-learn and
pandas installed in your Python environment, which is easy on Linux
and OS X; instructions for Windows are provided in the tool.
I have another add-on that I will upload soon, r.roc, which
generates ROC and AUROC for prediction models.
Steve