[GRASS-dev] Object-based image classification in GRASS

Tue Jan 7 09:33:48 PST 2014

Dear all,

Some news about the machine learning classification of image segments.

The process described below has been used to classify RGB images 
for two different regions with more than 1 billion pixels and more 
than 2.7 million segments.
Working with figures this large required optimizing/rewriting part 
of the pygrass library [r58622-r58628 and r58634/r58635] and
adapting/adding new GRASS modules. The sequence of modules 
used/developed is briefly reported below:

    1. i.segment.hierarchical [r58137] => extract the segments 
        from the raster group, splitting the domain into tiles 
        (in grass-addons);

    2. r.to.vect => convert the segments to a vector map;

    3. v.category => transfer the categories of the geometry
        features to the new layers; the module was not working 
        for areas but is now fixed [r58202];

    4. v.stats [r58637] => extract statistics from a vector map
       (statistics about shape and about raster maps). 
       v.stats (in grass-addons) internally uses:
        - v.area.stats [r58636] => extract some statistics about
          the shape (in grass-addons);
        - v.to.rast => re-convert the vector to a raster map using the
          vector categories, to be sure that there is a correspondence
          between vector and raster categories (zones);
        - r.univar2 [r58439] => extract some general statistics from
          the raster using the zones (it consumes much less memory
          than r.univar and computes more general statistics, such as
          skewness, kurtosis, and mode) (in grass-addons);

    5. v.class.ml [r58638] => classify a vector map; at the moment
        only supervised classification is tested/supported. 
        To select the segments that must be used for training the 
        different machine-learning techniques, you can define a 
        training map using g.gui.iclass.
        The v.class.ml module can:
        - extract the training set;
        - balance and scale the training set;
        - optimize the training set;
        - test several machine learning techniques;
        - explore the SVC domain;
        - export the accuracy of different ML to a csv file;
        - find and export the optimum training set;
        - classify the vector map using several ML techniques and
          export the results of the classification to a new layer
          of the vector map;
        - export the classification results to several raster maps,
          because the vector map derived from a segment raster map
          is too big to be exported to a shapefile (the size limit
          for a shapefile is 4 GB [0]).
        The module accepts as input a Python file with a list of
        custom classifiers defined by the user, and supports both
        the scikit-learn [1] and mlpy [2] libraries.
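
To make the statistics mentioned for r.univar2 concrete, here is a minimal pure-Python sketch of skewness, kurtosis and mode computed per zone (segment category). It is not the r.univar2 code and it ignores the memory optimizations described above; the function name zonal_stats and the use of population moments with excess kurtosis are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def zonal_stats(zones, values):
    """Return {zone: stats} for paired sequences of zone ids and cell values.

    Hypothetical helper, not part of GRASS. Uses population central
    moments; kurtosis is reported as excess kurtosis (normal -> 0).
    """
    # Group the raster cell values by zone (segment category).
    by_zone = defaultdict(list)
    for zone, value in zip(zones, values):
        by_zone[zone].append(value)

    stats = {}
    for zone, vals in by_zone.items():
        n = len(vals)
        mean = sum(vals) / n
        # Second, third and fourth central moments.
        m2 = sum((v - mean) ** 2 for v in vals) / n
        m3 = sum((v - mean) ** 3 for v in vals) / n
        m4 = sum((v - mean) ** 4 for v in vals) / n
        std = sqrt(m2)
        stats[zone] = {
            "n": n,
            "mean": mean,
            "stddev": std,
            # Constant zones have no spread: report 0 instead of dividing by 0.
            "skewness": m3 / std ** 3 if std > 0 else 0.0,
            "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
            # Most frequent value; ties resolve to the first value seen.
            "mode": Counter(vals).most_common(1)[0][0],
        }
    return stats
```

Calling zonal_stats([1, 1, 1, 1, 2, 2, 2], [1, 2, 2, 3, 5, 5, 5]) returns one dictionary per zone, keyed by the same categories that the vector areas carry after the v.to.rast step.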

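The "balance and scale the training set" step of v.class.ml can be pictured with the sketch below. How v.class.ml actually balances and scales is not described in this message, so the strategy shown here (random undersampling to the size of the rarest class, then standardizing each column to zero mean and unit variance) and the function names balance and scale are assumptions, not the module's implementation.

```python
import random
from collections import defaultdict
from math import sqrt


def balance(samples, labels, seed=0):
    """Undersample every class to the size of the rarest class.

    Hypothetical helper: v.class.ml may use a different strategy.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)
    size = min(len(rows) for rows in by_class.values())
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        # Draw `size` rows without replacement from each class.
        for row in rng.sample(by_class[label], size):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def scale(samples):
    """Standardize each column to zero mean and unit variance."""
    n = len(samples)
    columns = list(zip(*samples))
    means = [sum(col) / n for col in columns]
    # A constant column has stddev 0; fall back to 1.0 to avoid dividing by 0.
    stds = [sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in samples]
```

Here each row would hold the statistics of one training segment (for example the columns written by v.stats) and each label its training class. Scaling matters mostly for distance-based classifiers such as SVC, where columns with large numeric ranges would otherwise dominate.
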
Known Issues:
* not all the classifiers are working (but I hope to be able to fix this 
in the coming weeks);
* so far, only supervised classification is supported.

Best regards

Pietro

[0] http://www.gdal.org/ogr/drv_shapefile.html
[1] http://scikit-learn.org/
[2] http://mlpy.sourceforge.net/