[GRASS-SVN] r66859 - in grass-addons/grass7/raster: . r.mregression.series

Wed Nov 18 07:10:39 PST 2015

Author: DmitryKolesov
Date: 2015-11-18 07:10:39 -0800 (Wed, 18 Nov 2015)
New Revision: 66859

Added:
   grass-addons/grass7/raster/r.mregression.series/
   grass-addons/grass7/raster/r.mregression.series/Makefile
   grass-addons/grass7/raster/r.mregression.series/r.mregression.series.html
   grass-addons/grass7/raster/r.mregression.series/r.mregression.series.py
Log:
Add r.mregression.series module

Added: grass-addons/grass7/raster/r.mregression.series/Makefile
===================================================================

--- grass-addons/grass7/raster/r.mregression.series/Makefile	                        (rev 0)
+++ grass-addons/grass7/raster/r.mregression.series/Makefile	2015-11-18 15:10:39 UTC (rev 66859)
@@ -0,0 +1,7 @@
+MODULE_TOPDIR =../..
+
+PGM = r.mregression.series
+
+include $(MODULE_TOPDIR)/include/Make/Script.make
+
+default: script

Added: grass-addons/grass7/raster/r.mregression.series/r.mregression.series.html
===================================================================
--- grass-addons/grass7/raster/r.mregression.series/r.mregression.series.html	                        (rev 0)
+++ grass-addons/grass7/raster/r.mregression.series/r.mregression.series.html	2015-11-18 15:10:39 UTC (rev 66859)
@@ -0,0 +1,101 @@
+<h2>DESCRIPTION</h2>
+
+<em>r.mregression.series</em> is a module to calculate multiple 
+linear regression parameters between several time series, e.g. NDVI and
+elevation, precipitation.
+<p>
+The module makes each output cell value a function of the values
+assigned to the corresponding cells in the input raster map series.
+
+<h2>DESCRIPTION</h2>
+
+The module assumes a simple linear regression of the form
+<div class="code"><pre>
+    Y = b1 * X1 + b2 * X2 + ... + bn * Xn 
+</pre></div>
+<p>
+
+<h2>NOTES</h2>
+
+The number of maps in <em>X</em> and <em>Y</em> must be 
+identical.
+<p>
+The list of inputs for each cell (including NULLs) is passed to the 
+regression function. The functin compute the parameters over the 
+non-NULL values, producing a NULL result only if there aren't enough 
+non-NULL values for computing.
+<p>
+Linear regression  requires an equal number of <em>X</em> and 
+<em>Y</em> maps.  If the different time series have irregular 
+time intervals, NULL raster maps can be inserted into time series to
+make time intervals equal. 
+
+<h2>EXAMPLES</h2>
+The most important paramether is <em>samples</em>, it provides the list
+of <em>X</em> and <em>Y</em> maps. The parameter is the name of csv file
+of the next structure: the first line is a header, other lines provide names
+of the <em>X</em> and <em>Y</em> maps. The header contains the names of the
+input and output variables.
+<p>
+For example the csv file for regression between NDVI and (elevation, 
+precipitation)
+<div class="code"><pre>
+    NDVI = b1*Elevation + b2*Precipitation 
+</pre></div>
+could be the next file:
+<div class="code"><pre>
+y,elevation,precipipation
+ndvi_1,elev_1,precip_1
+ndvi_2,elev_2,precip_2
+...
+ndvi_n,elev_n,precip_n
+</pre></div>
+"ndvi_t" are names of the NDVI rasters, "precip_t" are names 
+of precipitation rasters. The names of the first and the 
+second variables are "elevation" and "precipitation" accordingly.
+<p>
+The second paramether is  <em>result_prefix</em>. It is used for 
+coefficient names construction. For example if result_prefix="coef.",
+the names of the regression coefficient names are "coef.elevation"
+and "coef.precipitation".
+
+<div class="code"><pre>
+r.mregression.series samples=settings result_prefix="coef."
+</pre></div>
+
+<p>
+If the regression model includes the intercept 
+<div class="code"><pre>
+    NDVI = b0 + b1*Elevation + b2*Precipitation 
+</pre></div>
+then the constant map should be used:
+<div class="code"><pre>
+r.mapcalc "ones = 1.0"
+</pre></div>
+and the csv file is:
+<div class="code"><pre>
+y,offset,elevation,precipipation
+ndvi_1,ones,elev_1,precip_1
+ndvi_2,ones,elev_2,precip_2
+...
+ndvi_n,ones,elev_n,precip_n
+</pre></div>
+Then the command
+<div class="code"><pre>
+r.mregression.series samples=settings result_prefix="coef."
+</pre></div>
+produces three raster maps: "coef.offset", "coef.elevation", "coef.precipitation".    
+
+<h2>SEE ALSO</h2>
+
+<em><a href="addons/r.regression.series.html">r.regression.series</a></em>,
+<em><a href="r.series.html">r.series</a></em>,
+<em><a href="r.regression.line.html">r.regression.line</a></em>,
+<em><a href="g.list.html">g.list</a></em>,
+<em><a href="g.region.html">g.region</a></em>
+
+<h2>AUTHOR</h2>
+
+Dmitry Kolesov
+
+<p><i>Last changed: $Date: 2014-10-29 02:11:08 +0300 (Ср., 29 окт. 2014) $</i>

Added: grass-addons/grass7/raster/r.mregression.series/r.mregression.series.py
===================================================================
--- grass-addons/grass7/raster/r.mregression.series/r.mregression.series.py	                        (rev 0)
+++ grass-addons/grass7/raster/r.mregression.series/r.mregression.series.py	2015-11-18 15:10:39 UTC (rev 66859)
@@ -0,0 +1,265 @@
+#!/usr/bin/env python
+# -*- coding: utf-8  -*-
+#
+############################################################################
+#
+# MODULE:       r.mregression.series
+#
+# AUTHOR(S):    Dmitry Kolesov <kolesov.dm at gmail.com>
+#
+# PURPOSE:      multiple regression between two time series
+#
+# COPYRIGHT:    (C) 2015 by the GRASS Development Team
+#
+#               This program is free software under the GNU General Public
+#               License (>=v2). Read the file COPYING that comes with GRASS
+#               for details.
+#
+#############################################################################
+
+#%Module
+#% description: Program calculates multiple regression between two series: Y = a1*X1 + ... + an*Xn
+#% overwrite: yes
+#%End
+#%option
+#% key: samples
+#% type: string
+#% gisprompt: file with settings in csv format
+#% description: File contains list of input and output rasters
+#% required : yes
+#% multiple: no
+#%end
+#%option
+#% key: result_prefix
+#% type: string
+#% gisprompt: prefix for names of result rasters
+#% description: Prefix for names of result raster (rasters of regression coefficients)
+#% required : yes
+#% multiple: no
+#%end
+
+
+import os
+import sys
+
+import csv
+import numpy as np
+from numpy.linalg.linalg import LinAlgError
+
+
+if "GISBASE" not in os.environ:
+    sys.stderr.write("You must be in GRASS GIS to run this program.\n")
+    sys.exit(1)
+
+import grass.script as grass
+from grass.pygrass import raster
+from grass.pygrass.gis.region import Region
+
+try:
+    import statsmodels.api as sm
+except ImportError:
+    grass.error("Can't import statsmodels. Install scipy package.")
+    sys.exit(1)
+
+
+CNULL = -2147483648  # null value for CELL maps
+FNULL = np.nan       # null value for FCELL and DCELL maps
+
+
+def get_val_or_null(map, row, col):
+    """
+    Return map value of the cell or FNULL (if the cell is null)
+    """
+    value = map.get(row, col)
+    if map.mtype == "CELL" and value == CNULL:
+        value = FNULL
+    return value
+
+
+def ols(y, x):
+    """Ordinary least squares.
+
+    :param x:   MxN matrix of data points
+    :param x:   numpy.array
+    :param y:   vector of M output values
+    :param y:   numpy.array
+    :return:    vector b of coefficients (x * b = y)
+    """
+    # if a X sample has nan value,
+    # then the fitting procedure crashes.
+    # Delete no-data samples from the data
+    factor_count = x.shape[1]
+    nan_idx = np.logical_or(np.isnan(y), np.isnan(x).any(axis=1))
+    if nan_idx.any():
+        y = y[~nan_idx]
+        x = x[~nan_idx]
+    if x.shape[0] < x.shape[1]:
+        # The system can't be solved
+        return [FNULL for i in range(factor_count)]
+
+    model = sm.OLS(y, x)
+    try:
+        results = model.fit()
+        coefs = results.params
+    except LinAlgError:
+        coefs = [FNULL for i in range(factor_count)]
+    return coefs
+
+
+def get_sample_names(filename, delimiter=','):
+    """
+    Analyse settings file, returns
+    """
+    with open(filename) as settings:
+        reader = csv.reader(settings, delimiter=delimiter)
+        headers = reader.next()
+        inputs = []
+        outputs = []
+        for row in reader:
+            outputs.append(row[0])
+            inputs.append(row[1:])
+
+    return headers, outputs, inputs
+
+
+class DataModel(object):
+    def __init__(self, headers, Y, X, prefix, restype='FCELL'):
+        """Linear Least Square model  Y = X * b + e
+        X are [[],.., []] of raster names
+        Y are [] of raster names
+
+        b are rasters of the regression coeficients, the rasters
+            have type restype
+        e are the errors of the model (the class doesn't compute e)
+
+        header is [] of the variable names.
+        names of the 'b' rasters are constructed as
+            prefix+variable_name (see header)
+        """
+        if len(set(headers)) != len(headers):
+            grass.error('The names of the variables are not unique!')
+
+        self.mtype = restype
+
+        self.x_headers = headers[1:]    # Names of the coefficient
+        self.b_names = [prefix + name for name in self.x_headers]
+        self.y_names = Y                # Names of Y rasters
+        self.x_names = X                # Names of X rasters
+
+        self.sample_count = len(self.y_names)
+        self.factor_count = len(self.x_names[0])
+
+        self._y_rasters = []
+        self._x_rasters = []
+        self._b_rasters = []
+        self._init_rasters()
+
+    def x(self, s, f):
+        """Return input map (X) from sample number s and factor number f.
+        """
+        return self._x_rasters[s][f]
+
+    def y(self, s):
+        """Return output map (Y) from sample number s.
+        """
+        return self._y_rasters[s]
+
+    def b(self, s):
+        return self._b_rasters[s]
+
+    def _init_rasters(self):
+        for name in self.y_names:
+            map = raster.RasterSegment(name)
+            if not map.exist():
+                raise ValueError("Raster map %s doesn't exist" % (name, ))
+            self._y_rasters.append(map)
+
+        for names in self.x_names:
+            maps = []
+            for name in names:
+                map = raster.RasterSegment(name)
+                if not map.exist():
+                    raise ValueError("Raster map %s doesn't exist" % (name, ))
+                maps.append(map)
+            # Check count of X samples
+            assert len(maps) == self.factor_count
+
+            self._x_rasters.append(maps)
+
+        # Rasters of the regression coefitients
+        for i in range(self.factor_count):
+            name = self.b_names[i]
+            map = raster.RasterSegment(name)
+            self._b_rasters.append(map)
+
+    def open_rasters(self,  overwrite):
+        for i in range(self.sample_count):
+            map = self.y(i)
+            map.open()
+
+        for j in range(self.factor_count):
+            map = self.b(j)
+            map.open('w', mtype=self.mtype, overwrite=overwrite)
+            for i in range(self.sample_count):
+                map = self.x(i, j)
+                map.open()
+
+    def close_rasters(self):
+        for i in range(self.sample_count):
+            map = self.y(i)
+            map.close()
+
+        for j in range(self.factor_count):
+            map = self.b(j)
+            map.close()
+            for i in range(self.sample_count):
+                map = self.x(i, j)
+                map.close()
+
+    def get_sample(self, row, col):
+        """Return X and Y matrices for one pixel sample
+        """
+        X = np.empty((self.sample_count, self.factor_count))
+        Y = np.empty(self.sample_count)
+        for snum in range(self.sample_count):
+            y = self.y(snum)
+            Y[snum] = get_val_or_null(y, row, col)
+            for fnum in range(self.factor_count):
+                x = self.x(snum, fnum)
+                X[snum, fnum] = get_val_or_null(x, row, col)
+
+        return Y, X
+
+    def ols(self, overwrite=None):
+        try:
+            reg = Region()
+            self.open_rasters(overwrite=overwrite)
+            rows, cols = reg.rows, reg.cols
+            for r in range(rows):
+                for c in range(cols):
+                    Y, X = self.get_sample(r, c)
+                    coefs = ols(Y, X)
+                    for i in range(self.factor_count):
+                        b = self.b(i)
+                        b.put(r, c, coefs[i])
+        finally:
+            self.close_rasters()
+        print 'ols'
+
+
+def main(options, flags):
+    samples = options['samples']
+    res_pref = options['result_prefix']
+    if not os.path.isfile(samples):
+        sys.stderr.write("File '%s' doesn't exist.\n" % (samples, ))
+        sys.exit(1)
+
+    headers, outputs, inputs = get_sample_names(samples)
+
+    model = DataModel(headers, outputs, inputs, res_pref)
+    model.ols(overwrite=grass.overwrite())
+    sys.exit(0)
+
+if __name__ == "__main__":
+    options, flags = grass.parser()
+    main(options, flags)