[GRASS-SVN] r66891 - grass-addons/grass7/raster/r.vif

svn_grass at osgeo.org svn_grass at osgeo.org
Sun Nov 22 08:42:40 PST 2015


Author: pvanbosgeo
Date: 2015-11-22 08:42:40 -0800 (Sun, 22 Nov 2015)
New Revision: 66891

Modified:
   grass-addons/grass7/raster/r.vif/r.vif.html
   grass-addons/grass7/raster/r.vif/r.vif.py
Log:
Added option to retain variables during stepwise selection + some small improvements

Modified: grass-addons/grass7/raster/r.vif/r.vif.html
===================================================================
--- grass-addons/grass7/raster/r.vif/r.vif.html	2015-11-22 16:42:34 UTC (rev 66890)
+++ grass-addons/grass7/raster/r.vif/r.vif.html	2015-11-22 16:42:40 UTC (rev 66891)
@@ -1,6 +1,6 @@
 <h2>DESCRIPTION</h2>
 
-<em>r.vif</em> This module will compute a stepwise variance inflation factor 
+The <em>r.vif</em> module computes a stepwise variance inflation factor 
 (VIF) and the square root of the VIF. The VIF quantifies how much 
 the variance (the square of the estimate's standard deviation) of an 
 estimated regression coefficient is increased because of 
@@ -10,25 +10,85 @@
 model. 
 
 <p>By default the VIF is calculated for each variable. If the user 
-sets a VIF threshold value (maxvif) the VIF will calculated again 
+sets a VIF threshold value (maxvif) the VIF will be calculated again 
 after removing the variable with the highest VIF. This will be 
 repeated till the VIF is smaller than maxvif. This can thus be used 
 to select a sub-set of variables for e.g., multiple regression 
-analysis.
+analysis. 
 
-<p>The user has the option to compute the VIF over a random subset 
-of the raster cells (to speed up computation). Like in the r.random 
-module (which is used to select the random set of raster cells), the 
-user may specify the quantity of random cells either as a positive 
-integer (e.g., 10), or as a percentage of the raster map layer's 
-cells (e.g., 10%, or 3.05%). Percentages less than one percent may 
-be stated as decimals. 
+<p>The user can optionally select one or more variables to be 
+retained in the stepwise selection. For example, the user selects 
+the variable <i>bio_5</i> to be retained. If in any step this 
+variable has the highest VIF, the variable with the next highet VIF 
+will be removed instead (see the examples).
 
-<h2>Notes</h2>
+<h2>EXAMPLES</h2>
 
-This add-on doesn't work in GRASS 6.* as it depends on 
-r.regression.multi, which is available in GRASS 7+ main only. 
+Run VIF for five variables, setting the maximum VIF at 10
 
+<div class="code"><pre>r.vif maps=bio_1,bio_2,bio_3,bio_4,bio_5 
+maxvif=10 file=results.txt
+
+VIF bio_1 = 76.6694778809
+VIF bio_2 = 13.0044084945
+VIF bio_3 = 5.30160850802
+VIF bio_4 = 4.24765529428
+VIF bio_5 = 96.27418889
+VIF bio_1 = 1.09855463167
+VIF bio_2 = 1.99532296297
+VIF bio_3 = 4.36471563878
+VIF bio_4 = 3.76425240065
+
+selected variables are:
+bio_1, bio_2, bio_3, bio_4
+with as maximum VIF: 4.36471563878
+
+Statistics are written to results.txt
+</pre></div>
+
+Run VIF for five variables, setting the maximum VIF at 10, and opt 
+to retain the variable bio_5.
+
+<div class="code"><pre>r.vif maps=bio_1,bio_2,bio_3,bio_4,bio_5 
+maxvif=10 retain=bio_5 file=results.txt
+
+VIF bio_1 = 76.6694778809
+VIF bio_2 = 13.0044084945
+VIF bio_3 = 5.30160850802
+VIF bio_4 = 4.24765529428
+VIF bio_5 = 96.27418889
+VIF bio_2 = 2.11511290473
+VIF bio_3 = 4.34705117783
+VIF bio_4 = 3.83640053556
+VIF bio_5 = 1.37949300873
+
+selected variables are:
+bio_2, bio_3, bio_4, bio_5
+with as maximum VIF: 4.34705117783
+
+Statistics are written to results.txt
+</pre></div>
+
+In the second example, the bio_5 variable is retained, and the 
+variable bio_1, which has the next highest VIF, is removed. Note 
+that the final VIF of the second example is in fact (slightly) 
+lower, showing that default procedure may not always result in the 
+lowest possible VIF.
+
+<h2>SEE ALSO</h2>
+
+This add-on depends on <em><a href="http://grass.osgeo.org/grass70/manuals/r.regression.multi.html">
+r.regression.multi</a></em>
+
+<h2>Citation</h2>
+
+Suggested citation:
+
+<p>van Breugel, P., Friis, I., 
+Sebsebe Demissew, Lillesø, J-P.B., Kindt, R., [in publ.] Current and 
+future fire regimes and their influence on natural vegetation in 
+Ethiopia. Ecosystems (accepted for publication)
+
 <h2>AUTHOR</h2>
 
 Paulo van Breugel, paulo at ecodiv.org

Modified: grass-addons/grass7/raster/r.vif/r.vif.py
===================================================================
--- grass-addons/grass7/raster/r.vif/r.vif.py	2015-11-22 16:42:34 UTC (rev 66890)
+++ grass-addons/grass7/raster/r.vif/r.vif.py	2015-11-22 16:42:40 UTC (rev 66891)
@@ -16,7 +16,7 @@
 #               variables
 # Dependency:    r.regression.multi
 #
-# COPYRIGHT: (C) 2014 Paulo van Breugel
+# COPYRIGHT: (C) 2015 Paulo van Breugel
 #            http://ecodiv.org
 #            http://pvanb.wordpress.com/
 #
@@ -27,9 +27,10 @@
 ########################################################################
 #
 #%Module
-#% description: Calculate the variance inflation factor
+#% description: To calculate the stepwise variance inflation factor.
 #% keyword: raster
 #% keyword: variance inflation factor
+#% keyword: VIF
 #%End
 
 #%option
@@ -63,22 +64,12 @@
 #% key: maxvif
 #% type: string
 #% description: Maximum vif
-#% key_desc: string
+#% key_desc: number
 #%end
 
-#%option
-#% key: nsp
-#% type: string
-#% gisprompt: new
-#% description: random sample size
-#% key_desc: number or percentage
-#% required: no
-#%end
+# Testing
+#options = {"maps":"bio_1_Africa,bio_2_Africa,bio_3_Africa,bio_4_Africa,bio_5_Africa", "output":"bb.txt", retain":"bio_1_Africa", "maxvif":"20", "file":"aa.txt"}
 
-# Test purposes
-#options = {"maps":"bio1,bio5,bio6", "output":"bb", "nsp":"100", "retain":"bio1,bio2", "maxvif":"100", "file":"aa.txt"}
-#flags = {"m":True, "k":True, "n":False, "i":True, "k":True}
-
 #=======================================================================
 ## General
 #=======================================================================
@@ -86,12 +77,9 @@
 # import libraries
 import os
 import sys
-import uuid
 import atexit
 import math
 import tempfile
-import string
-import collections
 import numpy as np
 import grass.script as grass
 
@@ -100,6 +88,13 @@
     grass.message("You must be in GRASS GIS to run this program.")
     sys.exit(1)
 
+# Check if layers exist
+def CheckLayer(envlay):
+    for chl in xrange(len(envlay)):
+        ffile = grass.find_file(envlay[chl], element = 'cell')
+        if ffile['fullname'] == '':
+            grass.fatal("The layer " + envlay[chl] + " does not exist.")
+
 # create set to store names of temporary maps to be deleted upon exit
 clean_rast = set()
 
@@ -114,28 +109,22 @@
     # Variables
     IPF = options['maps']
     IPF = IPF.split(',')
+    CheckLayer(IPF)
     IPFn = [i.split('@')[0] for i in IPF]
+    MXVIF = options['maxvif']
+    if MXVIF != '':
+        MXVIF = float(MXVIF)
     IPR = options['retain']
     if IPR != '':
         IPR = IPR.split(',')
-        IPRn = [i.split('@')[0] for i in IPR]
-        iprn = collections.Counter(IPRn)
-        ipfn = collections.Counter(IPFn)
-        IPFn = list((ipfn-iprn).elements())
-        ipr = collections.Counter(IPR)
-        ipf = collections.Counter(IPF)
-        IPF = list((ipf-ipr).elements())
-        if len(IPFn) == 0:
-            grass.fatal("No variables to remove")
+        CheckLayer(IPR)
+    IPRn = [i.split('@')[0] for i in IPR]
     OPF = options['file']
     if OPF == '':
         OPF = tempfile.mkstemp()[1]
-    NSP = options['nsp']
-    MXVIF = float(options['maxvif'])
 
     # Test if text file exists. If so, append _v1 to file name
     k = 0
-    OPFold = OPF[:]
     while os.path.isfile(OPF):
         k = k + 1
         opft = OPF.split('.')
@@ -143,85 +132,87 @@
             OPF = opft[0] + "_v" + str(k)
         else:
             OPF = opft[0] + "_v" + str(k) + "." + opft[1]
-    if k > 0:
-        grass.info("there is already a file " + OPFold + ".")
-        grass.info("Using " + OPF + "_v" + str(k) + " instead")
 
-#=======================================================================
-## Calculate VIF and write to standard output (& optionally to file)
-#=======================================================================
+    #=======================================================================
+    ## Calculate VIF and write to standard output (& optionally to file)
+    #=======================================================================
 
-    # Create mask if nsp is set replace existing MASK if exist)
-    citiam = grass.find_file('MASK', element='cell',
-                             mapset=grass.gisenv()['MAPSET'])
-    if NSP != '':
-        if citiam['fullname'] != '':
-            tmpf0 = "rvif_mask_" + str(uuid.uuid4())
-            tmpf0 = string.replace(tmpf0, '-', '_')
-            grass.run_command("g.copy", quiet=True, raster=("MASK", tmpf0))
-            clean_rast.add(tmpf0)
-        tmpf1 = "rvif_" + str(uuid.uuid4())
-        tmpf1 = string.replace(tmpf1, '-', '_')
-        grass.run_command("r.random", quiet=True, input=IPF[0], n=NSP, raster=tmpf1)
-        grass.run_command("r.mask", quiet=True, overwrite=True, raster=tmpf1)
-        clean_rast.add(tmpf1)
-
-    # Open text file for writing and write heading
-    text_file = open(OPF, "w")
-
-HIER GEBLEVEN... NU NOG UITVOGELEN HOE DE VARIABLES ALTIJD BEWAARD BLIJVEN
-
     # Calculate VIF and write results to text file
-    rvifmx = MXVIF + 1
-    IPF2 = IPF[:]
-    IPFn2 = IPFn[:]
-    while MXVIF < rvifmx:
-        rvif = np.zeros(len(IPF2))
+    if MXVIF =='':
+        text_file = open(OPF, "w")
         text_file.write("variable\tvif\tsqrtvif\n")
-        for k in xrange(len(IPF2)):
-            MAPy = IPF2[k]
-            nMAPy = IPFn2[k]
-            MAPx = IPF2[:]
+        for k in xrange(len(IPF)):
+            MAPy = IPF[k]
+            nMAPy = IPFn[k]
+            MAPx = IPF[:]
             del MAPx[k]
             vifstat = grass.read_command("r.regression.multi",
                                flags="g", quiet=True,
                                mapx=MAPx, mapy=MAPy)
             vifstat = vifstat.split('\n')
             vifstat = [i.split('=') for i in vifstat]
-            vif = 1 / (1 - float(vifstat[1][1]))
+            if float(vifstat[1][1]) > 0.9999999999:
+                rsqr = 0.9999999999
+            else:
+                rsqr = float(vifstat[1][1])
+            vif = 1 / (1 - rsqr)
             sqrtvif = math.sqrt(vif)
             text_file.write(nMAPy + "\t" + str(round(vif, 3)) + "\t" + str(round(sqrtvif, 3)) + "\n")
-            rvif[k] = vif
+            print("VIF " + MAPy + " = " + str(vif))
+        text_file.close()
+    else:
+        text_file = open(OPF, "w")
+        rvifmx = MXVIF + 1
+        while MXVIF < rvifmx:
+            rvif = np.zeros(len(IPF))
+            text_file.write("variable\tvif\tsqrtvif\n")
+            for k in xrange(len(IPF)):
+                MAPy = IPF[k]
+                nMAPy = IPFn[k]
+                MAPx = IPF[:]
+                del MAPx[k]
+                vifstat = grass.read_command("r.regression.multi",
+                                   flags="g", quiet=True,
+                                   mapx=MAPx, mapy=MAPy)
+                vifstat = vifstat.split('\n')
+                vifstat = [i.split('=') for i in vifstat]
+                if float(vifstat[1][1]) > 0.9999999999:
+                    rsqr = 0.9999999999
+                else:
+                    rsqr = float(vifstat[1][1])
+                vif = 1 / (1 - rsqr)
+                sqrtvif = math.sqrt(vif)
+                text_file.write(nMAPy + "\t" + str(round(vif, 3)) + "\t" + str(round(sqrtvif, 3)) + "\n")
+                if IPFn[k] in IPRn:
+                    rvif[k] = -9999
+                else:
+                    rvif[k] = vif
+                print("VIF " + MAPy + " = " + str(vif))
 
-        rvifmx = max(rvif)
-        if rvifmx >= MXVIF:
-            rvifindex = np.argmax(rvif, axis=None)
-            varremove = IPF2[rvifindex]
-            text_file.write("\n")
-            text_file.write("Removing " + varremove)
-            text_file.write("\n---------------------------\n")
-            text_file.write("\n")
-            del IPF2[rvifindex]
-            del IPFn2[rvifindex]
-        else:
-            text_file.write("\n\n")
-            text_file.write("Final selection (above) has maximum VIF = " + str(round(rvifmx, 3)))
-    text_file.close()
+            rvifmx = max(rvif)
+            if rvifmx >= MXVIF:
+                rvifindex = np.argmax(rvif, axis=None)
+                varremove = IPF[rvifindex]
+                text_file.write("\n")
+                text_file.write("Removing " + varremove)
+                text_file.write("\n---------------------------\n")
+                text_file.write("\n")
+                del IPF[rvifindex]
+                del IPFn[rvifindex]
+            else:
+                text_file.write("\n\n")
+                text_file.write("Final selection (above) has maximum VIF = " + str(round(rvifmx, 3)))
+        text_file.close()
 
-    # Recover original mask
-    if NSP != '':
-        grass.run_command("r.mask", flags="r", quiet=True)
-        if citiam['fullname'] != '':
-            grass.mapcalc("MASK = $tmpf0",tmpf0=tmpf0, quiet=True)
-            grass.run_command("g.remove", quiet=True, flags="f", type="raster", name=tmpf0)
-        grass.run_command("g.remove", quiet=True, flags="f", type="raster", name=tmpf1)
+        # Write to std output
+        grass.info("")
+        grass.info("selected variables are: ")
+        grass.info(', '.join(IPFn))
+        grass.info("with as maximum VIF: " + str(rvifmx))
+    grass.info("")
+    grass.info("Statistics are written to " + OPF)
+    grass.info("")
 
-    # Write to std output
-    grass.info("selected variables are: ")
-    grass.info(', '.join(IPFn2))
-    grass.info("with as maximum VIF: " + str(rvifmx))
-    grass.info("Full statistics are in " + OPF)
-
 if __name__ == "__main__":
     options, flags = grass.parser()
     atexit.register(cleanup)



More information about the grass-commit mailing list