[GRASS-SVN] r68284 - grass-addons/grass7/raster/r.vif

svn_grass at osgeo.org svn_grass at osgeo.org
Tue Apr 19 07:20:49 PDT 2016


Author: pvanbosgeo
Date: 2016-04-19 07:20:49 -0700 (Tue, 19 Apr 2016)
New Revision: 68284

Added:
   grass-addons/grass7/raster/r.vif/r_vif_example1.png
Modified:
   grass-addons/grass7/raster/r.vif/r.vif.html
   grass-addons/grass7/raster/r.vif/r.vif.py
Log:
r.vif addon: improvements in how output is handled; adding option to only print finally selected variables; updated html file accordingly

Modified: grass-addons/grass7/raster/r.vif/r.vif.html
===================================================================
--- grass-addons/grass7/raster/r.vif/r.vif.html	2016-04-19 14:03:28 UTC (rev 68283)
+++ grass-addons/grass7/raster/r.vif/r.vif.html	2016-04-19 14:20:49 UTC (rev 68284)
@@ -26,6 +26,11 @@
 variable has the highest VIF, the variable with the next highest VIF 
 will be removed instead (see the examples).
 
+<p>The user can set the 's' flag to only print the finally selected
+variables to the standard output (console). Note that this only
+works when the stepwise selection procedure is invoked, i.e., when
+the maxvif is set.
+
 <h2>EXAMPLES</h2>
 
 The following examples are based on the nc_climate_spm_2000_2012
@@ -38,57 +43,132 @@
 
 <div class="code"><pre>g.region n=226000 s=168500 w=229500 e=298500</pre></div>
 
-<p>Example 1: Run VIF, setting the maximum VIF at 10. The function
-will print the VIF computed at each step to the console. The same
-will also be written to a text file. Below only the last part is
-shown with the VIF values of the finally selected variables are
-shown. The maximum VIF of 7.58 below is thus based on the R^2 of the
-regression of 2011_07_precip against the other 5 selected variables.
+<h3>Example 1</h3>
 
+Run VIF, setting the maximum VIF at 10. The function will print the
+VIF computed at each step to the console. The same will also be
+written to a text file. 
+
 <div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
-r.vif maps=$MAPS file=results1.txt maxvif=10
+r.vif maps=$MAPS file=results1.csv maxvif=10
+</pre></div>
 
+Below the results are shown (here only the first and last part of the
+output is shown to save space):
+
+<div class="code"><pre>
+VIF round 1
+--------------------------------------
+variable            vif  sqrtvif
+2011_01_precip    65.03     8.06
+2011_02_precip    29.10     5.39
+2011_03_precip    40.20     6.34
+2011_04_precip    13.12     3.62
+2011_05_precip     6.81     2.61
+
 ...
-VIF 2011_01_precip = 4.24671728754
-VIF 2011_04_precip = 5.22212300189
-VIF 2011_05_precip = 4.86092882628
-VIF 2011_06_precip = 7.12641548428
-VIF 2011_07_precip = 7.5778242551
-VIF 2011_08_precip = 3.3151772128
+...
 
-selected variables are:
-2011_01_precip, 2011_04_precip, 2011_05_precip, 2011_06_precip,
-2011_07_precip, 2011_08_precip
-with as maximum VIF: 7.5778242551
+VIF round 7
+--------------------------------------
+variable            vif  sqrtvif
+2011_01_precip     4.25     2.06
+2011_04_precip     5.22     2.29
+2011_05_precip     4.86     2.20
+2011_06_precip     7.13     2.67
+2011_07_precip     7.58     2.75
+2011_08_precip     3.32     1.82
 
-Statistics are written to results1.txt
+selected variables are: 
+--------------------------------------
+2011_01_precip, 2011_04_precip, 2011_05_precip, 2011_06_precip, 2011_07_precip, 2011_08_precip
+
+Statistics are written to results.csv
 </pre></div>
 
-<p>Example 2: Run the same VIF analysis as above, but this time tell
-the function to retain the variable 2011_02_precip. The maximum VIF
-of 9.72 below is based on the R^2 of the regression of
-2011_06_precip against the other 6 selected variables.
+The same results are (optionally) written to a comma delimited file
+(csv). It contains variables and the corresponding vif and sqrt(vif)
+for each round in the stepwise variable selection. The column
+'removed' gives the variable that was removed in the previous round.
 
+<p><img src="r_vif_example1.png">
+
+<h3>Example 2</h3>
+
+Run the same VIF analysis as above, but this time tell
+the function to retain the variable 2011_02_precip. Only the last
+few lines of the results are shown below. As you can see, a
+different set of variables is selected, which includes the variable
+'2011_02_precip'.
+
 <div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
-r.vif maps=$MAPS maxvif=10 retain=2011_02_precip file=results2.txt
+r.vif maps=$MAPS maxvif=10 retain=2011_02_precip file=results2.csv
+</pre></div>
 
+The output is:
+
+<div class="code"><pre>
 ...
-VIF 2011_02_precip = 9.29298936882
-VIF 2011_03_precip = 9.01810836159
-VIF 2011_04_precip = 6.2991729186
-VIF 2011_05_precip = 4.99445615367
-VIF 2011_06_precip = 9.71883412866
-VIF 2011_07_precip = 8.35889762858
-VIF 2011_08_precip = 3.30091831548
+...
+VIF round 6
+--------------------------------------
+variable            vif  sqrtvif
+2011_02_precip     9.29     3.05
+2011_03_precip     9.02     3.00
+2011_04_precip     6.30     2.51
+2011_05_precip     4.99     2.23
+2011_06_precip     9.72     3.12
+2011_07_precip     8.36     2.89
+2011_08_precip     3.30     1.82
 
-selected variables are:
-2011_02_precip, 2011_03_precip, 2011_04_precip, 2011_05_precip,
-2011_06_precip, 2011_07_precip, 2011_08_precip
-with as maximum VIF: 9.71883412866
+selected variables are: 
+--------------------------------------
+2011_02_precip, 2011_03_precip, 2011_04_precip, 2011_05_precip, 2011_06_precip, 2011_07_precip, 2011_08_precip
 
-Statistics are written to results2.txt
+Statistics are written to results2.csv
+
 </pre></div>
 
+<h3>Example 3</h3>
+
+Like example 1, but without writing the results to
+file, and with the 's' flag set, which means only the list with
+finally selected variables are printed to screen. This output can be
+directl parsed in a script. 
+
+<div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
+r.vif -s maps=$MAPS maxvif=10
+
+</pre></div>
+
+The output is:
+
+<div class="code"><pre>
+2011_01_precip,2011_04_precip,2011_05_precip,2011_06_precip,2011_07_precip,2011_08_precip
+</pre></div>
+
+This output can be captured in a variable 'SELECTION', which is used
+as input in <i>i.group</i> to create a group.
+
+<div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
+SELECTION=`r.vif -s maps=$MAPS maxvif=10`
+i.group group=group_example input=$SELECTION
+
+</pre></div>
+
+This selects raster layers using the r.vif functions, and adds these
+raster layers to the group 'group_example':
+
+<div class="code"><pre>
+Adding raster map <2011_01_precip at climate_1970_2012> to group
+Adding raster map <2011_04_precip at climate_1970_2012> to group
+Adding raster map <2011_05_precip at climate_1970_2012> to group
+Adding raster map <2011_06_precip at climate_1970_2012> to group
+Adding raster map <2011_07_precip at climate_1970_2012> to group
+Adding raster map <2011_08_precip at climate_1970_2012> to group
+
+</pre></div>
+
 <h2>Citation</h2>
 
 Suggested citation:

Modified: grass-addons/grass7/raster/r.vif/r.vif.py
===================================================================
--- grass-addons/grass7/raster/r.vif/r.vif.py	2016-04-19 14:03:28 UTC (rev 68283)
+++ grass-addons/grass7/raster/r.vif/r.vif.py	2016-04-19 14:20:49 UTC (rev 68284)
@@ -66,15 +66,23 @@
 #% guisection: Input
 #%end
 
-#=======================================================================
+#%flag
+#% key: s
+#% description: Only print selected variables
+#%end
+
+#%rules
+#%requires_all: -s,maxvif
+#%end
+
+#==============================================================================
 ## General
-#=======================================================================
+#==============================================================================
 
 # import libraries
 import os
 import sys
 import math
-import tempfile
 import numpy as np
 import grass.script as grass
 
@@ -112,22 +120,32 @@
     if MXVIF != '':
         MXVIF = float(MXVIF)
     OPF = options['file']
-    if OPF == '':
-        idf, OPF = tempfile.mkstemp()
+    flag_s = flags['s']
 
-    #=======================================================================
+    #==========================================================================
     ## Calculate VIF and write to standard output (& optionally to file)
-    #=======================================================================
+    #==========================================================================
 
-    # Calculate VIF and write results to text file
+    # Determine maximum width of the columns to be printed to std output
     name_lengths = []
     for i in IPF:
         name_lengths.append(len(i))
     nlength = max(name_lengths)
+
+    # Create arrays to hold results (which will be written to file at end)
+    out_vif = []
+    out_sqrt = []
+    out_variable = []
+    out_round = []
+
+    # VIF is computed for each variable
+    #--------------------------------------------------------------------------
     if MXVIF =='':
-        text_file = open(OPF, "w")
-        text_file.write("variable,vif,sqrtvif\n")
-        grass.info('{0[0]:{1}s} {0[1]:8s} {0[2]:8s}'.format(['variable', 'vif', 'sqrtvif'], nlength))
+        # Print header of table to std output
+        print('{0[0]:{1}s} {0[1]:8s} {0[2]:8s}'.format(
+            ['variable', 'vif', 'sqrtvif'], nlength))
+
+        # Compute the VIF
         for k in xrange(len(IPF)):
             MAPy = IPF[k]
             MAPx = IPF[:]
@@ -144,22 +162,46 @@
                 rsqr = float(vifstat[1][1])
                 vif = 1 / (1 - rsqr)
                 sqrtvif = math.sqrt(vif)
-            RES = [MAPy, vif, sqrtvif]
-            text_file.write('{0[0]}, {0[1]}, {0[2]}\n'.format(RES))
-            grass.info('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format(RES, nlength))
-        text_file.close()
+
+            # Write results to array (this will be printed to file at end
+            # of script)
+            out_vif.append(vif)
+            out_sqrt.append(sqrtvif)
+            out_variable.append(MAPy)
+            print('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format([MAPy, vif,
+                  sqrtvif], nlength))
+        print
+        if len(OPF) > 0:
+            print("Statistics are written to " + OPF + "\n")
+
+    # The stepwise variable selection procedure is run
+    #--------------------------------------------------------------------------
     else:
-        text_file = open(OPF, "w")
         rvifmx = MXVIF + 1
         m = 0
+        remove_variable = 'none'
+        out_removed = []
+        txt_message = "Working"
+
+        # The VIF will be computed across all variables. Next, the variable
+        # with highest vif is removed and the procedure is repeated. This
+        # continuous till the maximum vif across the variables > maxvif
         while MXVIF < rvifmx:
             m += 1
-            grass.info("\n")
-            grass.info("VIF round " + str(m))
-            grass.info("----------------------------------------")
             rvif = np.zeros(len(IPF))
-            text_file.write("variable\tvif\tsqrtvif\n")
-            grass.info('{0[0]:{1}s} {0[1]:8s} {0[2]:8s}'.format(['variable', 'vif', 'sqrtvif'], nlength))
+
+            # print the header of the output table to the console
+            if not flag_s:
+                print
+                print("VIF round " + str(m))
+                print("--------------------------------------")
+                print('{0[0]:{1}s} {0[1]:>8s} {0[2]:>8s}'.format(
+                    ['variable', 'vif', 'sqrtvif'], nlength))
+            else:
+                txt_message = txt_message + "."
+                grass.message(txt_message)
+
+            # Compute the VIF and sqrt(vif) for all variables in this round
             for k in xrange(len(IPF)):
                 MAPy = IPF[k]
                 MAPx = IPF[:]
@@ -176,40 +218,67 @@
                     rsqr = float(vifstat[1][1])
                     vif = 1 / (1 - rsqr)
                     sqrtvif = math.sqrt(vif)
-                RES = [MAPy, vif, sqrtvif]
-                text_file.write('{0[0]}, {0[1]}, {0[2]}\n'.format(RES))
-                grass.info('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format(RES, nlength))
+
+                # Write results to arrays (which will be written to csv file
+                # at end of script)
+                out_vif.append(vif)
+                out_sqrt.append(sqrtvif)
+                out_variable.append(MAPy)
+                out_round.append(m)
+                out_removed.append(remove_variable)
+
+                # print result to console
+                if not flag_s:
+                    print('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format([MAPy,
+                          vif, sqrtvif], nlength))
+
+                # If variable is set to be retained by the user, the VIF
+                # is set to -9999 to ensure it will not have highest VIF
                 if IPFn[k] in IPRn:
                     rvif[k] = -9999
                 else:
                     rvif[k] = vif
 
+            # Compute the maximum vif across the variables for this round and
+            # remove the variable with the highest VIF
             rvifmx = max(rvif)
             if rvifmx >= MXVIF:
                 rvifindex = np.argmax(rvif, axis=None)
-                varremove = IPF[rvifindex]
-                text_file.write("\n")
-                text_file.write("Removing " + varremove)
-                text_file.write("\n---------------------------\n")
-                text_file.write("\n")
+                remove_variable = IPFn[rvifindex]
                 del IPF[rvifindex]
                 del IPFn[rvifindex]
+
+
+        # Write final selected variables to std output
+        if not flag_s:
+            print
+            print("selected variables are: ")
+            print("--------------------------------------")
+            print(', '.join(IPFn))
+            print
+            if len(OPF) > 0:
+                print("Statistics are written to " + OPF + "\n")
+            print
+        else:
+            print(','.join(IPFn))
+
+    if len(OPF) > 0:
+        try:
+            text_file = open(OPF, "w")
+            if MXVIF =='':
+                text_file.write("variable,vif,sqrtvif\n")
+                for i in xrange(len(out_vif)):
+                    text_file.write('{0:s},{1:.6f},{2:.6f}\n'.format(
+                        out_variable[i], out_vif[i], out_sqrt[i]))
             else:
-                text_file.write("\n\n")
-                text_file.write("Final selection (above) has maximum VIF = " + str(round(rvifmx, 3)))
-        text_file.close()
+                text_file.write("round,removed,variable,vif,sqrtvif\n")
+                for i in xrange(len(out_vif)):
+                    text_file.write('{0:d},{1:s},{2:s},{3:.6f},{4:.6f}\n'.format(
+                        out_round[i], out_removed[i],out_variable[i],
+                         out_vif[i], out_sqrt[i]))
+        finally:
+            text_file.close()
 
-        # Write to std output
-        grass.info("")
-        grass.info("selected variables are: ")
-        grass.info("----------------------------------------")
-        grass.info(', '.join(IPFn))
-        grass.info("with as maximum VIF: " + str(rvifmx))
-    grass.info("")
-    if options['file'] != '':
-        grass.info("Statistics are written to " + OPF + "\n")
-    grass.info("")
-
 if __name__ == "__main__":
     options, flags = grass.parser()
     sys.exit(main())
@@ -217,5 +286,3 @@
 
 
 
-
-

Added: grass-addons/grass7/raster/r.vif/r_vif_example1.png
===================================================================
(Binary files differ)


Property changes on: grass-addons/grass7/raster/r.vif/r_vif_example1.png
___________________________________________________________________
Added: svn:mime-type
   + application/octet-stream



More information about the grass-commit mailing list