[GRASS-SVN] r68284 - grass-addons/grass7/raster/r.vif
svn_grass at osgeo.org
svn_grass at osgeo.org
Tue Apr 19 07:20:49 PDT 2016
Author: pvanbosgeo
Date: 2016-04-19 07:20:49 -0700 (Tue, 19 Apr 2016)
New Revision: 68284
Added:
grass-addons/grass7/raster/r.vif/r_vif_example1.png
Modified:
grass-addons/grass7/raster/r.vif/r.vif.html
grass-addons/grass7/raster/r.vif/r.vif.py
Log:
r.vif addon: improvements in how output is handled; adding option to only print finally selected variables; updated html file accordingly
Modified: grass-addons/grass7/raster/r.vif/r.vif.html
===================================================================
--- grass-addons/grass7/raster/r.vif/r.vif.html 2016-04-19 14:03:28 UTC (rev 68283)
+++ grass-addons/grass7/raster/r.vif/r.vif.html 2016-04-19 14:20:49 UTC (rev 68284)
@@ -26,6 +26,11 @@
variable has the highest VIF, the variable with the next highest VIF
will be removed instead (see the examples).
+<p>The user can set the 's' flag to only print the finally selected
+variables to the standard output (console). Note that this only
+works when the stepwise selection procedure is invoked, i.e., when
+the maxvif is set.
+
<h2>EXAMPLES</h2>
The following examples are based on the nc_climate_spm_2000_2012
@@ -38,57 +43,132 @@
<div class="code"><pre>g.region n=226000 s=168500 w=229500 e=298500</pre></div>
-<p>Example 1: Run VIF, setting the maximum VIF at 10. The function
-will print the VIF computed at each step to the console. The same
-will also be written to a text file. Below only the last part is
-shown with the VIF values of the finally selected variables are
-shown. The maximum VIF of 7.58 below is thus based on the R^2 of the
-regression of 2011_07_precip against the other 5 selected variables.
+<h3>Example 1</h3>
+Run VIF, setting the maximum VIF at 10. The function will print the
+VIF computed at each step to the console. The same will also be
+written to a text file.
+
<div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
-r.vif maps=$MAPS file=results1.txt maxvif=10
+r.vif maps=$MAPS file=results1.csv maxvif=10
+</pre></div>
+Below the results are shown (here only the first and last part of the
+output is shown to save space):
+
+<div class="code"><pre>
+VIF round 1
+--------------------------------------
+variable vif sqrtvif
+2011_01_precip 65.03 8.06
+2011_02_precip 29.10 5.39
+2011_03_precip 40.20 6.34
+2011_04_precip 13.12 3.62
+2011_05_precip 6.81 2.61
+
...
-VIF 2011_01_precip = 4.24671728754
-VIF 2011_04_precip = 5.22212300189
-VIF 2011_05_precip = 4.86092882628
-VIF 2011_06_precip = 7.12641548428
-VIF 2011_07_precip = 7.5778242551
-VIF 2011_08_precip = 3.3151772128
+...
-selected variables are:
-2011_01_precip, 2011_04_precip, 2011_05_precip, 2011_06_precip,
-2011_07_precip, 2011_08_precip
-with as maximum VIF: 7.5778242551
+VIF round 7
+--------------------------------------
+variable vif sqrtvif
+2011_01_precip 4.25 2.06
+2011_04_precip 5.22 2.29
+2011_05_precip 4.86 2.20
+2011_06_precip 7.13 2.67
+2011_07_precip 7.58 2.75
+2011_08_precip 3.32 1.82
-Statistics are written to results1.txt
+selected variables are:
+--------------------------------------
+2011_01_precip, 2011_04_precip, 2011_05_precip, 2011_06_precip, 2011_07_precip, 2011_08_precip
+
+Statistics are written to results.csv
</pre></div>
-<p>Example 2: Run the same VIF analysis as above, but this time tell
-the function to retain the variable 2011_02_precip. The maximum VIF
-of 9.72 below is based on the R^2 of the regression of
-2011_06_precip against the other 6 selected variables.
+The same results are (optionally) written to a comma delimited file
+(csv). It contains variables and the corresponding vif and sqrt(vif)
+for each round in the stepwise variable selection. The column
+'removed' gives the variable that was removed in the previous round.
+<p><img src="r_vif_example1.png">
+
+<h3>Example 2</h3>
+
+Run the same VIF analysis as above, but this time tell
+the function to retain the variable 2011_02_precip. Only the last
+few lines of the results are shown below. As you can see, a
+different set of variables is selected, which includes the variable
+'2011_02_precip'.
+
<div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
-r.vif maps=$MAPS maxvif=10 retain=2011_02_precip file=results2.txt
+r.vif maps=$MAPS maxvif=10 retain=2011_02_precip file=results2.csv
+</pre></div>
+The output is:
+
+<div class="code"><pre>
...
-VIF 2011_02_precip = 9.29298936882
-VIF 2011_03_precip = 9.01810836159
-VIF 2011_04_precip = 6.2991729186
-VIF 2011_05_precip = 4.99445615367
-VIF 2011_06_precip = 9.71883412866
-VIF 2011_07_precip = 8.35889762858
-VIF 2011_08_precip = 3.30091831548
+...
+VIF round 6
+--------------------------------------
+variable vif sqrtvif
+2011_02_precip 9.29 3.05
+2011_03_precip 9.02 3.00
+2011_04_precip 6.30 2.51
+2011_05_precip 4.99 2.23
+2011_06_precip 9.72 3.12
+2011_07_precip 8.36 2.89
+2011_08_precip 3.30 1.82
-selected variables are:
-2011_02_precip, 2011_03_precip, 2011_04_precip, 2011_05_precip,
-2011_06_precip, 2011_07_precip, 2011_08_precip
-with as maximum VIF: 9.71883412866
+selected variables are:
+--------------------------------------
+2011_02_precip, 2011_03_precip, 2011_04_precip, 2011_05_precip, 2011_06_precip, 2011_07_precip, 2011_08_precip
-Statistics are written to results2.txt
+Statistics are written to results2.csv
+
</pre></div>
+<h3>Example 3</h3>
+
+Like example 1, but without writing the results to
+file, and with the 's' flag set, which means only the list with
+finally selected variables are printed to screen. This output can be
+directl parsed in a script.
+
+<div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
+r.vif -s maps=$MAPS maxvif=10
+
+</pre></div>
+
+The output is:
+
+<div class="code"><pre>
+2011_01_precip,2011_04_precip,2011_05_precip,2011_06_precip,2011_07_precip,2011_08_precip
+</pre></div>
+
+This output can be captured in a variable 'SELECTION', which is used
+as input in <i>i.group</i> to create a group.
+
+<div class="code"><pre>MAPS=`g.list type=raster pattern=*2011*precip sep=,`
+SELECTION=`r.vif -s maps=$MAPS maxvif=10`
+i.group group=group_example input=$SELECTION
+
+</pre></div>
+
+This selects raster layers using the r.vif functions, and adds these
+raster layers to the group 'group_example':
+
+<div class="code"><pre>
+Adding raster map <2011_01_precip at climate_1970_2012> to group
+Adding raster map <2011_04_precip at climate_1970_2012> to group
+Adding raster map <2011_05_precip at climate_1970_2012> to group
+Adding raster map <2011_06_precip at climate_1970_2012> to group
+Adding raster map <2011_07_precip at climate_1970_2012> to group
+Adding raster map <2011_08_precip at climate_1970_2012> to group
+
+</pre></div>
+
<h2>Citation</h2>
Suggested citation:
Modified: grass-addons/grass7/raster/r.vif/r.vif.py
===================================================================
--- grass-addons/grass7/raster/r.vif/r.vif.py 2016-04-19 14:03:28 UTC (rev 68283)
+++ grass-addons/grass7/raster/r.vif/r.vif.py 2016-04-19 14:20:49 UTC (rev 68284)
@@ -66,15 +66,23 @@
#% guisection: Input
#%end
-#=======================================================================
+#%flag
+#% key: s
+#% description: Only print selected variables
+#%end
+
+#%rules
+#%requires_all: -s,maxvif
+#%end
+
+#==============================================================================
## General
-#=======================================================================
+#==============================================================================
# import libraries
import os
import sys
import math
-import tempfile
import numpy as np
import grass.script as grass
@@ -112,22 +120,32 @@
if MXVIF != '':
MXVIF = float(MXVIF)
OPF = options['file']
- if OPF == '':
- idf, OPF = tempfile.mkstemp()
+ flag_s = flags['s']
- #=======================================================================
+ #==========================================================================
## Calculate VIF and write to standard output (& optionally to file)
- #=======================================================================
+ #==========================================================================
- # Calculate VIF and write results to text file
+ # Determine maximum width of the columns to be printed to std output
name_lengths = []
for i in IPF:
name_lengths.append(len(i))
nlength = max(name_lengths)
+
+ # Create arrays to hold results (which will be written to file at end)
+ out_vif = []
+ out_sqrt = []
+ out_variable = []
+ out_round = []
+
+ # VIF is computed for each variable
+ #--------------------------------------------------------------------------
if MXVIF =='':
- text_file = open(OPF, "w")
- text_file.write("variable,vif,sqrtvif\n")
- grass.info('{0[0]:{1}s} {0[1]:8s} {0[2]:8s}'.format(['variable', 'vif', 'sqrtvif'], nlength))
+ # Print header of table to std output
+ print('{0[0]:{1}s} {0[1]:8s} {0[2]:8s}'.format(
+ ['variable', 'vif', 'sqrtvif'], nlength))
+
+ # Compute the VIF
for k in xrange(len(IPF)):
MAPy = IPF[k]
MAPx = IPF[:]
@@ -144,22 +162,46 @@
rsqr = float(vifstat[1][1])
vif = 1 / (1 - rsqr)
sqrtvif = math.sqrt(vif)
- RES = [MAPy, vif, sqrtvif]
- text_file.write('{0[0]}, {0[1]}, {0[2]}\n'.format(RES))
- grass.info('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format(RES, nlength))
- text_file.close()
+
+ # Write results to array (this will be printed to file at end
+ # of script)
+ out_vif.append(vif)
+ out_sqrt.append(sqrtvif)
+ out_variable.append(MAPy)
+ print('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format([MAPy, vif,
+ sqrtvif], nlength))
+ print
+ if len(OPF) > 0:
+ print("Statistics are written to " + OPF + "\n")
+
+ # The stepwise variable selection procedure is run
+ #--------------------------------------------------------------------------
else:
- text_file = open(OPF, "w")
rvifmx = MXVIF + 1
m = 0
+ remove_variable = 'none'
+ out_removed = []
+ txt_message = "Working"
+
+ # The VIF will be computed across all variables. Next, the variable
+ # with highest vif is removed and the procedure is repeated. This
+ # continuous till the maximum vif across the variables > maxvif
while MXVIF < rvifmx:
m += 1
- grass.info("\n")
- grass.info("VIF round " + str(m))
- grass.info("----------------------------------------")
rvif = np.zeros(len(IPF))
- text_file.write("variable\tvif\tsqrtvif\n")
- grass.info('{0[0]:{1}s} {0[1]:8s} {0[2]:8s}'.format(['variable', 'vif', 'sqrtvif'], nlength))
+
+ # print the header of the output table to the console
+ if not flag_s:
+ print
+ print("VIF round " + str(m))
+ print("--------------------------------------")
+ print('{0[0]:{1}s} {0[1]:>8s} {0[2]:>8s}'.format(
+ ['variable', 'vif', 'sqrtvif'], nlength))
+ else:
+ txt_message = txt_message + "."
+ grass.message(txt_message)
+
+ # Compute the VIF and sqrt(vif) for all variables in this round
for k in xrange(len(IPF)):
MAPy = IPF[k]
MAPx = IPF[:]
@@ -176,40 +218,67 @@
rsqr = float(vifstat[1][1])
vif = 1 / (1 - rsqr)
sqrtvif = math.sqrt(vif)
- RES = [MAPy, vif, sqrtvif]
- text_file.write('{0[0]}, {0[1]}, {0[2]}\n'.format(RES))
- grass.info('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format(RES, nlength))
+
+ # Write results to arrays (which will be written to csv file
+ # at end of script)
+ out_vif.append(vif)
+ out_sqrt.append(sqrtvif)
+ out_variable.append(MAPy)
+ out_round.append(m)
+ out_removed.append(remove_variable)
+
+ # print result to console
+ if not flag_s:
+ print('{0[0]:{1}s} {0[1]:8.2f} {0[2]:8.2f}'.format([MAPy,
+ vif, sqrtvif], nlength))
+
+ # If variable is set to be retained by the user, the VIF
+ # is set to -9999 to ensure it will not have highest VIF
if IPFn[k] in IPRn:
rvif[k] = -9999
else:
rvif[k] = vif
+ # Compute the maximum vif across the variables for this round and
+ # remove the variable with the highest VIF
rvifmx = max(rvif)
if rvifmx >= MXVIF:
rvifindex = np.argmax(rvif, axis=None)
- varremove = IPF[rvifindex]
- text_file.write("\n")
- text_file.write("Removing " + varremove)
- text_file.write("\n---------------------------\n")
- text_file.write("\n")
+ remove_variable = IPFn[rvifindex]
del IPF[rvifindex]
del IPFn[rvifindex]
+
+
+ # Write final selected variables to std output
+ if not flag_s:
+ print
+ print("selected variables are: ")
+ print("--------------------------------------")
+ print(', '.join(IPFn))
+ print
+ if len(OPF) > 0:
+ print("Statistics are written to " + OPF + "\n")
+ print
+ else:
+ print(','.join(IPFn))
+
+ if len(OPF) > 0:
+ try:
+ text_file = open(OPF, "w")
+ if MXVIF =='':
+ text_file.write("variable,vif,sqrtvif\n")
+ for i in xrange(len(out_vif)):
+ text_file.write('{0:s},{1:.6f},{2:.6f}\n'.format(
+ out_variable[i], out_vif[i], out_sqrt[i]))
else:
- text_file.write("\n\n")
- text_file.write("Final selection (above) has maximum VIF = " + str(round(rvifmx, 3)))
- text_file.close()
+ text_file.write("round,removed,variable,vif,sqrtvif\n")
+ for i in xrange(len(out_vif)):
+ text_file.write('{0:d},{1:s},{2:s},{3:.6f},{4:.6f}\n'.format(
+ out_round[i], out_removed[i],out_variable[i],
+ out_vif[i], out_sqrt[i]))
+ finally:
+ text_file.close()
- # Write to std output
- grass.info("")
- grass.info("selected variables are: ")
- grass.info("----------------------------------------")
- grass.info(', '.join(IPFn))
- grass.info("with as maximum VIF: " + str(rvifmx))
- grass.info("")
- if options['file'] != '':
- grass.info("Statistics are written to " + OPF + "\n")
- grass.info("")
-
if __name__ == "__main__":
options, flags = grass.parser()
sys.exit(main())
@@ -217,5 +286,3 @@
-
-
Added: grass-addons/grass7/raster/r.vif/r_vif_example1.png
===================================================================
(Binary files differ)
Property changes on: grass-addons/grass7/raster/r.vif/r_vif_example1.png
___________________________________________________________________
Added: svn:mime-type
+ application/octet-stream
More information about the grass-commit
mailing list