[GRASS-user] Cross-validation scripts for v.surf.rst?

Jonathan Greenberg greenberg at ucdavis.edu
Sun Apr 29 14:28:04 EDT 2007


Helena (and others):

    I am trying to create a bathymetry surface for the entire San
Francisco/Sacramento Bay Delta, and I'm trying to optimize the processing
speed since I'm guessing it's going to be fairly slow -- is it possible to
use a mask so the algorithm isn't interpolating areas outside of the rivers?
The output is going to be at 3 m resolution, which is why I'm concerned about
the amount of processing ahead of me.
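
Something like the following is what I have in mind, assuming the maskmap
option of v.surf.rst is the right way to restrict the output (bathy_pts,
depth, and rivers_mask are placeholder names for my points, depth column,
and a raster that is non-zero only inside the channels):

    # GRASS 6 sketch: interpolate only inside the channel mask
    g.region res=3 -ap
    v.surf.rst input=bathy_pts zcolumn=depth elev=bathy_3m \
        maskmap=rivers_mask tension=40 smooth=0.5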

--j 


On 4/27/07 5:48 PM, "Helena Mitasova" <hmitaso at unity.ncsu.edu> wrote:

> 
> On Apr 27, 2007, at 1:52 AM, Hamish wrote:
> 
>> Jonathan Greenberg wrote:
>>> 
>>> I was wondering if anyone had written/acquired any cross-validation
>>> scripts for v.surf.rst to optimize the tension/smoothing parameters
>>> (they are alluded to in the documentation)?
>> 
>> 
>> I was thinking the same. A shell script loop to test many values isn't
>> hard, although it might take a long time to try all possibilities, log
>> the v.univar results for each attempt, and then search the result matrix
>> for the smallest-error combination.
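
A minimal sketch of such a loop (GRASS 6 syntax; bathy_pts, depth, and the
parameter ranges are placeholders, and the error column of the cvdev output
-- assumed here to be flt1 -- should be checked with v.info -c):

    #!/bin/sh
    # brute-force search over tension/smooth using v.surf.rst cross-validation
    for T in 20 40 60 80 100 ; do
      for S in 0.1 0.5 1.0 5.0 ; do
        SS=`echo $S | tr . _`             # map names must not contain dots
        v.surf.rst -c input=bathy_pts zcolumn=depth tension=$T smooth=$S \
            cvdev=cv_t${T}_s${SS} --overwrite
        # log summary statistics of the cross-validation errors for this combo
        echo "tension=$T smooth=$S" >> cv_results.txt
        v.univar -g map=cv_t${T}_s${SS} column=flt1 type=point >> cv_results.txt
      done
    done
    # then search cv_results.txt for the combination with the smallest error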
> 
> Markus pointed out the script. A few months ago I added links to
> PDF versions of the relevant papers to the man page
> http://grass.itc.it/grass63/manuals/html63_user/v.surf.rst.html
> (the 2002 and 2005 papers have some answers to the questions below).
> 
>> 
>> 
>> Questions:
>> 
>> * how does changing the region resolution affect the cross-validation
>> result? could you drop down to half or a quarter of the target raster
>> resolution to do the cross-validation tests and find the optimum values,
>> and will those values still be the best ones back at full res?
> 
> it should not affect it at all - cross-validation should completely skip
> any raster computation, and the values are computed only at the skipped
> points. Let me know if changing the resolution influences the computation
> or the result in any way.
>> 
>> also how does changing the region res affect computational time? is most
>> of the time spent computing the splines, or by making the res coarser
>> are you effectively changing the npmin & segmax settings?
> 
> you are not changing npmin & segmax, but if you use the default dmin -
> it is set to half the cell size - you would be changing the density of
> points if you have several points per cell, and that in turn changes
> dnorm (the normalization factor) that scales the tension. I have just
> put a hint into the book on how to keep the tension constant if your
> dnorm changes - I will add it to the manual as time allows.
> 
>> should they be
>> adjusted in tandem with the resolution?
> 
> no, just keep dmin constant
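
A small sketch of what that looks like in practice (placeholder names as
above; dmin is given in map units, so the same value can be passed no matter
what the region resolution is):

    # hold dmin fixed so point thinning (and hence dnorm) does not change
    # with cell size; only the region resolution differs between the runs
    g.region res=30 -ap
    v.surf.rst -c input=bathy_pts zcolumn=depth cvdev=cv_coarse dmin=1.5 --overwrite
    g.region res=3 -ap
    v.surf.rst -c input=bathy_pts zcolumn=depth cvdev=cv_fine dmin=1.5 --overwrite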
> 
>> 
>> is choosing a small (representative) subregion at the original
>> resolution preferred? how much of an art is there to picking a
>> representative subregion? could the script first scan the map for a
>> subregion with morphometric indices/fractal depth/stdev/point density/
>> whatever similar to the overall map, to do the trials on?
> 
> with cross-validation, a bigger issue than choosing a representative
> subregion is having representative input data in the first place -
> otherwise the parameters found by cross-validation are not optimal.
> There is a lot of literature on when cross-validation works and when
> it does not.
>> 
>> 
>> * are the smoothing and tension variables independent? (roughly):
>> min(f(smooth)) + min(f(tension)) == min( f(smooth,tension) )  ?
> 
> no - see the 2005 paper, they are linked (as you lower tension,
> smoothing effectively increases, preventing potential overshoots).
> 
>> can you hold one of those terms steady, find the best fit using the
>> other, then hold the other steady while you vary the first? will the
>> variables found in that way be the final answer, or if they are somewhat
>> dependent should you use the result of the first set of tests as a hint
>> to help repeat the experiment and thus spiral towards the center?
> 
> both approaches should work (see what Jaro used in the 2002 paper)
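
A sketch of that second, two-pass approach (hold smooth fixed while scanning
tension, then fix the best tension and scan smooth; names and ranges are
placeholders, and the error logging would be the same as in the loop above):

    # pass 1: hold smooth fixed, scan tension
    for T in 20 40 60 80 100 ; do
        v.surf.rst -c input=bathy_pts zcolumn=depth tension=$T smooth=0.5 \
            cvdev=cv_pass1_t${T} --overwrite
    done
    # pass 2: fix the best tension found in pass 1 (say 40) and scan smooth
    for S in 0.1 0.5 1.0 5.0 ; do
        SS=`echo $S | tr . _`
        v.surf.rst -c input=bathy_pts zcolumn=depth tension=40 smooth=$S \
            cvdev=cv_pass2_s${SS} --overwrite
    done
    # if the two parameters interact, repeat both passes around the new optimum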
>> 
>> Are the effects simple/smooth enough that the script could be "smart"
>> and dynamically adjust the step size by the rate of change of the
>> cross-validation variance to quickly home in on the best parameters?
> 
> see the 2005 and 2002 papers for how the CV error changes with the
> parameters - it is pretty smooth.
> 
> Regarding your question on what takes most of the computational time:
> I am somewhat puzzled by how much time the linear equation solver takes -
> it used to be the computation of the grid that took up most of the time
> (so cross-validation was very fast, because in each run you would compute
> the value at just a single point). Now it is very slow, and segmax & npmin,
> which control the size of the system of equations, make a huge difference
> in speed (so if you have dense enough points, use e.g. segmax=30 and
> npmin=150 rather than the defaults and it will run much faster).
>   At some point for GRASS5* we replaced the function we had been using
> (a C rewrite of an old Fortran program) with G_ludcmp, which I assumed
> would be faster, but I am not sure that is the reason for the slowdown.
> It may also be my imagination, because the data sets are now so much
> larger, and it might have been the same all along.
> 
> Helena
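
For reference, a sketch of a full-resolution run with the segmentation
settings suggested above (placeholder map names again; segmax=30 and
npmin=150 in place of the defaults):

    # full-resolution run with the suggested segmentation settings
    g.region res=3 -ap
    v.surf.rst input=bathy_pts zcolumn=depth elev=bathy_3m maskmap=rivers_mask \
        segmax=30 npmin=150 tension=40 smooth=0.5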
> 
> 
>> 
>> 
>> Hamish


-- 
Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307
MSN: jgrn307 at hotmail.com




