[GRASS-dev] question about any July changes in hydrology functions

Michael Barton michael.barton at asu.edu
Thu Aug 2 12:22:27 EDT 2007


Glynn and Helena,

Thanks very much for the information and advice.

So the upshot is that, for this kind of simulation modeling, we should avoid
both a cut-off function and a median filter. Each creates slight but abrupt
discontinuities in the landscape, which can be amplified in recursive runs.

I think we'll try just using the r.neighbors smoothed map (using a mean
smoother) rather than some function of the original and smoothed map.
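
Concretely, I'm thinking of something along these lines (the map names are
just placeholders, and the option names should be checked against the
r.neighbors manual for your GRASS version):

    r.neighbors input=elevation output=elevation.smooth method=average size=3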

I have a couple of further questions before we launch into revising these
scripts.

1. Is there any speed advantage/disadvantage to using r.mfilter over
r.neighbors? At the moment, because we are trying to get rid of occasional
spikes, I don't think that a weighted mean will give better results than an
unweighted mean. But the only way to get a filter like a weighted mean is to
use r.mfilter, so it would be good to know whether there are any known
performance hits, or whether it is actually faster.
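
For reference, the sort of weighted mean I had in mind would be described in
an r.mfilter filter file roughly like this (the weights and divisor are only
an illustration):

    TITLE 3x3 weighted mean
    MATRIX 3
    1 2 1
    2 4 2
    1 2 1
    DIVISOR 16
    TYPE P

and applied with something like (map and file names are placeholders):

    r.mfilter input=elevation output=elevation.wsmooth filter=weighted_mean.filt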

2. We are starting with a depressionless DEM, but now I'm wondering if it
would be a good idea to fill any depressions between iterations of the
simulation. If so, it looks like r.terraflow builds this into its routine
and could save considerable time over running r.fill.dir each time. Any
thoughts along those lines?
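
For context, the per-iteration call I'd be hoping to avoid would look roughly
like this (map names are placeholders, and the option names are from the
GRASS 6 r.fill.dir manual as I remember them, so worth double-checking):

    r.fill.dir input=elev.current elevation=elev.filled direction=elev.dir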

Thanks again
Michael


On 8/2/07 5:14 AM, "Glynn Clements" <glynn at gclements.plus.com> wrote:

> 
> Michael Barton wrote:
> 
>>> I thought that smoothing will take care of it but apparently just the
>>> opposite
>> 
>> This does seem strange. Even odder is the fact that the anomalies get bigger
>> the larger the smoothing neighborhood. Usually, increasing the neighborhood
>> creates greater smoothing.
> 
> That's the case if the neighbours are being averaged, but it doesn't
> hold for a quantile.
> 
>> It might be the way we are using smoothing. We check to see if an
>> erosion/deposition value exceeds the smoothed value by some user-determined
>> amount. If it does, the smoothed value is used instead of the original
>> value. The goal was to get rid of extreme values but leave the rest alone.
> 
> That will typically cause gradient discontinuities as you switch
> between filtered and unfiltered data. That wouldn't necessarily be an
> issue in itself, but it will be if you start calculating derivatives
> on the result.
> 
> Actually, ignore my earlier suggestion of using a
> median-of-filtered-values approach. That will have similar problems
> regarding derivatives.
> 
> Moreover, any mechanism that has a discrete "cut" (i.e. if <criterion>
> then <value1> else <value2>) will have the same issue. You need to
> find some way to introduce a continuous transition. E.g. rather than
> selecting filtered vs unfiltered based upon a threshold, interpolate
> between filtered and unfiltered based upon the distance from the
> threshold, turning the discrete step into an ogive (S-curve).
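> 
> As a rough sketch of that idea in r.mapcalc (the map names "orig" and
> "smooth" and the constants 10.0 and 5.0 are purely illustrative; the
> constants set the midpoint and steepness of the S-curve):
> 
>     r.mapcalc "w = 1.0 / (1.0 + exp(-(abs(orig - smooth) - 10.0) / 5.0))"
>     r.mapcalc "blended = (1.0 - w) * orig + w * smooth"
> 
> Here w rises smoothly from near 0 for small deviations to near 1 for
> large ones, so the result blends gradually from the unfiltered to the
> filtered value with no discrete step.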
> 
> Even then, any kind of nonlinearity risks introducing artifacts when
> it occurs as part of a complex process, particularly if it's
> iterative.
> 
> In general, iterative processes won't end up being stable just because
> you want them to be stable. If there's any kind of "amplification"
> involved, instability is likely to be the "default" behaviour;
> stability has to be forced.
> 
>>> BTW, as I understand it, the median is computed by dividing the values
>>> into a finite number of classes, so the median of a continuous field
>>> would not change continuously from cell to cell, and the effect you are
>>> getting could be expected, especially for a larger number of cells.
>> 
>> I don't know how r.neighbors computes the median. However, it is supposed to
>> be a value such that half the neighbor cells have larger values and half
>> have smaller values. I would have thought that the way to compute it would
>> be to order the neighborhood values from smallest to largest and take the
>> middle one. Since the neighborhood always contains an odd number of cells,
>> the middle is always a single cell.
> 
> This is essentially what happens, although if there are any null
> cells, the median may be computed over an even number of cells
> resulting in the mean of the two middle cells. The actual
> implementation (from lib/stats/c_median.c) is:
> 
> void c_median(DCELL *result, DCELL *values, int n)
> {
>     n = sort_cell(values, n);
> 
>     if (n < 1)
>         G_set_d_null_value(result, 1);
>     else
>         *result = (values[(n-1)/2] + values[n/2]) / 2;
> }
> 
> where sort_cell() sorts the cell values into ascending order.
> 
> [If the number of cells is odd, (n-1)/2 and n/2 will be equal, so the
> middle value is averaged with itself.]
> 
>>> Mean is computed directly from the floating point values - is that
>>> right?
>> 
>> This ought to be the sum of the neighbor cell values divided by the number
>> of cells. 
>> 
>> The median is less affected by a single extreme cell than the mean, which is
>> why we preferred the median. However, if the median calculation algorithm is
>> problematic, then we'll have to switch to the mean.
> 
> It's problematic if you're going to differentiate the result, because
> it's a discontinuous function; it's essentially a tree of if/then/else
> choices.

__________________________________________
Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution & Social Change
Center for Social Dynamics and Complexity
Arizona State University

phone: 480-965-6213
fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton 



