[GRASS-dev] Re: [GRASS-user] r.watershed speed-up

Tue Jul 29 19:09:57 EDT 2008

I just want to note something very tactfully mentioned below. That is,  
all such algorithms are estimates of real watershed behavior. AFAICT,  
r.watershed produces a very good estimate of these parameters and  
dynamics. Nevertheless, we should be careful about trying to exactly  
replicate its results--as if it IS the real world. We should not  
assume a priori that because they differ slightly that the new one is  
worse. We should at least consider the possibility that the new  
r.watershed.fast algorithm is equally good as the original in that it  
can replicate it to such a high degree. The small differences between  
the two could simply be where both are less able to model real  
landscapes (flat areas) and produce equally good estimates even though  
they differ slightly. It is also possible that the new algorithm makes  
r.watershed.fast an even closer match to reality than r.watershed and  
the differences are where the new modules is better than the old one.  
These are empirical questions.

Michael

On Jul 29, 2008, at 3:41 PM, <grass-user-request at lists.osgeo.org> wrote:

> Date: Tue, 29 Jul 2008 20:46:02 +0200
> From: Markus Metz <markus_metz at gmx.de>
> Subject: Re: [GRASS-user] r.watershed speed-up
> To: Charles Ehlschlaeger <c.ehlschlaeger at comcast.net>
> Cc: grass-user at lists.osgeo.org
> Message-ID: <488F656A.2040805 at gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear Chuck,
>
> r.watershed is a much valued tool in GRASS, for me the best watershed
> analsis tool not only in GRASS, therefore I thought about a a way to
> keep the results identical too. I am also aware that the closer the
> results produced by changes in the algorithm are to the results  
> produced
> by original algorithm, the higher the chances that it will be accepted
> by the community.
>
> With regard to your suggestion, I would not adjust DEM values, because
> in larger regions the minimum possible increment is already there in  
> the
> data, i.e. there are no gaps in the data distribution that can be  
> filled
> with adjusted values. One theoretical way out would be to read in DEMs
> as FCELL or DCELL, but then there is the floating point comparison
> problem. (I tried against better knowledge, it doesn't work).  
> Regarding
> the breadth first search, where do you see breadth first <when the DEM
> values are different>? You lost me there. I don't see differences in  
> how
> points are searched between the two versions, but maybe I have not  
> fully
> understood the original algorithm. As far as I have understood the
> original algorithm, the list of astar_pts following astar_pts.nxt is
> kept in ascending order using elevation. If there are already points
> with equal elevation, the new point is inserted after all other points
> with the same elevation (line 91 in original do_astar.c), so that the
> point inserted first (of several points with equal elevation) will be
> removed first (line 19 in original do_astar.c). This is still the case
> in the new algorithm (insertion: line 136, removal: lines 31 and 192)
> most of the time. If the binary heap becomes fairly large and there  
> are
> many points with equal elevation, there might be an exception. Please
> let me know if I got something wrong there!
>
> Another possibility to produce the exact same results like in the
> original version would be to go recursively down the heap and pick the
> point added earliest from all points with elevation equal to the root
> point. This is easy to implement, but it would have slowed down the
> search algorithm somewhat and I wanted to get something lightening  
> fast.
>
> I have one main argument why it is not a disaster if the results are  
> not
> 100% identical:
> The order in which neighbouring cells are added is in both versions,
> with respect to the focus cell:
> low, up, left, right, upper right corner, lower left corner, lower  
> right
> corner, upper left corner
> This order is always kept, irrespective of the already established  
> flow
> direction, thus it is a random order and there is not really a reason
> why the algorithm should stick to that order. I think a rare  
> replacement
> of that random order (2% difference of flow direction in Moritz
> Lennert's test) with another random order (binary heap shuffling) is  
> not
> a disaster and the result is still valid. I did build in a check to  
> make
> results more similar, but there are still scenarios when this check
> doesn't catch.
>
> So my main question to you, the original author of r.watershed, is,  
> if a
> rare violation to the (in my opinion random) order in which neighbours
> are added to the list would cause the results to be no longer valid.
> The other question is if I should provide now a version that really
> produces identical results, or if I first sort out the problem of how
> neighbours are (should be) added to and removed from the list. BTW, I
> tried to change the order of adding neighbours to the list too, taking
> into account the already established flow direction. It produces very
> straight lines in flat terrain, which is ok in hydrological terms, but
> some randomness looked better. Flat terrain in the DEM must not be  
> flat
> in reality because of problems with DEM resolution and accuracy,
> randomness produces there more naturally looking results.
>
> Sorry for the long reply!
>
> Regards,
>
> Markus