[GRASS-user] r.watershed speed-up

Tue Jul 29 14:46:02 EDT 2008

Dear Chuck,

r.watershed is a much valued tool in GRASS, for me the best watershed 
analsis tool not only in GRASS, therefore I thought about a a way to 
keep the results identical too. I am also aware that the closer the 
results produced by changes in the algorithm are to the results produced 
by original algorithm, the higher the chances that it will be accepted 
by the community.

With regard to your suggestion, I would not adjust DEM values, because 
in larger regions the minimum possible increment is already there in the 
data, i.e. there are no gaps in the data distribution that can be filled 
with adjusted values. One theoretical way out would be to read in DEMs 
as FCELL or DCELL, but then there is the floating point comparison 
problem. (I tried against better knowledge, it doesn't work). Regarding 
the breadth first search, where do you see breadth first <when the DEM 
values are different>? You lost me there. I don't see differences in how 
points are searched between the two versions, but maybe I have not fully 
understood the original algorithm. As far as I have understood the 
original algorithm, the list of astar_pts following astar_pts.nxt is 
kept in ascending order using elevation. If there are already points 
with equal elevation, the new point is inserted after all other points 
with the same elevation (line 91 in original do_astar.c), so that the 
point inserted first (of several points with equal elevation) will be 
removed first (line 19 in original do_astar.c). This is still the case 
in the new algorithm (insertion: line 136, removal: lines 31 and 192) 
most of the time. If the binary heap becomes fairly large and there are 
many points with equal elevation, there might be an exception. Please 
let me know if I got something wrong there!

Another possibility to produce the exact same results like in the 
original version would be to go recursively down the heap and pick the 
point added earliest from all points with elevation equal to the root 
point. This is easy to implement, but it would have slowed down the 
search algorithm somewhat and I wanted to get something lightening fast.

I have one main argument why it is not a disaster if the results are not 
100% identical:
The order in which neighbouring cells are added is in both versions, 
with respect to the focus cell:
low, up, left, right, upper right corner, lower left corner, lower right 
corner, upper left corner
This order is always kept, irrespective of the already established flow 
direction, thus it is a random order and there is not really a reason 
why the algorithm should stick to that order. I think a rare replacement 
of that random order (2% difference of flow direction in Moritz 
Lennert's test) with another random order (binary heap shuffling) is not 
a disaster and the result is still valid. I did build in a check to make 
results more similar, but there are still scenarios when this check 
doesn't catch.

So my main question to you, the original author of r.watershed, is, if a 
rare violation to the (in my opinion random) order in which neighbours 
are added to the list would cause the results to be no longer valid.
The other question is if I should provide now a version that really 
produces identical results, or if I first sort out the problem of how 
neighbours are (should be) added to and removed from the list. BTW, I 
tried to change the order of adding neighbours to the list too, taking 
into account the already established flow direction. It produces very 
straight lines in flat terrain, which is ok in hydrological terms, but 
some randomness looked better. Flat terrain in the DEM must not be flat 
in reality because of problems with DEM resolution and accuracy, 
randomness produces there more naturally looking results.

Sorry for the long reply!

Regards,

Markus

Charles Ehlschlaeger wrote:
> Dear Markus and other r.watershed enthusiasts,
> I've thought of a way to make your faster version of r.watershed give
> identical results as the older r.watershed. r.watershed.old uses a breadth
> first search in section 2. r.watershed.fast is breadth first <when the DEM
> values are different>. The trick would be to slightly adjust DEM values by
> when they were added to the heap. Here is some pseudo-code
> cellsAddedToHeap = 0
> minCellIncrement = Double.MIN_VALUE # (java) This constant represents 
>                                     # the smallest increment a CELL 
>                                     # can be, if the CELLs are doubles.
> # At the time a cell is added to heap, the DEM value placed 
> # in the heap would be:
> DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++))
>
> Sincerely, chuck
>