[GRASS-user] Repeated r.watershed runs

Fri Sep 1 00:33:52 PDT 2017

On 01/09/17 06:41, Ken Mankoff wrote:
> Hi Micha,
> 
> We are getting closer to the issue. Unfortunately I'm also becoming more 
> certain the limitation is real and in GRASS and not my mental model.  If 
> this email does not clarify it, I will draw a picture which may help.
> 
>> On 01 Sep 2017, at 00:08, Micha Silver <tsvibar at gmail.com> wrote:
>>
>> You won't have many upstream cells for those cells along the basin 
>> boundary, only the few that drain exactly along the watershed divide.
> 
> I think you are picturing the NC data set and mountains. Greenland is 
> flatter. Why can't a divide in an extreme case be near or across a lake? 
> Or alternatively, why can't a major stream flow along the boundary 
> outside of a divide? In these cases the boundary cells contribute 49%, 
> but have large (massive?) upstream catchments themselves, all of which 
> is excluded if a mask is generated from r.water.outlet. If I were only 
> losing the boundary cells (max of n of them, for a boundary n cells 
> long), I would not worry. This seems like the case in mountainous 
> regions, but perhaps not on the flatter Greenland ice sheet.
> 
> 
>> The only way that r.watershed can return different results is if you 
>> input a different elevation grid. 
> 
> R.watershed w/o flow -> r.water.outlet produces a "minimum" basin where 
> partial contributor cells (and the upstream catchments of those cells) 
> are not included.
> 
> R.watershed WITH flow produces runoff at point x,y with the contribution 
> from other catchments that partially contribute to this catchment.
> 
> Correct?

I think so. If you know how much overland flow you have in each cell (= 
the flow parameter), then r.watershed should calculate the accumulation 
of that flow along the way, under the extremely simplified assumption 
that all of that flow remains overland across such a large area.

But I won't go further into whether doing this actually makes sense, as 
others have already covered that very well.

However, to get to your original problem, you wrote:

"If I do "r.watershed" ~14,000 times I'll get the results, but it will 
take 3 days."

First of all, 3 days running time is not that bad for such an amount of 
data. We're talking less than 20 seconds for each run...
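
As a rough check of that figure, the arithmetic can be written out in 
Python. The only inputs are the 3 days and ~14,000 runs quoted above; the 
variable names are just for illustration:

```python
# Rough per-run time implied by "~14,000 runs in 3 days".
SECONDS_PER_DAY = 24 * 60 * 60

total_seconds = 3 * SECONDS_PER_DAY   # 259200 s
n_runs = 14_000                       # approximate count quoted above
seconds_per_run = total_seconds / n_runs

print(f"{seconds_per_run:.1f} s per run")  # prints "18.5 s per run"
```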

My answer to this would be parallelization. If you have enough cores and 
enough memory (but IIRC r.watershed does not need that much), you can 
make this go much faster, as each run is independent of the others 
(again IIUC). So, if you have 8 cores to spare, you can divide the time 
by (probably a bit less than) 8...
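
A minimal sketch of that idea, using only the Python standard library 
rather than the pygrass API. It assumes it is run inside an active GRASS 
session, that the runs differ only in their flow input map, and that each 
run writes its own output map. The map names (dem, runoff_<id>, 
accum_<id>) and the helper names build_command and run_all are 
hypothetical; elevation=, flow= and accumulation= are regular r.watershed 
parameters.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(run_id, elevation="dem",
                  flow_prefix="runoff_", accum_prefix="accum_"):
    """Build one r.watershed call; all map names here are hypothetical."""
    return [
        "r.watershed",
        "--overwrite",
        f"elevation={elevation}",
        f"flow={flow_prefix}{run_id}",
        f"accumulation={accum_prefix}{run_id}",
    ]


def run_all(run_ids, workers=8, runner=subprocess.run):
    """Run the independent r.watershed calls, at most `workers` at a time.

    Threads are enough here: each thread only waits for its own child
    process, so the real work happens in parallel in the GRASS modules.
    """
    def one(run_id):
        # check=True raises CalledProcessError if a module run fails.
        runner(build_command(run_id), check=True)
        return run_id

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, run_ids))
```

Calling run_all(range(14000), workers=8) would then start the runs eight 
at a time. The runner argument exists only so that the scheduling can be 
tried out without GRASS by passing in a dummy function.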

You can look at [1] for some general information and [2] for a specific 
GRASS Python API module for that.

Moritz

[1] https://grasswiki.osgeo.org/wiki/Parallelizing_Scripts
[2] https://grass.osgeo.org/grass73/manuals/libpython/pygrass.modules.interface.html?highlight=parallelmodulequeue#pygrass.modules.interface.module.ParallelModuleQueue