[GRASS-dev] Re: grass-dev Digest, Vol 30, Issue 31

Laura Toma ltoma at bowdoin.edu
Mon Oct 13 10:43:27 EDT 2008


>
> Date: Mon, 13 Oct 2008 09:12:54 +0200
> From: Markus Metz <markus_metz at gmx.de>
> Subject: [GRASS-dev] Re: big region r.watershed
>
> Hamish wrote:
>> Markus Metz wrote:
>>
>>> The original version uses very little memory, so assuming that GRASS
>>> runs today on systems where at least 500MB RAM are available, I
>>> changed the parameters for the seg mode; more data are kept in
>>> memory, speeding up the seg mode. Looking at other modules using the
>>> segment library (e.g. v.surf.contour, r.cost), it seems that there is
>>> not one universally used setting; instead, the segment parameters are
>>> tuned to each module. The new settings work for me, but not
>>> necessarily for others, and maybe using 500MB is a bit much.
>>>
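For scale, the in-memory footprint implied by a set of segment
parameters is plain arithmetic: segments kept in memory, times segment
rows, times segment columns, times bytes per cell. A minimal C++ sketch
(all numbers hypothetical, chosen only to land near the 500MB figure
above; they are not the actual r.watershed settings):

    #include <cstdio>

    int main() {
        /* all numbers hypothetical, not the actual r.watershed settings */
        const long seg_rows = 200, seg_cols = 200; /* cells per segment side */
        const long cell_size = 8;       /* bytes per cell, e.g. a double */
        const long segs_in_ram = 1600;  /* segments held in memory at once */

        const long bytes = segs_in_ram * seg_rows * seg_cols * cell_size;
        std::printf("footprint: %ld MB\n", bytes / (1024 * 1024));
        return 0;
    }

With these numbers the footprint is about 488 MB, i.e. the scale under
discussion.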
>> fwiw, r.terraflow has a memory= option; the default is 300 MB.
>> AFAIU, the bigger you make that, the smaller the on-disk temp files
>> need to be (i.e. a work-around to keep tmp files under 2 GB on 32-bit
>> filesystems).
>>
>> A number of modules like r.in.poly have a rows= option, which I
>> didn't really understand until I got into the code (hold at most that
>> many region rows, all columns, in memory at once). Interestingly, the
>> default value has scaled quite well over the years.
>>
>> And other modules like r.in.xyz have percent= (0-100) for how much of
>> the map to keep in memory at once.
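The three option styles above (memory=, rows=, percent=) all reduce to
the same thing, a byte budget for the current region. A sketch with
hypothetical numbers, only to show the conversions:

    #include <cstdio>

    int main() {
        const long nrows = 10000, ncols = 10000; /* hypothetical region */
        const long cell = 8;                     /* bytes per cell */

        const long by_memory  = 300L * 1024 * 1024;        /* memory=300 MB */
        const long by_rows    = 4096L * ncols * cell;      /* rows=4096 */
        const long by_percent = nrows * ncols * cell / 10; /* percent=10 */

        std::printf("memory=300 -> %ld MB\n", by_memory  / (1024 * 1024));
        std::printf("rows=4096  -> %ld MB\n", by_rows    / (1024 * 1024));
        std::printf("percent=10 -> %ld MB\n", by_percent / (1024 * 1024));
        return 0;
    }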
>>
> A default value that scales well over the years would be preferable,
> but performance of r.watershed.fast -m is really poor if whole columns
> (or rows) are kept in memory, and much better if segments have equal
> dimensions. Interestingly, segments of 200 rows and 200 columns are
> processed fastest, faster than e.g. 150 or 250 rows and columns. The
> more segments are kept in memory, the better.
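One plausible reason square segments win (this explanation is mine, not
from the thread): with a fixed number of cells per segment, the
neighbour accesses that cross a segment boundary during an 8-neighbour
sweep grow with the segment perimeter, and a square minimizes perimeter
for a given area. A back-of-envelope C++ sketch:

    #include <cstdio>

    /* Boundary crossings in one 8-neighbour sweep over an n x n grid,
     * up to a constant factor: each segment contributes its perimeter
     * 2*(srows + scols), and there are n*n / (srows*scols) segments,
     * giving n*n * (1/srows + 1/scols). */
    static double crossings(double n, double srows, double scols) {
        return n * n * (1.0 / srows + 1.0 / scols);
    }

    int main() {
        const double n = 10000; /* hypothetical n x n region */
        /* same area (40000 cells) per segment, three shapes: */
        std::printf("200 x 200 : %.3g\n", crossings(n, 200, 200));
        std::printf("400 x 100 : %.3g\n", crossings(n, 400, 100));
        std::printf("40000 x 1 : %.3g\n", crossings(n, 40000, 1));
        return 0;
    }

The square shape gives the fewest crossings; why 200 beats both 150 and
250 is presumably a separate trade-off (segments cached vs. per-segment
overhead) that this sketch does not model.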
> Right now I don't want to introduce a new option to give the user
> control over how much memory is used (be it MB of memory, number of
> rows, or percent of the map) because I want to keep all options of
> r.watershed.fast identical to the original version. I'm still not
> happy with the speed of the segmented version of r.watershed.fast, but
> at least it is orders of magnitude faster than the in-memory version
> of the original r.watershed. Maybe the iostream library that came with
> r.terraflow can be used for r.watershed -m as well.
>
> Markus

To use the Iostream library you need to change the underlying
algorithm of r.watershed. Iostream provides streams (files on disk)
and sorting of streams. If you use Iostream, you need to store the
grids in streams on disk rather than as 2D arrays in memory. Random
access into a stream is very expensive, so you need a way to express
the computation as a sequence of stream sorts followed by sequential
scans of the streams. This usually requires a complete rewrite of the
algorithm.

-Laura
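
For concreteness, a toy C++ sketch of the sort-then-scan pattern Laura
describes, with an in-memory vector and std::sort standing in for
Iostream's disk-backed streams and stream sort (the descending-elevation
order is only an illustrative choice, not the actual r.watershed logic):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Cell { int row, col; float elev; };

    int main() {
        /* A "stream" of grid cells in arbitrary (row-major) order; in
         * the real library this is a file on disk, not a vector. */
        std::vector<Cell> stream = {
            {0, 0, 12.f}, {0, 1, 10.f}, {1, 0, 15.f}, {1, 1, 9.f},
        };

        /* Step 1: sort the whole stream, e.g. by descending elevation,
         * so a flow-accumulation-style pass visits cells high to low. */
        std::sort(stream.begin(), stream.end(),
                  [](const Cell& a, const Cell& b) { return a.elev > b.elev; });

        /* Step 2: one sequential scan over the sorted stream. Anything
         * a later cell needs is appended to another stream and sorted
         * again; nothing is fetched by (row, col) at random. */
        for (const Cell& c : stream)
            std::printf("visit (%d,%d) elev=%.1f\n", c.row, c.col, c.elev);
        return 0;
    }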



