[GRASS-user] How does GRASS do tiled processing?
hamish_b at yahoo.com
Thu Jul 9 03:55:29 EDT 2009
> > I was curious -- how is tiled processing realized in GRASS GIS?
> > Is there a fixed input tile size (in MB of RAM or # of lines)?
> > Is there some documentation buried on the GRASS site that
> > describes the algorithm? I'm trying to replicate an efficient
> > tiled approach in R -- I was basing it off the ENVI approach
> > (precalculate the input data memory footprint per line of data,
> > read in as many lines as the memory cap allows, process, write
> > those lines, rinse, repeat), but I was curious if GRASS had a
> > different approach.
> [...] I'm more curious in just the
> generic way GRASS does tiled processing (say, in
> mapcalc). I assume there is a low-level processing
> layer GRASS uses (or no?). I'm not doing a direct
> grass-to-R link, I'm doing the processing completely within
> R with rgdal, but I'm interested in various solutions to the
> tiled processing problem.
Most GRASS raster modules operate row by row, i.e. only one row of
map data is held in memory at a time. This goes a long way toward
explaining why GRASS scales so well with minimal memory requirements.
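To illustrate the idea (this is a generic sketch in Python, not GRASS source code, and the function names are made up for the example): each row is read, processed, and written before the next is touched, so memory use stays constant no matter how many rows the map has.

```python
def process_rows(read_row, write_row, nrows, func):
    """Apply func to each cell, one row at a time.

    Only one input row and one output row are ever in memory.
    """
    for row in range(nrows):
        buf = read_row(row)                      # one row of cells
        write_row(row, [func(v) for v in buf])   # write it back out

# Toy in-memory "raster" standing in for a file on disk:
raster = [[1, 2, 3], [4, 5, 6]]
out = [None, None]
process_rows(lambda r: raster[r],
             lambda r, buf: out.__setitem__(r, buf),
             nrows=2, func=lambda v: v * 10)
```

After running, `out` holds each input cell multiplied by 10, produced without ever holding the whole map in memory at once.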
This is mostly how r.mapcalc works too, although its neighborhood
operator may require it to keep the rows ahead and behind as well
(I'm not really sure). Also note that r.mapcalc in GRASS 7 has
experimental pthreads support for multi-threading on a
multi-processor machine.
For modules which need to deal with many rows at once and hold all of
them in memory, there is often a rows= or memory= option which lets
you tell the module how much of the map to store in memory at one
time; it then does the job in a number of passes. e.g. v.to.rast.
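This multi-pass scheme is essentially the same as the ENVI approach you describe: work out how many rows fit under the cap, then loop. A small illustrative sketch (not from any GRASS module; the names are invented for the example):

```python
def tiled_passes(nrows, ncols, bytes_per_cell, memory_cap_bytes,
                 process_chunk):
    """Split the map into row chunks that fit under memory_cap_bytes,
    calling process_chunk(start_row, stop_row) once per pass."""
    rows_per_pass = max(1, memory_cap_bytes // (ncols * bytes_per_cell))
    passes = []
    start = 0
    while start < nrows:
        stop = min(start + rows_per_pass, nrows)
        process_chunk(start, stop)       # read, process, write this chunk
        passes.append((start, stop))
        start = stop
    return passes

# e.g. 1000 rows x 5000 cols of 8-byte cells under a 10 MB cap:
chunks = tiled_passes(1000, 5000, 8, 10 * 1024 * 1024, lambda a, b: None)
```

With those numbers each pass covers 262 rows, so the map is handled in four passes, the last one shorter than the rest.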
Other modules use a fixed amount of RAM, e.g. r.terraflow,
r.watershed -m, r.proj[.seg]. These will only use that much memory if
actually needed; otherwise they use less. How that is done behind the
scenes is module dependent. e.g., some of them use the GRASS segment
library to split the map up into tiles during processing. Actually,
this is what 'r.watershed -m' does; the max-memory option just tells
it how big the segments can be. The default r.watershed just tries to
do everything in memory. r.los and r.cost also use the segment
library. It is considered to be slightly slower than working entirely
in memory, but it lets you process maps larger than available RAM.
May I ask whether you are more concerned with processing time (one
tile per processor on a multi-core machine) or with memory use? (It
sounds like memory use.)
see raster/r.in.poly/raster.c in the source code for an example.
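As for the segment-library idea mentioned above, here is a toy stand-in (generic Python, not the GRASS segment API): the map is split into fixed-size tiles, only a bounded number of tiles are kept in memory, and tiles are loaded on demand with least-recently-used eviction.

```python
from collections import OrderedDict

class SegmentCache:
    """Toy tile cache illustrating the segment-library concept."""

    def __init__(self, load_tile, tile_size, max_tiles):
        self.load_tile = load_tile   # callback: (tile_row, tile_col) -> 2-D list
        self.tile_size = tile_size
        self.max_tiles = max_tiles   # memory cap, in tiles
        self.cache = OrderedDict()   # insertion order doubles as LRU order
        self.loads = 0               # count tile reads, to show the caching

    def get(self, row, col):
        key = (row // self.tile_size, col // self.tile_size)
        if key not in self.cache:
            if len(self.cache) >= self.max_tiles:
                self.cache.popitem(last=False)        # evict LRU tile
            self.cache[key] = self.load_tile(*key)    # fetch from "disk"
            self.loads += 1
        else:
            self.cache.move_to_end(key)               # mark recently used
        tile = self.cache[key]
        return tile[row % self.tile_size][col % self.tile_size]

# "Disk" tiles generated on demand: cell value = 10*row + col.
load = lambda tr, tc: [[(tr * 2 + r) * 10 + (tc * 2 + c)
                        for c in range(2)] for r in range(2)]
seg = SegmentCache(load, tile_size=2, max_tiles=1)
```

Random access anywhere in the map works, but cells within the currently cached tile cost no extra reads; with a larger max_tiles the trade-off between memory and re-reads is tunable, which is what the max-memory options control.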