[GRASS-dev] Re: parallelizing GRASS modules

Mon Dec 5 03:54:29 EST 2011

Hamish wrote:
> Michael wrote:
>> I lost the previous thread but wanted to respond
>> to your question about which modules might
>> benefit from speedup.
>> In our recursive landscape evolution module
>> (r.landscape.evol.py), the GRASS modules that
>> take the most time are r.watershed, r.stats, and
>> r.walk, especially r.watershed and r.stats since
>> we need to run these every model cycle.
>> The speedup of r.watershed of a few years back
>> made an enormous difference in our model run
>> times. But it is still time consuming on
>> landscapes with large numbers of cells. If
>> parallelization could speed this up, it would be
>> great. I'm not sure whether r.stats can be
>> parallelized, but a speedup would be helpful.
>
>
> wrt r.walk: I would think to start with parallelizing r.cost, then porting
> the method over to r.walk. Both modules use the segment library, so coding
> it to process each segment in its own thread seems like the way to do that.

Note that r.terracost follows this approach. Unfortunately, 1) you
will have to cross segment boundaries at some stage to accumulate
costs, and 2) the r.terracost implementation using disk swap mode is
unsuccessful in the sense that it is orders of magnitude slower than
r.cost in trunk, precisely because it needs to cross segment
boundaries. With regard to the knight's move in r.cost, that method
relies on the fact that all directly adjacent cells are processed
first.
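
To illustrate the boundary problem, here is a minimal Python sketch of
Dijkstra-style cost accumulation on a small grid. It is not the r.cost
or r.terracost code: it uses 4-connected moves only (no knight's
move), and the names accumulate_cost and tile_switches are made up for
this example. The point is only that cells are finalized in globally
increasing cost order, so the processing order keeps hopping between
tiles ("segments").

```python
import heapq


def accumulate_cost(cost, start):
    """Dijkstra-style cumulative cost over a 4-connected grid.

    Returns the cumulative cost grid and the order in which cells
    were finalized.
    """
    rows, cols = len(cost), len(cost[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    best[start[0]][start[1]] = 0.0
    heap = [(0.0, start)]
    order = []
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > best[r][c]:
            continue  # stale heap entry, a cheaper path was found already
        order.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # assumed step cost: mean of the two cell costs
                nd = d + 0.5 * (cost[r][c] + cost[nr][nc])
                if nd < best[nr][nc]:
                    best[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return best, order


def tile_switches(order, tile_size):
    """Count how often consecutive finalized cells lie in different tiles."""
    tiles = [(r // tile_size, c // tile_size) for r, c in order]
    return sum(1 for a, b in zip(tiles, tiles[1:]) if a != b)


cost = [[1.0] * 4 for _ in range(4)]
best, order = accumulate_cost(cost, (0, 0))
switches = tile_switches(order, tile_size=2)
```

With four 2x2 tiles, visiting each tile once would need only three
switches; the cost-ordered traversal needs more, which is what hurts
when every switch may mean disk I/O.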

> (and more generally formulate + document a method to parallelize things that use the segment library.
> perhaps a simple proof-of-method module to do that should come first)

The segment library in trunk is mostly I/O bound; its CPU overhead is
very low. Parallelizing the segment library could actually slow down
the system, because several parts of a file would be accessed in
parallel and could not all be cached by the system. That's why the
iostream lib does strictly sequential reading. The segment library,
although designed to allow random access, should be used such that
access is not too random (sweep-line concept for searches as done by
r.cost, r.walk, r.watershed).
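
A toy Python model of why the access pattern matters: it counts how
often a segment has to be loaded when only a few segments fit in
memory. This is not the segment library's API; count_segment_loads,
the LRU eviction policy, and the sizes are assumptions made for the
example.

```python
import random
from collections import OrderedDict


def count_segment_loads(accesses, seg_rows, seg_cols, cache_size):
    """Count segment loads for a sequence of (row, col) cell accesses,
    assuming an LRU cache that holds at most cache_size segments."""
    cache = OrderedDict()
    loads = 0
    for r, c in accesses:
        seg = (r // seg_rows, c // seg_cols)
        if seg in cache:
            cache.move_to_end(seg)  # mark as most recently used
        else:
            loads += 1  # segment must be read from disk
            cache[seg] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return loads


rows = cols = 64
sweep = [(r, c) for r in range(rows) for c in range(cols)]
shuffled = sweep[:]
random.Random(0).shuffle(shuffled)

sweep_loads = count_segment_loads(sweep, 16, 16, cache_size=4)
random_loads = count_segment_loads(shuffled, 16, 16, cache_size=4)
```

In this model the row-by-row sweep loads each of the 16 segments
exactly once, while the shuffled access order reloads segments over
and over.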
>
>
> wrt r.watershed: I guess we'd want a segment mode like the -m flag, but
> keeping things in RAM instead of using disk swap? Segment mode does make
> use of the segment library..   MarkusM?

The costliest (most time-consuming) parts of r.watershed are the A*
Search and flow accumulation. The A* Search cannot really be
parallelized because the order in which cells are processed is of
vital importance for everything that follows in r.watershed. This
order could easily get messed up with parallelization. Processing the
8 neighbours of the current focus cell could be parallelized, but not
without rewriting parts of the A* Search component. Flow accumulation
could be parallelized using an approach similar to r.terraflow, but
this would increase disk space and memory requirements by a factor of
nearly 8. Alternatively, there may be potential to parallelize some of
the inner loops for (MFD) flow accumulation, but I guess that the
overhead introduced by parallelization is rather high and may not be
offset by more cores because there are only 8 neighbours to
each cell. The current approach for the segmented mode of r.watershed
is a compromise between keeping the size of intermediate data low and
reducing disk I/O. The easiest way to speed things up is to get a
really fast hard drive and use it only for GRASS databases.
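
To show why the processing order matters for flow accumulation, here
is a minimal Python sketch. It is not what r.watershed does: it uses
single flow direction (steepest descent among the 8 neighbours)
instead of MFD, and it orders cells by plain elevation sorting instead
of the A* Search order. The function name and the "descending" switch
are made up for the example.

```python
def flow_accumulation(dem, descending=True):
    """Single-flow-direction accumulation on a small DEM.

    Each cell passes its accumulated value to its steepest downhill
    neighbour. The result is only correct if every cell is processed
    after all of its upstream cells, i.e. with descending=True.
    """
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]
    cells = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=descending,
    )
    for z, r, c in cells:
        best_drop, target = 0.0, None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dist = (dr * dr + dc * dc) ** 0.5
                    drop = (z - dem[nr][nc]) / dist
                    if drop > best_drop:
                        best_drop, target = drop, (nr, nc)
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


# Tilted plane draining towards the lower right corner.
dem = [[9, 8, 7],
       [8, 7, 6],
       [7, 6, 5]]
acc_ok = flow_accumulation(dem, descending=True)
acc_bad = flow_accumulation(dem, descending=False)
```

With the correct order the outlet cell collects all 9 cells; with the
order reversed it collects fewer, because upstream cells hand on
their values before they have received their own inflow.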

Sorry for the lack of enthusiasm. IMHO, code should be optimized
before it is parallelized; otherwise you just run inefficient code on
several cores. I'm sure there is still some potential for optimization
in the code base...

Markus M
>
>
> wrt r.stats: I suspect it is mostly I/O bound so I don't know how much faster
> it can be made, but I ran it through the valgrind profiler anyway just to see.
> (see the wiki Bugs page for recipe)
>
> 22% of the time is spent in G_quant_get_cell_value()
>
> 13% of the time is spent in lib/gis/get_row.c's cell_values_double()
>
> 10% of the time is spent in r.stats/stats.c's update_cell_stats() updating a
> hash table. may be possible to parallelize that.
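
One possible pattern for that, sketched in Python rather than in the C
of r.stats: each worker counts its own chunk of rows into a private
table, and the partial tables are merged at the end, so no locking on
a shared hash table is needed. The function names are hypothetical,
and with Python threads this shows only the structure, not an actual
speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(rows):
    """Count cell values in one chunk of raster rows."""
    counts = Counter()
    for row in rows:
        counts.update(row)
    return counts


def parallel_cell_stats(raster, n_workers=2):
    """Count cell values per chunk, then merge the partial counts."""
    chunk = max(1, -(-len(raster) // n_workers))  # ceiling division
    chunks = [raster[i:i + chunk] for i in range(0, len(raster), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step, done serially
    return total


raster = [[1, 1, 2],
          [2, 3, 3],
          [3, 3, 1]]
stats = parallel_cell_stats(raster)
```

Whether this pays off depends on how much of the run time is really
spent counting rather than reading rows.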
>
> if multiple input maps are used perhaps each could be processed in their own
> thread, but I don't think you are doing that with LandDyn.
>
> perhaps the way that r.stats is called/used by LandDyn could be refined? ie
> are there a lot of unnecessary calculations going on just to get a simple
> stat out which could more efficiently be answered in another way? (no idea,
> but worth exploring)
>
>
> Hamish