[GRASS-dev] multi-threaded r.mapcalc

Fri Jan 9 11:51:50 EST 2009

Markus Neteler wrote:

> > I have added experimental multi-threading support to r.mapcalc in 7.0
> > (r34440). It isn't enabled by default; to enable it, use
> > "make USE_PTHREAD=1". It only uses a handful of pthread functions, so
> > there's a reasonable chance of it working on MacOSX and/or Windows
> > (with the pthread-win32 library).
> >
> > You can change the number of worker threads by setting the WORKERS
> > environment variable.
> 
> A curiosity: is the approach completely different from the parallelization
> implemented in r.li.*?

AFAICT, r.li uses a daemon and separate processes.

> > The parallelism is limited to evaluation of argument expressions to
> > functions and operators.
> 
> This is not entirely clear to me:
> "evaluation of argument expressions" - isn't this done only one time
> when invoking r.mapcalc with a formula?

Evaluation is what r.mapcalc is doing all the time that it's running. 
The top-level expressions are evaluated for each map row. Evaluating a
function call (which includes infix operators) requires first
evaluating each argument expression to an argument value (i.e. a row
of data), then calling the function on the argument values.

> What would be a complex
> example which illustrates the benefits of parallelism in the evaluation
> part?

Consider e.g. "out = a * b + c * d".

This parses as "add(mul(a,b),mul(c,d))".

Evaluation would amount to:

	tmp1 = a * b
	tmp2 = c * d
	tmp3 = tmp1 + tmp2
	write_row(tmp3)

The parallelism causes the first two steps above to be executed
concurrently in separate threads.

The original evaluate_function() code was:

    for (i = 1; i <= e->data.func.argc; i++)
	evaluate(e->data.func.args[i]);

    res = (*e->data.func.func) (e->data.func.argc,
				e->data.func.argt, e->data.func.argv);

The new version is:

    if (e->data.func.argc > 1 && e->data.func.func != f_eval) {
	for (i = 1; i <= e->data.func.argc; i++)
	    begin_evaluate(e->data.func.args[i]);

	for (i = 1; i <= e->data.func.argc; i++)
	    end_evaluate(e->data.func.args[i]);
    }
    else
	for (i = 1; i <= e->data.func.argc; i++)
	    evaluate(e->data.func.args[i]);

    res = (*e->data.func.func) (e->data.func.argc,
				e->data.func.argt, e->data.func.argv);

IOW, rather than fully evaluating the argument expressions
sequentially, it fist commences evaluation for all of them, then waits
for all of them to finish.

The original behaviour is retained for the eval() function (which is
normally used as a sequencing operator, which will break if
expressions are evaluated concurrently) and functions with a single
argument (where there's no benefit to parallelism, only overhead).

> > It doesn't attempt to calculate multiple rows
> > concurrently, or multiple output maps.
> 
> Could this be integrated somehow from r.li.*?

No.

The main problem is trying to retrofit parallelism onto code
(r.mapcalc scripts) which isn't expecting it.

The problem with evaluating multiple output maps concurrently is
related primarly to the handling of variables. E.g. consider:

	r.mapcalc <<EOF
	out1 = eval(a=1,...)
	out2 = eval(a=2,...)
	out3 = a
	EOF

If you try to evaluate the maps in parallel, you'll get problems with
the out1 and out2 evaluations trying to store different values in "a".
Also, out3 should be 2, but that would need steps to make it happen.

IOW, you would have to ensure that anything which reads a variable
waits until its value for that row has been calculated.

The main problem with evaluating multiple rows concurrently is the way
that buffers are managed. Each node in the parse tree has an
associated buffer which contains the value for the current row. This
would need to be change to allow either N buffers for N-row
parallelism or separating the buffers from the expression nodes. The
latter approach is complicated by the implementation of variables
(which simply use the RHS expression's buffer as their own buffer
rather than allocating a separate buffer).

> > Also, only one thread can call get_map_row() at a time, so commands
> > which are I/O bound (i.e. which only perform simple calculations, so
> > most of the time is spent performing I/O), won't see much of an
> > improvement.
> >
> > This is mainly due to the fact that the libgis raster I/O functions
> > aren't re-entrant, due to the use of pre-allocated row buffers (and
> > possibly other issues).
> >
> > [There are also static row buffers in r.mapcalc itself, but those
> > could easily be eliminated if there was any point.]
> 
> Long ago (KR to ANSI C reformatting) I suggested to split the
> raster functions out of libgis, calling them Rast_XXX().
> Do you think this could become relevant now for GRASS 7?
> 
> I still find libgis overwhelmed with too many different functions.

I'm inclined to agree. However, it can sometimes be hard to separate
"raster" functions from "general" functions due to GRASS' origin as a
purely raster GIS. E.g. the region resolution is specific to rasters,
but region handling geneerally isn't. Are the functions which deal
with colour tables "raster" functions. Etc.

Other cases are easier, e.g. memory allocation and error handling.

> > Finally, eval() is excluded from parallelisation, as it is used for
> > variable assignments, and a variable needs to be assigned before it is
> > used. If this is significant, we can have separate parallel and
> > sequential versions. If you perform variable assignments other than in
> > eval(), you lose.
> >
> > Mostly, I'm interested in discovering whether this actually has any
> > practical benefit. IOW, whether anyone is actually using scripts which
> > do enough computation to make use of additional cores.
> 
> Do we have an example in [6.4] scripts/ which I can use for timing testing?

r.shaded.relief seems to have the most complex r.mapcalc expression of
any of the supplied scripts.

-- 
Glynn Clements <glynn at gclements.plus.com>