[GRASS-dev] pthreads in r.mapcalc slower than without

Glynn Clements glynn at gclements.plus.com
Sun Jun 24 05:52:29 PDT 2012


Hamish wrote:

> I've just been running some benchmarks for r.mapcalc
> to try and find the best method for parallelizing
> a script / best way to minimize overheads.
> 
> I'd like to understand where it is useful to combine
> expressions into a single r.mapcalc process, and
> where it isn't,

If the expressions share input maps, then combining them into a single
operation will avoid the overhead of reading the map multiple times. 
If the expressions are completely disjoint, then there shouldn't be
much difference between one command and multiple commands.
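
As a rough illustration of the point above, here is a minimal Python sketch that only counts row reads. It is a toy model, not GRASS code: `row_reads` and the map names are hypothetical, and it assumes that one command reads each distinct input map once per row.

```python
def row_reads(commands, rows):
    """Count row reads, assuming each command reads every row of each
    distinct input map it uses exactly once (a simplifying assumption)."""
    return sum(len(set(inputs)) * rows for inputs in commands)


ROWS = 1000

# Three expressions that all use the same (hypothetical) input map "elev".
shared_separate = [["elev"], ["elev"], ["elev"]]   # three commands
shared_combined = [["elev", "elev", "elev"]]       # one combined command

# Three expressions with completely disjoint inputs.
disjoint_separate = [["a"], ["b"], ["c"]]
disjoint_combined = [["a", "b", "c"]]

if __name__ == "__main__":
    print("shared, separate:  ", row_reads(shared_separate, ROWS))
    print("shared, combined:  ", row_reads(shared_combined, ROWS))
    print("disjoint, separate:", row_reads(disjoint_separate, ROWS))
    print("disjoint, combined:", row_reads(disjoint_combined, ROWS))
```

In this model the combined command saves reads only when inputs are shared; for disjoint inputs both strategies read the same number of rows.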

> and what sort of mapcalc expressions
> can best take advantage of pthreads support, and
> which are not good matches for it.

More columns, more leaf nodes, and more computationally expensive leaf
nodes should all gain. Multiple output maps are evaluated
sequentially, so there's no gain there.

> also, as with some OpenMP experiments, if it makes
> sense to parallelize by row (given a target column
> length of 1000-3000 cells), or by some other way
> (e.g. for a 1000 row tall raster spawn 4 x 250row
> each threads)
> ?

The problem with evaluating different rows in parallel is that input
would become a bottleneck, as you can't run multiple Rast_get_row
calls on a single map concurrently.
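
To illustrate why row-level parallelism runs into this, here is a small Python sketch in which several threads read rows from one map through a single lock. `LockedMap` and `read_all` are hypothetical stand-ins; the lock only models the restriction that reads on one map cannot overlap, not the actual GRASS implementation.

```python
import threading


class LockedMap:
    """Toy raster whose get_row() is serialized by a per-map lock."""

    def __init__(self, rows):
        self.rows = rows
        self._lock = threading.Lock()
        self.inside = 0        # readers currently inside get_row()
        self.max_inside = 0    # highest number of readers seen at once

    def get_row(self, r):
        with self._lock:
            self.inside += 1
            self.max_inside = max(self.max_inside, self.inside)
            row = list(self.rows[r])   # stands in for read/decode work
            self.inside -= 1
        return row


def read_all(m, nthreads):
    """Read every row, with the rows interleaved across nthreads threads."""
    results = [None] * len(m.rows)

    def worker(start):
        for r in range(start, len(m.rows), nthreads):
            results[r] = m.get_row(r)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    m = LockedMap([[i] * 4 for i in range(100)])
    read_all(m, 4)
    print("max concurrent readers:", m.max_inside)
```

However many threads are started, only one is ever inside `get_row()` at a time, so the input side stays effectively sequential.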

> summary of results: r.mapcalc built without pthread
> support was the fastest for my test case. When built
> with pthread support, using WORKERS=1 was the fastest
> option (default is 8)*. executing r.mapcalc
> as three different processes was the fastest of all.
> 1-worker grass7 x3 processes was faster than
> grass6.5svn.

Using multiple distinct processes avoids contention, although the total
CPU time would increase due to reading the input maps multiple times.
This isn't an issue if the additional cores would otherwise be idle, but
it would reduce overall throughput if there is contention for CPU cycles.

> [*] note that even with r.mapcalc built without
> pthreads (make clean r.mapcalc dir + edited r.mapcalc
> Makefile) it still uses more than one CPU core.
> maybe because of lib/gis/counter.c(?)

That code uses mutexes to provide thread-safety to other functions,
but it doesn't use threads.

Note that WORKERS=1 will use one background thread in addition to the
main thread; you need to use WORKERS=0 for single-threaded execution.

The current design is quite inefficient, as it uses a thread for each
node in the tree, most of which will spend the bulk of their time
waiting for argument threads to complete. E.g. "a+b" requires three
threads, one of which will wait for the "a" and "b" threads to
complete before performing the addition. If the number of workers is
low, this can result in the leaf nodes being starved of workers
because the higher-level nodes have taken them all. But even
WORKERS=100 doesn't see any gain.
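
The thread-per-node arithmetic can be sketched in a few lines of Python. This is a toy model, not r.mapcalc code: the nested-tuple tree format and all function names are hypothetical, and `workers_left_for_leaves` assumes the worst case in which every interior node claims a worker before any leaf does.

```python
def count_nodes(expr):
    """Return (interior, leaves) for a tree such as ("+", "a", "b").

    A plain string is a leaf (a map read); a tuple is an operator
    followed by its argument sub-trees.
    """
    if isinstance(expr, str):
        return (0, 1)
    interior, leaves = 1, 0
    for arg in expr[1:]:
        i, l = count_nodes(arg)
        interior += i
        leaves += l
    return (interior, leaves)


def threads_needed(expr):
    """One thread per node, as in the design described above."""
    interior, leaves = count_nodes(expr)
    return interior + leaves


def workers_left_for_leaves(expr, workers):
    """Worst-case assumption: interior nodes take their workers first."""
    interior, _ = count_nodes(expr)
    return max(0, workers - interior)


if __name__ == "__main__":
    simple = ("+", "a", "b")
    nested = ("+", ("*", "a", "b"), ("-", "c", "d"))
    print(threads_needed(simple))               # "a+b": three threads
    print(count_nodes(nested))                  # interior vs. leaf nodes
    print(workers_left_for_leaves(nested, 3))   # leaves starved
```

For "a+b" this gives three threads, one of which only waits. With the nested expression and three workers, the model leaves no worker for the leaf nodes that do the actual reading.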

Also, in the case of a single input map with multiple modifiers (#r
etc), Rast_get_row() will be called multiple times. The reading and
decompression is only done once, but the decoding, resampling,
conversion to requested format and embedding of nulls are all done
multiple times. The per-map lock is held for all of these, plus the
application of the modifier itself (i.e. colour lookup in the case of
#r etc).
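
A small Python model of that cost split, for one row of one map used with three modifiers. `ModelMap` and its counter names are hypothetical; the only behaviour taken from the description above is that the read and decompression happen once per row while the later stages repeat on every call.

```python
class ModelMap:
    """Toy model of the per-call work for a single input map."""

    def __init__(self):
        self.cached_row = None
        self.counts = {
            "read+decompress": 0,
            "decode/resample/convert/nulls": 0,
            "modifier": 0,
        }

    def get_row(self, row, modifier):
        # The raw row is fetched and decompressed only on a cache miss.
        if self.cached_row != row:
            self.counts["read+decompress"] += 1
            self.cached_row = row
        # The remaining stages are repeated on every call.
        self.counts["decode/resample/convert/nulls"] += 1
        self.counts["modifier"] += 1   # e.g. colour lookup for #r
        return (row, modifier)


if __name__ == "__main__":
    m = ModelMap()
    for mod in ("r", "g", "b"):   # map#r, map#g, map#b on the same row
        m.get_row(0, mod)
    print(m.counts)
```

In the real code all of the counted stages run while the per-map lock is held, so the three calls cannot overlap.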

-- 
Glynn Clements <glynn at gclements.plus.com>

