[GRASS-dev] Interested in parallelization of GRASS

Sat Mar 26 20:43:46 EDT 2011

Maris wrote:
> > There are many areas that could benefit from thread-safe/
> > parallel processing: vector and raster reading/writing, OGSF
> > etc.

Glynn:
> Supporting concurrent reads on a single raster map would
> make the code significantly more complex. Concurrent writes
> would be even worse unless compression was disabled.
> 
> In 7.0, concurrent reads from or writes to different raster
> maps should work (in 6.x, it's unsafe). This might allow for
> significant parallelism in e.g. r.series, which often has a
> large number of input maps.

It seems to me that the biggest gains for the least effort come
from concentrating on the modules/lib fns which are CPU bound,
not I/O bound.

While e.g. pthreads in r.mapcalc in grass7 may give me a nice
10% speedup by separating reading and writing into two threads
[or is that splitting into IO and maths?] (* I haven't actually
benchmarked it, 10% is just a guess), and that is not a gift to
refuse, the reality seems to be that the time-to-completion is
still dominated by the I/O bottlenecks and saturating the bus. At
that point it doesn't matter how many threads or CPUs you have,
you're still limited by the speed of your bus/drive/RAID array.
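
A rough way to put numbers on this is Amdahl's law: if only a
fraction p of the run time is CPU work that can be spread over n
cores, the overall speedup is 1 / ((1 - p) + p / n). The sketch
below is only an illustration; amdahl_speedup() is a hypothetical
helper, not a GRASS library function, and the fractions used are
made-up inputs rather than measurements.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when only `parallel_fraction` of the
 * run time can be spread across `n_cores`. Hypothetical helper for
 * illustration only, not part of any GRASS library. */
double amdahl_speedup(double parallel_fraction, int n_cores)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(n_cores >= 1);

    /* The serial part (e.g. time spent waiting on I/O) is unchanged;
     * only the parallel part is divided among the cores. */
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores);
}
```

With 6 cores, a module that is 90% CPU-bound gives
amdahl_speedup(0.9, 6), i.e. 4x, while one that is only 10%
CPU-bound gives amdahl_speedup(0.1, 6), about 1.09x, however
many threads are thrown at it.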

Optimizing/parallelizing short-running tasks seems most
appropriate when they will be run repetitively in tight loops.
There's nothing wrong with speeding them up, but I'd rather chop
an hour long process into 10 minutes on my hex-core CPU than get
a 5 second one-off process to complete in 4.5 seconds.

The ones worth the effort are those that take hours, or at least
give you time to glance at the process monitor and think "I've
got 2/4/6/8 cores here, but only one is being used. This is
taking forever; argh!".

So I try to think of modules which are CPU bound. The first
task is to replace inefficient algorithms with better ones (e.g.
Glynn's r.cost work, Markus M's r.watershed work), and then, if
still needed, to split those tasks up with pthreads, OpenMP, or
OpenCL (depending on the right horse for the course).

One thing I learned from Seth's work on r.sun (which I still
need to merge, bad me), was that you could instruct OpenCL to
send the job to $x CPU cores instead of an OpenCL-aware GPU.

my current understanding goes like:
-pthreads maybe best for things like splitting IO and math into
 two threads.
-OpenMP maybe best for things like sending a CPU-bound problem
 to 2/4/6/8 CPU cores.
-OpenCL maybe best for ray-tracing type problems (or maybe just
 fine for more general use?)
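
For the OpenMP case in the list above, a minimal sketch of
spreading a CPU-bound per-row loop over the available cores might
look like the following. Both cell_work() and process_raster()
are hypothetical names invented for this illustration, not GRASS
API, and the buffer is assumed to be already in memory so that no
raster I/O happens inside the parallel region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-bound per-cell kernel; it stands in for whatever
 * maths a real module would do on each cell. */
static double cell_work(double v)
{
    double acc = v;
    for (int i = 0; i < 100; i++)
        acc = acc * 0.5 + 1.0;
    return acc;
}

/* Process a rows x cols buffer held in memory. Each row is
 * independent of the others, so OpenMP can hand rows to different
 * cores. Without OpenMP enabled the pragma is ignored and the loop
 * simply runs serially. */
void process_raster(const double *in, double *out, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);

    #pragma omp parallel for schedule(static)
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            size_t i = (size_t)r * cols + c;
            out[i] = cell_work(in[i]);
        }
    }
}
```

With GCC this needs -fopenmp at compile and link time; the number
of threads can then be set at run time through the
OMP_NUM_THREADS environment variable.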

*.surf.rst takes a while to run; v.in.ogr in GRASS 6 takes a
while to run (haven't benchmarked in gr7, know there has been
work done on that); r.surf.contour used to take days (but worth
the wait :); not sure if Markus M's updates there do anything
for that or not)

There may be a number of slow vector modules; I mostly work with
rasters so don't know them as well. It is worth noting that a
lot of the DB stuff is already run as a separate process, so
there is already some advantage from a multi-core system there.

I'd be interested to hear what non-IO bound modules people spend
a lot of time waiting for. Also, are there any memory-bound
bottlenecks around?

2c,
Hamish