[GRASS-dev] Grass parallelization
hamish_b at yahoo.com
Wed Sep 4 02:56:54 PDT 2013
> Do you know if there is much interest in greater parallelization?
Huge interest. Outside of the core libraries, GRASS is made up of
~500 individual processing modules, each doing their own thing well.
Each uses its own algorithms and strategies, which is why GRASS 7
has built-in support for OpenMP, pthreads, *and* OpenCL: the idea is that
the right parallelization strategy can be matched to the nature of the
problem which each individual module faces. Additionally, our Python
scripting library has helper functions to make parallel discrete-processes
easy to use, since a common use case is to run the same computation
on three different Red,Green,Blue imagery bands, or all ~7-11 spectral
bands from satellite data (e.g. LANDSAT). In those cases the number of
natural processes is in the same neighborhood as the number of cores
on a typical workstation, so backgrounding all but one of the jobs in
bash or Python, then waiting for them all to finish, works remarkably well
and takes minimal programming effort and divergence from the single-thread
case. That's not far from the MPI situation: instead of backgrounding
jobs, they could just as well be sent to other machines in the cluster.
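
As a rough illustration of the background-then-wait pattern described above, here is a minimal sketch using only the Python standard library. It does not use the GRASS scripting library's own helpers; the `run_in_parallel` function and the per-band command are hypothetical stand-ins, and each job is just a short Python one-liner so the sketch runs anywhere. For simplicity it backgrounds every job rather than all but one.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command in the background, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    # Popen returns immediately, so all jobs run concurrently.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that job has finished and returns its exit code.
    return [p.wait() for p in procs]


# Hypothetical stand-in for a per-band module call: each job sleeps
# briefly and prints its band name. In real use each entry would be the
# command line of a processing module run on one band.
bands = ["red", "green", "blue"]
job = "import sys, time; time.sleep(0.1); print(sys.argv[1])"
commands = [[sys.executable, "-c", job, band] for band in bands]

exit_codes = run_in_parallel(commands)
# Collect the bands whose job exited with a non-zero status.
failed = [b for b, rc in zip(bands, exit_codes) if rc != 0]
```

With three bands and a multi-core workstation, the three jobs overlap and the total wall time is roughly that of the slowest one, which is why this approach needs so little change from the single-process script.
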
As Soeren mentioned the gmath and gpde libraries support OpenMP already;
in addition Seth Price put together an OpenCL version of our r.sun
module (GPU ray tracing sunbeams, seems like a natural fit!) but I/we
still need to finish integrating that into the main build; and our
r.mapcalc module has pthreads support. The r.mapcalc (raster array map
calculator) case is actually a pretty typical one for GRASS modules: they
are not entirely I/O limited, but they are closer to that than to being
CPU bound per se.
For MPI this means that there's a *lot* of data to pass around the network,
and unless you've got InfiniBand or some network infrastructure near to
the speed of your RAID, I suspect you'll quickly saturate the network.
The main highly-CPU-bound modules I am personally very keen to see get
parallelized are our spline interpolation modules: v.surf.rst and
v.surf.bspline. The LU decomposition parts of them are actually in the
GRASS libraries not the modules, so that would also benefit e.g. v.vol.rst
which does 3D voxel cube interpolations. The v.surf.rst module uses quadtree
segmentation, and v.surf.bspline does its own splitting into ~ 12-120
processing segments, so those yell out to me as low hanging fruit.
I am sure the vector network analysis modules could make good use of
parallelization too, but I don't use those enough personally to be able
to comment on their immediate needs and bottlenecks.
Markus N. might be able to talk about what he's doing on the Top500
supercomputer (AIX); I'm not sure how much Maui/Torque or similar is
handling the job submissions there and how much is manual scripting
to break up/send out the jobs and then process the results.
> And have the Intel compilers and MPI been used with GRASS?
Yes, I've built GRASS with icc ver 12.1.3 (-O2 -xHost -ipo -static-intel
-parallel -Wall). Considering the size of the GRASS codebase it might
be a little surprising that there weren't more problems :), but we do
try pretty hard to keep the code straight ANSI/C89 C, which helps a lot
with portability. For GRASS + icc build notes see:
For GRASS I generally need to keep a close eye on the Debian packaging,
so I typically build it with gcc; outside of GRASS I do use ifort a
lot, and there the OpenMP auto-vectorization works really well. I
understand that's a bit easier to do for FORTRAN than C though.
As for MPI, there's an MPI version of the above-mentioned v.surf.rst module
for GRASS 5 floating around somewhere (probably under its old name of
's.surf.rst'); I actually run a medium sized cluster in my day job which
is ~85% MPI usage, but I've never really been tempted to use it for GRASS
things. For what I personally do, saturating all cores/CPUs on the
local workstation is often enough. Also, the cluster setup can be non-trivial for
new users (NFS mounts, ssh keys, etc.), so out-of-the-box "just works"
OpenMP style multi-threading probably gets us better bang for the buck
when trying to support the 'Desktop GIS' use case, which is probably the
bulk of our end users. But don't get me wrong, if the long-running modules
like the spline interpolations and the r.cost module for search-paths were
MPI-enabled they'd certainly get used by our power users & teams using it
for back-end server satellite image number crunching!