[GRASS-user] OpenCL GPU acceleration build support now available in trunk

Hamish hamish_b at yahoo.com
Sun May 12 22:32:34 PDT 2013


Hi,

fyi, we have just added OpenCL GPU (graphics card) support to the build
system for GRASS 7. Use the --with-opencl ./configure switch. It has been
tested on Mac OS X and Linux (with both AMD and nVidia cards). r.sun will
probably be the first module to use it, since Seth Price has already made
that work.

So far it's just the build support, and only tested on a couple
different setups, but it's a good first step. :)


some caveats about GPU performance:
 - when it works, it works very, very nicely (ray-tracing, for example?);
   it's hard to say how much speed-up to expect: maybe 10-20x, maybe none
 - it only works well for certain kinds of problems
 - in the current generation, memory I/O on and off the card is expensive.
   This will get better with time.
 - in the current generation, many consumer-grade graphics cards support
   only single-precision floating point. This will get better with time.
 - the size of your calculation is often constrained by the limited memory
   on the video card. This will get better with time.
 - the graphics drivers are still catching up, but support is arriving.
   nVidia now ships it as standard with their proprietary driver; Mac OS X
   has it as standard now; Intel has been lagging on Linux; and AMD's
   Catalyst driver has support. The open source drivers for each vendor's
   cards on Linux are all seeing a lot of activity, with at least roughly
   working prototypes in their development versions. This will get better
   with time.
 - OpenCL isn't just for GPUs, it can do multi-CPU as well, or both together.
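To put rough numbers on the memory I/O caveat, here is a back-of-the-envelope
model. It is a sketch under stated assumptions, not anything in GRASS:
offloading only pays off when the time to copy data to the card and results
back, plus the GPU compute time, is smaller than the CPU runtime. The function
name, the 20x kernel speed-up and the 4 GB/s bus bandwidth are hypothetical
placeholders; measure your own hardware.

```python
# Back-of-the-envelope model (illustrative assumption, not a GRASS tool):
# overall speed-up = CPU time / (transfer time + GPU kernel time).

def effective_speedup(cpu_seconds, kernel_speedup, bytes_moved, bus_bytes_per_s):
    """Return the overall speed-up of a GPU offload, counting bus transfers.

    cpu_seconds     -- measured runtime of the plain CPU version
    kernel_speedup  -- how much faster the GPU computes, ignoring transfers
    bytes_moved     -- data copied to the card plus results copied back
    bus_bytes_per_s -- sustained host<->device bandwidth (assumed value)
    """
    if cpu_seconds <= 0 or kernel_speedup <= 0 or bus_bytes_per_s <= 0:
        raise ValueError("times, speed-ups and bandwidth must be positive")
    transfer_seconds = bytes_moved / bus_bytes_per_s
    gpu_seconds = transfer_seconds + cpu_seconds / kernel_speedup
    return cpu_seconds / gpu_seconds


# Hypothetical 10000 x 10000 single-precision raster: 4e8 bytes in, 4e8 out.
raster_bytes = 10_000 * 10_000 * 4
for cpu_s in (20.0, 2.0, 0.2):
    s = effective_speedup(cpu_s, kernel_speedup=20,
                          bytes_moved=2 * raster_bytes, bus_bytes_per_s=4e9)
    # prints about 16.67x, 6.67x and 0.95x respectively
    print(f"CPU {cpu_s:5.1f} s -> overall speed-up {s:.2f}x")
```

With those made-up numbers a 20 s CPU job keeps most of the benefit, while a
0.2 s job ends up slower than on the CPU, because the 0.2 s spent moving the
raster over the bus costs more than the faster kernel saves.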

For the cases where OpenCL isn't the right tool, GRASS already supports
OpenMP and POSIX threads (pthreads) in the build infrastructure, the Python
library has multiprocessing-enabled functions and example scripts, and a
number of the UNIX shell scripts in devbr6 (GRASS 6.5) and the addons
repository are multi-process enabled.


see also
  http://grasswiki.osgeo.org/wiki/GPU
  http://grasswiki.osgeo.org/wiki/Category:Parallelization


For multi-core CPUs, the throughput-efficiency goal I'm aiming for is
less than 15-20% overhead vs. running 6 separate jobs in parallel.
So far the enabled shell scripts in devbr6 and the enabled Python
scripts in trunk meet that, because they essentially just background,
e.g., 2 of the 3 RGB band rasters and then wait for all 3 to finish.
By contrast, the OpenMP LU decomposition in lib/gmath, which should be
an easy win, still needs a separate unwound-loop version written to get
the number of threads down into the dozens instead of the tens of
thousands. (Hint: a nice little self-contained learn-OpenMP project for
someone who wants to speed up the RST interpolations.)
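The "background some jobs, then wait for all of them" pattern and the overhead
target can be sketched in Python as below. This is an illustration, not GRASS
code: the JOB string is a hypothetical CPU-bound stand-in for a per-band module
run, and the baseline is assumed to be the wall time of a single job running
alone, which is what N independent jobs would ideally take on N idle cores.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for one per-band job (in real use, a GRASS module run
# on the red, green or blue raster); here it just burns a little CPU.
JOB = "sum(i % 7 for i in range(3_000_000))"


def run_jobs(n_jobs):
    """Launch n_jobs independent processes at once, then wait for all of them.

    A shell script that backgrounds 2 of 3 jobs with '&', runs the third in
    the foreground and then calls 'wait' behaves the same way.
    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", JOB]) for _ in range(n_jobs)]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError(f"job exited with status {p.returncode}")
    return time.perf_counter() - start


def overhead_percent(parallel_seconds, single_seconds):
    """Extra wall time of the parallel run relative to one job alone, in %."""
    return 100.0 * (parallel_seconds - single_seconds) / single_seconds


if __name__ == "__main__":
    single = run_jobs(1)  # baseline: one job by itself
    three = run_jobs(3)   # e.g. the r, g and b bands together
    pct = overhead_percent(three, single)
    print(f"1 job: {single:.2f} s, 3 jobs: {three:.2f} s, overhead {pct:.0f}%")
```

On a machine with at least 3 idle cores the printed overhead should typically
stay small; if it climbs well past 15-20%, the jobs are likely contending for
something (cores, disk or memory bandwidth).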


regards,
Hamish

