[GRASS-dev] GRASS GIS nightly builds

Sun Feb 24 21:14:50 PST 2013

Hi,

to test the efficiency (does 650% of the CPU go 6.5x as fast as
running 100% on a single core?) you can use the OMP_* environment
variables.  from the bash command line:

# try running it serially:
OMP_NUM_THREADS=1
export OMP_NUM_THREADS
time g.module ...

# let OpenMP set number of concurrent threads to number of local CPU cores
unset OMP_NUM_THREADS
time g.module ...

then compare the overall ("real") & system ("sys") times to complete.
see http://grasswiki.osgeo.org/wiki/OpenMP#Run_time
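
To put a number on "does it go N times as fast", here is a minimal
sketch that turns two wall-clock timings into a speedup and a parallel
efficiency. The speedup() helper and the timings fed to it (60 s serial,
15 s threaded, 8 cores) are made up for illustration, they are not
measurements of any GRASS module:

```shell
# speedup T_SERIAL T_PARALLEL NCORES
#   speedup    = serial "real" time / parallel "real" time
#   efficiency = speedup / number of cores, as a percentage
# (hypothetical helper, not part of GRASS or bash)
speedup() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN {
        printf "speedup=%.2f efficiency=%.0f%%\n", s / p, 100 * s / (p * n)
    }'
}

# made-up timings: 60 s with OMP_NUM_THREADS=1, 15 s with 8 cores
result=$(speedup 60 15 8)
echo "$result"
```

An efficiency well under 100% means a good share of the extra CPU time
is going to overhead rather than to useful work.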

if that is horribly inefficient, it will probably be more
efficient to run multiple (different) serial jobs at the same
time. The bash "wait" command is quite nice for that: it waits
for all backgrounded jobs to complete before going on.
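
A minimal sketch of the background-then-wait pattern. run_job is only a
stand-in built on "sleep" so the snippet runs anywhere; in practice each
one would be a real single-threaded command:

```shell
# scratch directory for the per-job logs
tmpdir=$(mktemp -d)

# stand-in for a real single-threaded job (assumption for illustration)
run_job() {
    sleep 1
    echo "job $1 done"
}

for i in 1 2 3 ; do
    run_job "$i" > "$tmpdir/job_$i.log" &    # send each job to the background
done

wait    # block here until every backgrounded job has finished

# each job wrote one line, so the line count is the number of finished jobs
finished=$(cat "$tmpdir"/job_*.log | wc -l)
echo "$finished jobs finished"
rm -r "$tmpdir"
```

Since the three jobs overlap, the whole thing takes about as long as
the slowest one rather than the sum of all three.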

for r.in.{xyz|lidar|mb} this works quite well for generating
multiple statistics at the same time, as the jobs will all want
to read the same part of the input file at about the same
time, so it will still be fresh in the disk cache, keeping I/O
levels low.  (see the r3.in.xyz scripts)
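
As a rough sketch of that idea for r.in.xyz, one backgrounded job per
statistic. The input file name and the output map prefix are made up,
and the snippet only echoes the commands unless r.in.xyz is found on the
PATH (i.e. unless it is run inside a GRASS session):

```shell
# hypothetical input file and output map prefix, for illustration only
INPUT=points.xyz
PREFIX=survey

# dry run (just print the commands) when r.in.xyz is not available
if command -v r.in.xyz >/dev/null 2>&1 ; then
    RUN=""
else
    RUN="echo"
fi

launched=0
for METHOD in min max mean n ; do
    # one single-threaded import per statistic, all reading the same file
    $RUN r.in.xyz input="$INPUT" output="${PREFIX}_${METHOD}" method="$METHOD" &
    launched=$((launched + 1))
done

wait    # continue only once all the statistics maps are written
echo "launched $launched jobs"
```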

for v.surf.bspline my plan was to put each of the data subregions
in their own thread; for v.surf.rst my plan was to put each of
the quadtree squares into their own thread. Since each thread
takes a finite amount of time to create and destroy, the
goal is to make fewer, longer running ones. Anything more than ~
an order of magnitude more than the number of cores you have is
unneeded overhead.
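
The "fewer, longer running" idea can be sketched at the shell level by
splitting the work into a handful of chunks instead of one job per item.
The row count, the number of jobs and the process_rows stand-in are all
assumptions for illustration, this is not how any GRASS module divides
its work internally:

```shell
ROWS=2000    # e.g. rows in a raster map (made-up figure)
NJOBS=4      # assumed number of cores to keep busy

# rows per chunk, rounded up so that no row is left over
chunk=$(( (ROWS + NJOBS - 1) / NJOBS ))

# stand-in for the real per-chunk work
process_rows() {
    echo "processing rows $1-$2"
}

njobs=0
start=1
while [ "$start" -le "$ROWS" ] ; do
    end=$((start + chunk - 1))
    if [ "$end" -gt "$ROWS" ] ; then
        end=$ROWS
    fi
    process_rows "$start" "$end" &    # one long-running job per chunk
    njobs=$((njobs + 1))
    start=$((end + 1))
done

wait
echo "ran $njobs jobs of up to $chunk rows each"
```

Here the create/destroy cost is paid 4 times rather than 2000 times.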

e.g., processing all satellite bands at the same time is a nice
efficient win. If you process all 2000 rows of a raster map in
2000 just-an-instant-to-complete threads, the ratio of create/destroy
overhead to thread survival time really takes its toll.
Even as thread creation/destruction overheads become more
efficiently handled by the OSs and compilers, the situation will
still be the same. The interesting case is OpenCL, where your
video card can run 500 GPU units..

Hamish