[GRASS-dev] Re: parallelizing GRASS modules

Sören Gebbert soerengebbert at googlemail.com
Mon Dec 5 09:31:36 EST 2011


> I'm looking at this as a learning exercise as much as anything else, so
> I don't really mind a bit of wasted effort. And if I can speed up
> v.surf.rst today by adding a conditional one-liner somewhere, I think it
> is worth my time. (In ~30 minutes of experimenting I already have it
> completing ~25% faster; the just posted problem runs 50% faster but gives
> garbage results)

Unfortunately I must say that a parallelized program which produces
garbage is no indicator of how much faster the program will become
once it is parallelized correctly. The garbage is in most cases
caused by race conditions. If you have race conditions in the code,
you need to restructure it.
The Cholesky band matrix solver needed to be restructured to run in
parallel; luckily that was not so hard. IMHO, in the case of the NR LU
decomposition code, you really need to understand what is going on
before restructuring it for parallelism.
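
To make the race condition point concrete, here is a minimal
standalone sketch (not GRASS code) showing the typical broken
pattern and its OpenMP fix:

#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    int i;

    /* BROKEN: all threads would update the shared variable "sum"
     * without synchronization -> race condition, garbage result:
     *
     * #pragma omp parallel for
     * for (i = 0; i < 1000000; i++)
     *     sum += i * 0.5;
     */

    /* FIXED: a reduction gives every thread a private copy of
     * "sum" and combines the copies safely at the end */
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 1000000; i++)
        sum += i * 0.5;

    printf("sum = %f\n", sum);
    return 0;
}

Compile with e.g. gcc -fopenmp; without the reduction clause the
result changes from run to run, which is exactly the kind of garbage
described above.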

[...]
>> The NR (Numerical Recipes) solver is designed to run very
>> fast in serial, but I think it is very hard to parallelize.
>
> fwiw, long ago the Numerical Recipes authors gave permission for their
> code to be used in GRASS, but yes, we should work to remove it anyway.

Well, I guess they didn't give their permission to distribute it under
the GPL2? That's actually what we are doing. Besides that, the code is
not commented or documented as NR code with a special license ...

[...]


> Having looked at v.surf.rst a bit more, it seems to me the loop that
> really wants to be parallelized is in lib/rst/interp_float/segmen2d.c
> IL_interp_segments_2d(). The "cv" for loop starting on this line:
>
>    for (skip_index = 0; skip_index < m_skip; skip_index++) {
>
> calls matrix_create() ~256 times during the course of the module run, and
> within that the LU decomposition function is already abstracted, so it is
> easy to swap out with something else. If each one of those matrix_create()s
> could be run in its own thread, there would be no need to parallelize
> the deep linear algebra innards, so there would be no huge overhead to
> worry about, and numeric code designed to be serial could remain that way.
> It is a bit more complicated, so it may be a job for pthreads...?

OpenMP supports parallel sections:

#pragma omp parallel sections
{
    #pragma omp section
    {
        /* Code for section 1 */
    }
    #pragma omp section
    {
        /* Code for section 2 */
    }
}
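
For the skip_index loop quoted above, a worksharing loop may be the
more natural construct than sections, assuming the iterations are
independent. In the following sketch, solve_segment() is a
hypothetical stand-in for the matrix_create()/LU work, not the
actual GRASS API:

#include <stdio.h>
#include <omp.h>

#define M_SKIP 256

/* hypothetical stand-in for the per-segment work
 * (matrix_create() plus the serial LU solve) */
static void solve_segment(int skip_index)
{
    printf("segment %d handled by thread %d\n",
           skip_index, omp_get_thread_num());
}

int main(void)
{
    int skip_index;

    /* each iteration runs in its own thread; the numeric
     * code inside each call stays serial, per thread */
#pragma omp parallel for
    for (skip_index = 0; skip_index < M_SKIP; skip_index++)
        solve_segment(skip_index);

    return 0;
}

This only helps if each iteration writes to its own output storage;
shared buffers would reintroduce exactly the race conditions
described above.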


Best regards
Soeren

