[GRASS-dev] multi-core GRASS [was: ramdisk as mapset?]

Glynn Clements glynn at gclements.plus.com
Mon Jul 23 13:23:48 EDT 2007


Hamish wrote:

> > I am about to purchase a cluster of Mac Pros for filtering and
> > rendering sonar data and I have been curious what has been done to
> > parallelize GRASS by enterprising people.
> 
> Q: is the GRASS segmentation process inherently thread-friendly?
> (I mean theoretically, not as written)
> 
> i.e. if the segmentation library were rewritten to be better, could it
> use some sort of n_proc value set at compile time (or, better, a GIS
> variable set with g.gisenv, or even a shell environment variable) to
> determine how many simultaneous processes to run at once?

No. You can't avoid modifying code.

> Given our manpower, the best way I see to get GRASS more multi-core and
> multi-processor ready is a piecemeal approach, starting with the low-
> hanging fruit / biggest bang-for-buck libraries. Simultaneously we can
> gradually remove as many global variables as possible from the libs.

It isn't just global variables (although those are a major issue, not
just for threading).

For most applications, you would want to be able to have multiple
threads reading/writing the input/output maps, so the issue also
applies to the fields of "struct fileinfo".

> I wonder if Thiery has any thoughts here, as he is probably in a better
> position to fundamentally & quickly rework the architecture than we are.
> (i.e. less baggage to worry about.) I think it is very safe to say that for
> the next decade or so multi-core scaling is going to be the future of
> number crunching. Eventually new paradigms and languages will arrive, but
> for now we have to fight with making our serial languages thread-safe....
> 
> some sort of plan of action, in order of priority:
> 1) [if plausible] Make the segment lib multi-proc'able. If it's currently
>    crappy, then all the more reason to start rewrites here.

The segment library isn't really helpful here. It's essentially just a
home-grown virtual memory system. AFAICT, the only advantage over the
OS' virtual memory is that you aren't limited to (at most) 4GiB on a
32-bit system. On a 64-bit system, you may as well just read the
entire map into (virtual) memory.

BTW, unless the module explicitly opens the segment file in 64-bit
mode (LFS), it will hit the 2GiB file limit. Many systems have more
than 2GiB of RAM, so using the segment library may actually reduce the
maximum size of a map compared to just reading it into memory.
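
To illustrate the LFS point (this is generic POSIX C, nothing taken
from the segment library itself): defining _FILE_OFFSET_BITS before any
system header makes off_t 64 bits wide, so fseeko()/ftello() can address
offsets past 2GiB even in a 32-bit build.

    /* Minimal LFS sketch; "segment.tmp" is just an illustrative name. */
    #define _FILE_OFFSET_BITS 64    /* must come before any system header */
    #define _LARGEFILE_SOURCE       /* declares fseeko()/ftello() */

    #include <stdio.h>
    #include <sys/types.h>

    int main(void)
    {
        FILE *fp = fopen("segment.tmp", "w+b");
        off_t big = (off_t) 3 * 1024 * 1024 * 1024;   /* a 3GiB offset */

        if (!fp)
            return 1;

        /* fseeko() takes an off_t rather than a long, so the seek does
         * not overflow once off_t is 64 bits wide. */
        if (fseeko(fp, big, SEEK_SET) != 0)
            return 1;

        fputc(0, fp);
        fclose(fp);
        return 0;
    }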

> 2) Work on quad-tree stuff (v.surf.*, r.terraflow) individually  (???)
> 3) Create new row-by-row libgis fns and start migrating r.modules, 1 by 1.
>    (what would the module logic look like instead of two for loops?)

If the complexity is in the algorithm, then there's no alternative to
restructuring the code. Obviously, this requires that the algorithm is
actually parallelisable.

There are a few things which can be done in libraries, e.g. using
separate threads for {get,put}_row operations, so that the actual
algorithm gets a complete CPU core to itself.
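
As a rough sketch of that idea (plain pthreads, nothing taken from
libgis; read_row() and process_row() are placeholders for the real
get_row call and the algorithm), a prefetch thread can fill a one-row
"mailbox" while the main thread is busy computing:

    #include <pthread.h>
    #include <string.h>

    #define NROWS 1000
    #define NCOLS 1000

    static double slot[NCOLS];                 /* one-row mailbox */
    static int slot_row = -1;                  /* row held in the slot, -1 = empty */
    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

    /* Stand-ins for the real get_row and algorithm code. */
    static void read_row(int row, double *buf)
    {
        int col;
        for (col = 0; col < NCOLS; col++)
            buf[col] = row + col;              /* fake data */
    }

    static void process_row(int row, double *buf)
    {
        (void) row;
        (void) buf;                            /* the real work would go here */
    }

    /* Prefetch thread: reads the next row while the main thread computes. */
    static void *prefetch(void *arg)
    {
        double tmp[NCOLS];
        int row;

        (void) arg;
        for (row = 0; row < NROWS; row++) {
            read_row(row, tmp);                /* I/O happens outside the lock */

            pthread_mutex_lock(&mtx);
            while (slot_row >= 0)              /* wait until the slot is drained */
                pthread_cond_wait(&cv, &mtx);
            memcpy(slot, tmp, sizeof slot);
            slot_row = row;
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&mtx);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        double buf[NCOLS];
        int row;

        pthread_create(&tid, NULL, prefetch, NULL);

        for (row = 0; row < NROWS; row++) {
            pthread_mutex_lock(&mtx);
            while (slot_row != row)            /* wait for the prefetched row */
                pthread_cond_wait(&cv, &mtx);
            memcpy(buf, slot, sizeof buf);
            slot_row = -1;                     /* mark the slot as drained */
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&mtx);

            process_row(row, buf);             /* compute while the next row loads */
        }

        pthread_join(tid, NULL);
        return 0;
    }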

As for parallelising individual modules, r.mapcalc would be an obvious
priority. Its current structure isn't conducive to that (the buffers
are stored in the expression nodes), but that isn't too hard to
change. There's still the issue that the {get,put}_row operations have
to be serialised, so you won't be able to process data faster than a
single core can read/write a map. Still, for complex calculations, it
might be worth the effort.
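
A toy version of that trade-off, using OpenMP rather than anything from
the current r.mapcalc source (get_row()/put_row() are placeholders):
the reads and writes stay serial, and only the per-cell evaluation is
spread across cores, which pays off when the expression is expensive
relative to the I/O.

    #include <math.h>

    #define NROWS 1000
    #define NCOLS 1000

    static double in[NROWS][NCOLS], out[NROWS][NCOLS];

    /* Stand-ins for the real map I/O; a single core's read/write rate
     * remains the throughput ceiling. */
    static void get_row(int row, double *buf)
    {
        int col;
        for (col = 0; col < NCOLS; col++)
            buf[col] = row + col;              /* fake input */
    }

    static void put_row(int row, const double *buf)
    {
        (void) row;
        (void) buf;                            /* the real writer would go here */
    }

    int main(void)
    {
        int row, col;

        /* Serial read... */
        for (row = 0; row < NROWS; row++)
            get_row(row, in[row]);

        /* ...parallel per-cell evaluation... */
        #pragma omp parallel for private(col)
        for (row = 0; row < NROWS; row++)
            for (col = 0; col < NCOLS; col++)
                out[row][col] = sqrt(in[row][col]) * 2.0;   /* toy "expression" */

        /* ...serial write. */
        for (row = 0; row < NROWS; row++)
            put_row(row, out[row]);

        return 0;
    }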

> 4) I don't know, but suspect, MPIing vector ops will be much much harder.

The main issue is likely to be the difficulty of making the output
operations thread-safe. Making read operations thread-safe is usually
simple (if you have a read "cursor", that needs to be updated
atomically).
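
For example (made-up feature count, nothing from the vector library):
a shared cursor advanced with an atomic fetch-and-add hands each input
index to exactly one worker, which is all the synchronisation the read
side needs. C11 atomics are used here; a mutex or a GCC __sync builtin
would do the same job.

    #include <stdatomic.h>
    #include <pthread.h>

    #define NFEATURES 10000
    #define NTHREADS  4

    static atomic_int cursor;          /* next feature index to hand out */

    static void *worker(void *arg)
    {
        (void) arg;
        for (;;) {
            /* fetch_add returns the previous value, so each index is
             * claimed by exactly one thread. */
            int i = atomic_fetch_add(&cursor, 1);

            if (i >= NFEATURES)
                break;
            /* ... read and process feature i here ... */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int t;

        for (t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, NULL);
        for (t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
        return 0;
    }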

> After the segment lib & one-offs, the next big multi-proc task I see is
> the row-by-row raster ops. This of course means replacing
> G_{get|put}_*_row() in the raster modules with a more abstract method.
> Then, in some new libgis fn, splitting the map up into n_proc parts and
> applying the operation to each. Worry about multi-row r.neighbors etc
> later?   This is getting near to writing r.mapcalc as a lib fn. (!)

Most "filters" can be parallelised easily enough. This includes those
which need a "window", e.g. r.neighbors; it doesn't matter if multiple
threads are reading the same row. You do need to make the rowio window
large enough to account for the number of active threads (e.g. if you
have 4 threads and a 5x5 window, you need at least 2+4+2 = 8 rows).
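
The arithmetic, as a throwaway helper (this is not the rowio API): each
thread needs "halo" rows above and below the row it is producing, so
the cache has to hold halo + nthreads + halo rows.

    #include <stdio.h>

    /* Each of the nthreads rows currently being produced needs halo
     * rows above and below it in the cache. */
    static int rows_needed(int window_size, int nthreads)
    {
        int halo = (window_size - 1) / 2;   /* 2 for a 5x5 window */

        return halo + nthreads + halo;
    }

    int main(void)
    {
        /* 4 threads, 5x5 window: 2 + 4 + 2 = 8 rows, as above. */
        printf("%d\n", rows_needed(5, 4));
        return 0;
    }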

The main issue is that the core raster I/O needs to be made
thread-safe, including multiple threads using a single map. That means
either replacing the {work,null,mask,temp,compressed}_buf fields in
the fileinfo structure with an array of such buffers (one per thread),
or using automatic buffers (i.e. alloca(); does any supported platform
not provide this?).
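
Roughly, the two options look like this ("struct fileinfo_like" and the
row handling are made up for illustration, not the real fileinfo code):

    #include <stdlib.h>
    #include <alloca.h>   /* on some systems alloca() lives in stdlib.h instead */

    #define NTHREADS 4

    /* Option 1: one scratch buffer per thread, indexed by thread id. */
    struct fileinfo_like {
        unsigned char *work_buf[NTHREADS];
    };

    static void get_row_per_thread(struct fileinfo_like *fi, int thread_id,
                                   size_t row_bytes)
    {
        if (!fi->work_buf[thread_id])
            fi->work_buf[thread_id] = malloc(row_bytes);
        /* ... decompress/translate the row into fi->work_buf[thread_id] ... */
    }

    /* Option 2: an automatic buffer that exists only for this call, so
     * there is no shared state to protect at all. */
    static void get_row_automatic(size_t row_bytes)
    {
        unsigned char *work_buf = alloca(row_bytes);

        /* ... decompress/translate the row into work_buf ... */
        (void) work_buf;
    }

    int main(void)
    {
        struct fileinfo_like fi = { { NULL } };

        get_row_per_thread(&fi, 0, 4096);
        get_row_automatic(4096);
        return 0;
    }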

-- 
Glynn Clements <glynn at gclements.plus.com>



