[GRASS-dev] multi-core GRASS [was: ramdisk as mapset?]

Mon Jul 23 12:27:03 EDT 2007

Hi,

>
> Dylan wrote:
> > I am about to purchase a cluster of Mac Pros for filtering and
> > rendering sonar data and I have been curious what has been done to
> > parallelize GRASS by enterprising people.
>
> Q: is the GRASS segmentation process inherently thread-friendly?
> (I mean theoretically, not as written)

Code that is meant to run on a cluster does not need to be thread safe,
unless you are using a single system image (SSI) Linux with
distributed thread support (SGI Altix series). Most clusters do not
support spreading threads across cluster nodes (the network connection
is in most cases the limiting factor; exception: take a look at NUMAlink
from SGI).

I prefer threaded parallelism, because it is easier to implement and we do not
need to deal with message passing overhead. But such code will not run on a
cluster (unless you use OpenMP and Intel's Cluster OpenMP compiler
extension).

> ie if the segmentation library was rewritten to be better, could it use
> some sort of n_proc value set at compile time, (or (better) a gis var
> set with g.gisenv, or even a shell enviro var) to determine how many
> simultaneous processes to run at once?

A variable would be the best. The gpde library uses OpenMP to run
some tasks in parallel. The number of threads can be controlled via the 
environment variable OMP_NUM_THREADS.
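
As a minimal sketch of this kind of OpenMP usage (the helper add_arrays() is
hypothetical and not taken from the gpde library): the loop below is shared
among threads when the code is built with OpenMP support, and the number of
threads is then picked up from OMP_NUM_THREADS at run time. Without OpenMP
the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of gpde: c[i] = a[i] + b[i].
 * Built with OpenMP (e.g. gcc -fopenmp) the iterations are divided among
 * the threads of the team; the team size follows OMP_NUM_THREADS.
 * Built without OpenMP the pragma is ignored and the loop is serial. */
void add_arrays(const double *a, const double *b, double *c, long n)
{
    long i;

    assert(a != NULL && b != NULL && c != NULL);

    /* Every iteration writes a different c[i], so no locking is needed. */
#pragma omp parallel for
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

A program using such a function could then be started with, for example,
OMP_NUM_THREADS=4 set in the shell, with no recompilation needed to change
the thread count.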

But why multi-thread the segment library?
IMHO it is currently not useful to
multi-thread I/O operations. I/O is mostly serial (except on cluster
file systems).

> Given our manpower, the best way I see to get GRASS more multi-core and
> multi-processor ready is a piecemeal approach, starting with the
> low-hanging fruit / biggest bang-for-buck libraries. Simultaneously we can
> gradually remove as many global variables as possible from the libs.

I am currently designing the N_array stuff within the gpde library from
scratch to support multi-threaded processing of raster and volume data loaded
into memory. For now, some higher-level array functions are
implemented which use OpenMP to speed up some tasks.

The current N_array implementation only supports the three GRASS raster data
types (CELL, FCELL and DCELL) and is not designed for performance, but for
ease of use. This may change in the future.

Future tasks are:

* Use a more abstract approach for the N_array struct handling
** 1d, 2d and 3d arrays should be managed as one structure
** Use function pointers as member functions for a more OO-like approach
** A flag decides the type of the array -> easy conversion of 1d into 2d or 3d
    arrays
** the access member function will be set while allocating an array
    eg: 1d array: double array_1d->get_d_value(array_1d, col);
        2d array: double array_2d->get_d_value(array_2d, col, row);
        3d array: double array_3d->get_d_value(array_3d, col, row, depth);
    and so on ...
** support for data references in the internal data structure
    eg: setting an already allocated raster row buffer as the data pointer
        for a 1d array; when the array is deleted (freed), the buffer will
        not be freed
* Implement new data types into the N_array library
** unsigned char, signed char, unsigned short, signed short, unsigned int,
    signed int, float, double
* create a more abstract interface to 2d and 3d raster data
** implementation of so-called "data sources" in 2d and 3d
** data sources will have member functions to access the raster and 
     volume data eg: 
    double data_source_2d->get_d_value(data_source_2d, col, row);
    N_array * data_source_2d->get_row(data_source_2d, row);
    N_array * data_source_3d->get_tile(data_source_3d, x, y, z);
    and so on
* High level functions like:
** array copy; statistics of an array (mean, max, min, ...); sorting;
     basic mathematical operations like array subtraction, addition,
     multiplication, division, modulo and so on should be implemented
     multi-threaded (take a look at N_arrays_calc.c for current
     implementations)
** Neighbourhood searching routines should be implemented using N_arrays
    eg: N_array * array_2d->get_neighbours(array_2d, row, col, size)

Some of this functionality is already implemented and tested in the gpde
lib.
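
To make the "one structure, access function set at allocation, optional data
reference" idea more concrete, here is a rough sketch. All names
(sketch_array, sketch_array_alloc, sketch_array_wrap_row, ...) are
hypothetical and do not match the actual N_array structs in gpde. One
simplification is assumed: because a C function pointer member has a fixed
signature, the accessor always takes col, row and depth, and the 1d variant
simply ignores the unused indices.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch only; not the gpde N_array layout. */
typedef struct sketch_array sketch_array;

struct sketch_array {
    int cols, rows, depths;  /* rows = depths = 1 for a 1d array */
    double *data;
    int owns_data;           /* 0: data is a borrowed buffer, never freed here */
    /* accessor, selected once when the array is created */
    double (*get_d_value)(const sketch_array *a, int col, int row, int depth);
};

static double get_1d(const sketch_array *a, int col, int row, int depth)
{
    (void)row;
    (void)depth;
    return a->data[col];
}

/* Used for 2d (depth = 0) and 3d arrays stored row-major, slice by slice. */
static double get_nd(const sketch_array *a, int col, int row, int depth)
{
    return a->data[((size_t)depth * a->rows + row) * a->cols + col];
}

/* Allocate a zero-initialised array that owns its data. */
sketch_array *sketch_array_alloc(int cols, int rows, int depths)
{
    sketch_array *a;

    assert(cols > 0 && rows > 0 && depths > 0);
    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->cols = cols;
    a->rows = rows;
    a->depths = depths;
    a->data = calloc((size_t)cols * rows * depths, sizeof *a->data);
    if (a->data == NULL) {
        free(a);
        return NULL;
    }
    a->owns_data = 1;
    a->get_d_value = (rows == 1 && depths == 1) ? get_1d : get_nd;
    return a;
}

/* Wrap an existing row buffer as a 1d array without copying it. */
sketch_array *sketch_array_wrap_row(double *buf, int cols)
{
    sketch_array *a;

    assert(buf != NULL && cols > 0);
    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->cols = cols;
    a->rows = 1;
    a->depths = 1;
    a->data = buf;
    a->owns_data = 0;
    a->get_d_value = get_1d;
    return a;
}

/* Free the array; a borrowed buffer is left untouched. */
void sketch_array_free(sketch_array *a)
{
    if (a == NULL)
        return;
    if (a->owns_data)
        free(a->data);
    free(a);
}
```

Callers always go through a->get_d_value(a, col, row, depth), so code written
against the accessor does not need to know whether the array is 1d, 2d or 3d,
or whether it owns its buffer.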

> I wonder if Thiery has any thoughts here, as he is probably in a better
> position to fundamentally & quickly rework the architecture than we are.
> (ie less baggage to worry about) I think it is very safe to say that for
> the next decade or so multi-core scaling is going to be the future of
> number crunching. Eventually new paradigms and languages will arrive, but
> for now we have to fight with making our serial languages thread-safe....

Indeed.

>
>
> some sort of plan of action, in order of priority:
> 1) [if plausible] Make the segment lib multi-proc'able. If it's currently
>    crappy, then all the more reason to start rewrites here.
> 2) Work on quad-tree stuff (v.surf.*, r.terraflow) individually  (???)

AFAIC the quad-tree stuff implemented in v.surf.rst is not usable for raster 
data storage or handling. 

> 3) Create new row-by-row libgis fns and start migrating r.modules, 1 by 1.
>    (what would the module logic look like instead of two for loops?)
> 4) I don't know, but suspect, MPIing vector ops will be much much harder.
>
>
> After the segment lib & one-offs, the next big multi-proc task I see is
> the row-by-row raster ops. This of course means replacing
> G_{get|put}_*_row() in the raster modules with a more abstract method.

I would suggest implementing the raster row and tile handling in a new
library called Gdata_, which should implement the functionality I explained
above.
The abstract Gdata interface should be able to handle different storage
implementations (current raster storage, segment and rowio libs,
an interface to GDAL, the g3d lib ...) with the data_source approach.
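
A rough sketch of what such a data_source could look like follows. Everything
in it is hypothetical: the struct layout, memory_source() and source_sum() are
made up for illustration, and get_row() fills a caller-supplied buffer here
instead of returning an N_array. Only a plain in-memory backend is shown; a
backend for the current raster storage, the segment lib or GDAL would supply
its own pair of functions behind the same interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface; no such Gdata library exists yet. */
typedef struct data_source_2d data_source_2d;

struct data_source_2d {
    int rows, cols;
    void *backend;  /* storage-specific state */
    double (*get_d_value)(const data_source_2d *ds, int col, int row);
    /* copies one row (cols doubles) into buf; returns 0 on success */
    int (*get_row)(const data_source_2d *ds, int row, double *buf);
};

/* Backend: a row-major block of doubles already held in memory. */
static double mem_get_d_value(const data_source_2d *ds, int col, int row)
{
    const double *cells = ds->backend;

    return cells[(size_t)row * ds->cols + col];
}

static int mem_get_row(const data_source_2d *ds, int row, double *buf)
{
    int col;

    if (row < 0 || row >= ds->rows)
        return -1;
    for (col = 0; col < ds->cols; col++)
        buf[col] = mem_get_d_value(ds, col, row);
    return 0;
}

/* Build a data source on top of an existing memory block (not copied). */
data_source_2d memory_source(double *cells, int rows, int cols)
{
    data_source_2d ds;

    assert(cells != NULL && rows > 0 && cols > 0);
    ds.rows = rows;
    ds.cols = cols;
    ds.backend = cells;
    ds.get_d_value = mem_get_d_value;
    ds.get_row = mem_get_row;
    return ds;
}

/* Storage-independent code: works with any backend behind the interface. */
double source_sum(const data_source_2d *ds)
{
    double sum = 0.0;
    int row, col;

    for (row = 0; row < ds->rows; row++)
        for (col = 0; col < ds->cols; col++)
            sum += ds->get_d_value(ds, col, row);
    return sum;
}
```

The point of the indirection is that a function like source_sum() never
touches the storage directly, so swapping the backend does not require
changes in module code.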

> Then, in some new libgis fn, splitting the map up into n_proc parts and
> applying the operation to each. Worry about multi-row r.neighbors etc
> later?   This is getting near to writing r.mapcalc as a lib fn. (!)

Indeed.

Best regards
Soeren

> I wonder if the python-C SWIG interface helps with prototyping?
> Then slowly move as many of the 150 raster modules to the new MPI-aware
> lib fns as are suited for it, one by one. Again I think the low-hanging
> fruit will be obvious and the most important modules (r.mapcalc, r.cost)
> will be taken care of first, and the lesser used raster modules on a needs
> basis by contributors. (as long as we offer a clean API method)
>
>
> I've read that "n" in 'make -j n' should be n_procs + 1. Is that just
> true for quick little processes where you always want a job ready at the
> door and there's a lot of overhead creating & destroying the process?
>
>
>
> thoughts?
> Hamish