[GRASS-user] 64 bit or parallel processing
soerengebbert at gmx.de
Wed Feb 28 17:48:49 EST 2007
Glynn Clements wrote:
> Hamish wrote:
>> It may be interesting to try and parallelize the segmentation library,
>> but most of GRASS's raster code needs to be rewritten.
>> two problems:
>> 1) most grass raster modules are serial row based. maybe if the raster
>> format gets updated to a tiled model it would be possible to start
>> parallelizing it. (not impossible, NULL and FP support was added in the past)
>> or break up lines into chunks like raster modules that have rows= or
>> percent= options already.
> Rows versus tiles doesn't have any impact upon the ability to
> parallelise the code.
Parallelizing the code will be easier with a tile-based approach.
The g3d lib implementation allows different tile sizes, so you can
define a tile size (before opening a map) that fits into memory.
Those tiles can then be processed in parallel.
I guess a similar approach can be implemented using the segment library
for raster maps.
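To make the idea concrete, here is a minimal sketch of tile-wise parallel processing with OpenMP. It assumes the whole raster is already held in memory as one array; scale_raster_tiled() and its parameters are hypothetical names for illustration and are not part of the g3d, segment or gpde libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GRASS API: multiply every cell of a
 * rows x cols in-memory raster by `factor`, one square tile per
 * loop iteration. */
void scale_raster_tiled(double *cells, int rows, int cols,
                        int tile_size, double factor)
{
    assert(cells != NULL && rows > 0 && cols > 0 && tile_size > 0);

    int tiles_y = (rows + tile_size - 1) / tile_size;  /* round up */
    int tiles_x = (cols + tile_size - 1) / tile_size;
    int t;

    /* Tiles do not overlap, so each iteration writes a disjoint set
     * of cells and no locking is needed. Without OpenMP the loop
     * simply runs serially. */
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
#endif
    for (t = 0; t < tiles_y * tiles_x; t++) {
        int r0 = (t / tiles_x) * tile_size;
        int c0 = (t % tiles_x) * tile_size;
        int r1 = r0 + tile_size < rows ? r0 + tile_size : rows;
        int c1 = c0 + tile_size < cols ? c0 + tile_size : cols;

        for (int r = r0; r < r1; r++)
            for (int c = c0; c < c1; c++)
                cells[(size_t)r * cols + c] *= factor;
    }
}
```

Compile with -fopenmp (gcc) to actually get threads. The sketch leaves out the hard part, which is getting the tiles from disk into memory and back.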
I hope to have a prototype of a tile-based parallelization approach
running in 5 - 6 months. It will be part of the new gpde library and
will use the gpde array implementation, both for easy data access and
because it is thread-safe (AFAICT).
The gpde lib uses OpenMP (www.openmp.org) for parallel computation in
GRASS, but so far only the linear equation solver and the linear
equation system assembler are parallelized. This will hopefully change
in the future.
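For readers who have not used OpenMP, the sketch below shows the kind of loops a linear equation solver spends its time in: a matrix-vector product and a dot product. This is not the gpde code; sketch_matvec() and sketch_dot() are hypothetical names, and a dense row-major matrix is assumed only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for a dense n x n matrix stored row by row.
 * Each row is independent, so rows can be shared among threads. */
void sketch_matvec(const double *A, const double *x, double *y, int n)
{
    int i;

    assert(A != NULL && x != NULL && y != NULL && n > 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < n; i++) {
        double sum = 0.0;

        for (int j = 0; j < n; j++)
            sum += A[(size_t)i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Dot product: every thread accumulates a private partial sum and
 * the reduction clause adds the partial sums together at the end. */
double sketch_dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;

    assert(a != NULL && b != NULL && n > 0);
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Both loops only touch arrays that are already in memory, which is why this part is comparatively easy to parallelize.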
Enabling parallel data access will be hard work in GRASS and will
only work on cluster file systems. One approach would be to distribute
the rows and tiles to different storage locations to enable parallel
data access.
Just my two cents
>> 2) the raster file format is split over many directories making it hard
>> to "lock" a map so another job doesn't try and edit the same map. the
>> future plan is to have the raster format stored like the vector format,
>> $MAPSET/raster/$MAPNAME/element, edited as a copy and moved into position
>> in full when the raster is closed.
>> 3) (bonus problem) we need to ensure that the region (WIND) file isn't
>> modified by any module but g.region, and that modules read the WIND file
>> only once when they first start. (in case it changes mid-process)
> This isn't an issue for parallelising individual modules (i.e. using
> multiple threads in order to utilise multiple CPU cores). It is an
> issue for being able to run multiple modules concurrently.
> The core problem for parallelising GRASS is that the libraries aren't
> remotely thread-safe, and the way that they are written is such that
> making them thread-safe would be a lot of work.
> The two biggest issues are:
> 1. Use of global/static variables; libgis alone has ~180 of them (not
> including the 181 GRASS_copyright variables).
> 2. Use of "scratch" buffers. Current policy deprecates the use of
> alloca(), so eliminating scratch buffers would involve lots of
> malloc/free calls for short-lived buffers, which is inefficient.
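The two issues above can be illustrated with a small sketch. It is not taken from libgis; cell_label_static() and cell_label_r() are hypothetical functions. The first one returns a pointer into a static scratch buffer, so all callers (and all threads) share the same memory. The second one is a re-entrant variant with a caller-supplied buffer, which is one alternative to per-call malloc/free and may not fit every case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Not thread-safe: every call overwrites the same static buffer, so
 * a second call (from any thread) invalidates the earlier result. */
const char *cell_label_static(int row, int col)
{
    static char scratch[64];

    snprintf(scratch, sizeof scratch, "cell(%d,%d)", row, col);
    return scratch;
}

/* Re-entrant variant: the caller owns the buffer, so concurrent
 * calls with different buffers do not interfere.
 * Returns 0 on success, -1 if the label did not fit into buf. */
int cell_label_r(int row, int col, char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "cell(%d,%d)", row, col);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With the static version, two labels requested one after the other end up pointing at the same memory. Changing a function like this is simple; doing it for a whole library with many callers is where the work is.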
> Consequently, it isn't feasible to make the bulk of GRASS thread-safe.
> We might be able to do this for limited portions, e.g. the core raster
> I/O code (get/put row operations), but you would still need to ensure
> that everything else was called from the main thread.
> It would be more feasible to modify individual modules; e.g. a
> multi-threaded version of r.mapcalc might be feasible. But unless the
> raster I/O code was made thread-safe, you would still be limited by
> the rate at which map data could be read and written by a single
> thread, so it would only be worthwhile for cases where the bulk of the
> overhead was in the actual calculations.
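The pattern described above (row I/O stays in one thread, only the calculation is threaded) might look roughly like the following sketch. The row_io_fn callbacks and the mem_* helpers are hypothetical in-memory stand-ins, not the GRASS get/put row functions, and the per-cell formula is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for single-threaded row I/O. */
typedef void (*row_io_fn)(int row, double *buf, int cols, void *ctx);

/* In-memory "map" used in place of real raster files. */
struct mem_raster { const double *in; double *out; };

void mem_read_row(int row, double *buf, int cols, void *ctx)
{
    const struct mem_raster *m = ctx;

    for (int c = 0; c < cols; c++)
        buf[c] = m->in[(size_t)row * cols + c];
}

void mem_write_row(int row, double *buf, int cols, void *ctx)
{
    struct mem_raster *m = ctx;

    for (int c = 0; c < cols; c++)
        m->out[(size_t)row * cols + c] = buf[c];
}

/* `chunk` must hold chunk_rows * cols values. */
void process_in_chunks(int rows, int cols, int chunk_rows, double *chunk,
                       row_io_fn read_row, row_io_fn write_row, void *ctx)
{
    assert(rows > 0 && cols > 0 && chunk_rows > 0 && chunk != NULL);

    for (int start = 0; start < rows; start += chunk_rows) {
        int n = rows - start < chunk_rows ? rows - start : chunk_rows;
        int r;

        /* Serial: all reads happen in the calling thread. */
        for (r = 0; r < n; r++)
            read_row(start + r, chunk + (size_t)r * cols, cols, ctx);

        /* Parallel: only the arithmetic is shared among threads. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (r = 0; r < n; r++)
            for (int c = 0; c < cols; c++) {
                double v = chunk[(size_t)r * cols + c];

                chunk[(size_t)r * cols + c] = v * v + 1.0;
            }

        /* Serial again: all writes happen in the calling thread. */
        for (r = 0; r < n; r++)
            write_row(start + r, chunk + (size_t)r * cols, cols, ctx);
    }
}
```

As noted above, this only pays off when the calculation per cell is expensive compared to reading and writing the rows; for a cheap formula like the one here, the serial I/O dominates.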