[GRASS-user] 64 bit or parallel processing
Glynn Clements
glynn at gclements.plus.com
Wed Feb 28 16:33:06 EST 2007
Hamish wrote:
> It may be interesting to try and parallelize the segmentation library,
> but most of GRASS's raster code needs to be rewritten.
>
> two problems:
> 1) most grass raster modules are serial row based. maybe if the raster
> format gets updated to a tiled model it would be possible to start
> parallelizing it. (not impossible, NULL and FP support was added in the past)
> or break up lines into chunks like raster modules that have rows= or
> percent= options already.
Rows versus tiles doesn't have any impact upon the ability to
parallelise the code.
> 2) the raster file format is split over many directories making it hard
> to "lock" a map so another job doesn't try and edit the same map. the
> future plan is to have the raster format stored like the vector format,
> $MAPSET/raster/$MAPNAME/element, edited as a copy and moved into position
> in full when the raster is closed.
>
> 3) (bonus problem) we need to ensure that the region (WIND) file isn't
> modified by any module but g.region, and that modules read the WIND file
> only once when they first start. (in case it changes mid-process)
This isn't an issue for parallelising individual modules (i.e. using
multiple threads in order to utilise multiple CPU cores). It is an
issue for being able to run multiple modules concurrently.

The core problem for parallelising GRASS is that the libraries aren't
remotely thread-safe, and the way that they are written is such that
making them thread-safe would be a lot of work.

The two biggest issues are:
1. Use of global/static variables; libgis alone has ~180 of them (not
including the 181 GRASS_copyright variables).
2. Use of "scratch" buffers. Current policy deprecates the use of
alloca(), so eliminating scratch buffers would involve lots of
malloc/free calls for short-lived buffers, which is inefficient.
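
To illustrate issue 2, here is a minimal sketch in C. The function names
(scaled_sum_static, scaled_sum_alloc) are hypothetical and are not taken
from libgis; they only show the pattern. The first variant keeps a static
scratch buffer that grows on demand, so two threads calling it at once
would share (and possibly reallocate) the same memory. The second can be
called from several threads, but pays for a malloc/free pair on every call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example, not a libgis function.
 * NOT thread-safe: the scratch buffer is static, so every caller
 * (and every thread) shares the same memory. */
double scaled_sum_static(const int *row, int ncols, double scale)
{
    static double *scratch;   /* one buffer shared by every caller */
    static int scratch_len;   /* current capacity of that buffer */
    double sum = 0.0;
    int i;

    if (ncols <= 0)
        return 0.0;
    if (ncols > scratch_len) {
        /* grow the shared buffer; a concurrent caller could still be using it */
        double *p = realloc(scratch, (size_t)ncols * sizeof *p);
        assert(p != NULL);
        scratch = p;
        scratch_len = ncols;
    }
    for (i = 0; i < ncols; i++)
        scratch[i] = row[i] * scale;
    for (i = 0; i < ncols; i++)
        sum += scratch[i];
    return sum;
}

/* Hypothetical thread-safe variant: no shared state, but each call
 * allocates and frees a short-lived buffer. */
double scaled_sum_alloc(const int *row, int ncols, double scale)
{
    double *scratch;
    double sum = 0.0;
    int i;

    if (ncols <= 0)
        return 0.0;
    scratch = malloc((size_t)ncols * sizeof *scratch);
    assert(scratch != NULL);
    for (i = 0; i < ncols; i++)
        scratch[i] = row[i] * scale;
    for (i = 0; i < ncols; i++)
        sum += scratch[i];
    free(scratch);
    return sum;
}
```

Both functions return the same result when called from a single thread;
the difference only shows up once they are called concurrently, or once
the cost of the per-call allocation is measured.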
Consequently, it isn't feasible to make the bulk of GRASS thread-safe.
We might be able to do this for limited portions, e.g. the core raster
I/O code (get/put row operations), but you would still need to ensure
that everything else was called from the main thread.

It would be more feasible to modify individual modules; e.g. a
multi-threaded version of r.mapcalc might be feasible. But unless the
raster I/O code was made thread-safe, you would still be limited by
the rate at which map data could be read and written by a single
thread, so it would only be worthwhile for cases where the bulk of the
overhead was in the actual calculations.
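
The module-level approach could be sketched roughly as follows. This is
only an illustration under stated assumptions: read_row(), expensive_expr()
and process_map() are hypothetical stand-ins, not GRASS functions, and the
whole map is held in memory for simplicity, where a real module would more
likely work on bands of rows. It also assumes OpenMP is enabled at compile
time (e.g. -fopenmp with GCC); without it the pragma is ignored and the
loop simply runs serially. All "I/O" stays on the main thread and only the
per-cell calculation is spread across threads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a single-threaded raster row read. */
static void read_row(int row, int ncols, double *buf)
{
    for (int c = 0; c < ncols; c++)
        buf[c] = (double)(row * ncols + c);
}

/* Hypothetical stand-in for an expensive per-cell expression. */
static double expensive_expr(double x)
{
    return x * x + 1.0;
}

/* Returns the sum of all output cells (standing in for writing the map),
 * or -1.0 if memory could not be allocated. */
double process_map(int nrows, int ncols)
{
    if (nrows <= 0 || ncols <= 0)
        return 0.0;

    size_t ncells = (size_t)nrows * (size_t)ncols;
    double *in = malloc(ncells * sizeof *in);
    double *out = malloc(ncells * sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return -1.0;
    }

    /* 1. Main thread only: read every row. */
    for (int r = 0; r < nrows; r++)
        read_row(r, ncols, in + (size_t)r * ncols);

    /* 2. Worker threads: each row is independent, so rows can be
     *    computed in parallel without touching any library state. */
    #pragma omp parallel for
    for (int r = 0; r < nrows; r++) {
        for (int c = 0; c < ncols; c++) {
            size_t i = (size_t)r * ncols + c;
            out[i] = expensive_expr(in[i]);
        }
    }

    /* 3. Main thread only: "write" the result (here, just a checksum). */
    double sum = 0.0;
    for (size_t i = 0; i < ncells; i++)
        sum += out[i];

    free(in);
    free(out);
    return sum;
}
```

Steps 1 and 3 remain serial, so the speed-up is bounded by how much of the
total time is spent in step 2.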
--
Glynn Clements <glynn at gclements.plus.com>