[GRASS-user] 64 bit or parallel processing
c.ehlschlaeger at insightbb.com
Wed Feb 28 19:52:38 EST 2007
Most GRASS raster functions, r.mapcalc and similar, won't speed up by more
than a couple of percent from parallelization. 90% of the time spent by
these commands goes to moving the data off the hard drive to the CPU and
vice versa (based on parallel processing research I did in the mid '90s).
For "normal" GRASS raster commands, putting four hard drives on RAID 0 will
increase GRASS performance dramatically, more than any other single thing
you could do to a computer (or to GRASS itself). The reason ORNL
parallelized my code was that the computational requirements were over c^2,
where c is the number of grid cells on a map, for some of the analyses they
wanted to do.
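
The I/O argument above can be put in rough numbers with Amdahl's law. This is only a sketch: the 90% I/O share is the figure quoted above, treating that share as entirely serial is an assumption, and the function name `amdahl_speedup` is made up for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only `parallel_fraction` of the
    run time is spread across `workers`; the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Assumption: 90% of run time is disk I/O and stays serial,
# so only 10% of the work can be parallelized.
for n in (2, 4, 16):
    print(n, round(amdahl_speedup(0.10, n), 3))
```

Under these assumptions the bound never exceeds 1/0.9, about 1.11, however many cores are added, and real gains would be lower once threading overhead is counted.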
As Hamish wrote in a recent email, good candidates for parallelization would
be viewshed analysis or other functions requiring many cells to be read to
get a value for each output grid cell. r.watershed won't benefit from
Chuck Ehlschlaeger, Associate Professor & GIS Center Director
Department of Geography, Western Illinois University
215 Tillman Hall, 1 University Circle, Macomb, IL 61455
cre111 at wiu.edu, phone: 309-298-1841, fax: 309-298-3003
From: grassuser-bounces at grass.itc.it [mailto:grassuser-bounces at grass.itc.it]
On Behalf Of Glynn Clements
Sent: Wednesday, February 28, 2007 3:33 PM
Cc: grassuser at grass.itc.it
Subject: Re: [GRASS-user] 64 bit or parallel processing
> It may be interesting to try and parallelize the segmentation library,
> but most of GRASS's raster code needs to be rewritten.
> two problems:
> 1) most grass raster modules are serial row based. maybe if the raster
> format gets updated to a tiled model it would be possible to start
> parallelizing it. (not impossible, NULL and FP support was added in the
> or break up lines into chunks like raster modules that have rows= or
> percent= options already.
Rows versus tiles doesn't have any impact upon the ability to
parallelise the code.
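
As a rough illustration of why row storage is no obstacle by itself: a row-organised raster can be divided among workers as contiguous bands of rows. The helper name `row_bands` is hypothetical and not part of GRASS.

```python
def row_bands(nrows, workers):
    """Split rows 0..nrows-1 into contiguous (start, stop) bands,
    one per worker, with sizes differing by at most one row."""
    base, extra = divmod(nrows, workers)
    bands, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:          # skip empty bands when workers > nrows
            bands.append((start, stop))
        start = stop
    return bands

print(row_bands(10, 4))
```

Each band could then be handed to its own thread or process, which still leaves the thread-safety of the libraries as a separate question.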
> 2) the raster file format is split over many directories making it hard
> to "lock" a map so another job doesn't try and edit the same map. the
> future plan is to have the raster format stored like the vector format,
> $MAPSET/raster/$MAPNAME/element, edited as a copy and moved into position
> in full when the raster is closed.
> 3) (bonus problem) we need to ensure that the region (WIND) file isn't
> modified by any module but g.region, and that modules read the WIND file
> only once when they first start. (in case it changes mid-process)
This isn't an issue for parallelising individual modules (i.e. using
multiple threads in order to utilise multiple CPU cores). It is an
issue for being able to run multiple modules concurrently.
The core problem for parallelising GRASS is that the libraries aren't
remotely thread-safe, and the way that they are written is such that
making them thread-safe would be a lot of work.
The two biggest issues are:
1. Use of global/static variables; libgis alone has ~180 of them (not
including the 181 GRASS_copyright variables).
2. Use of "scratch" buffers. Current policy deprecates the use of
alloca(), so eliminating scratch buffers would involve lots of
malloc/free calls for short-lived buffers, which is inefficient.
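
The scratch-buffer hazard can be shown in a few lines. The sketch below is in Python rather than the C the libraries are written in, and all names in it are hypothetical. It contrasts a function that returns a shared module-level buffer with one that allocates a fresh result on every call.

```python
_scratch = bytearray(8)  # shared scratch buffer, analogous to a C static

def format_row_shared(row):
    """Non-reentrant: every caller gets the same buffer back."""
    _scratch[:] = f"row{row:05d}".encode()
    return _scratch

def format_row_private(row):
    """Reentrant: each call allocates its own result."""
    return f"row{row:05d}".encode()

a = format_row_shared(1)
b = format_row_shared(2)
print(a is b, bytes(a))  # the second call overwrote the first result

c = format_row_private(1)
d = format_row_private(2)
print(c, d)              # independent results
```

The shared version goes wrong even in a single thread once two results are held at the same time. With threads the overwrite can happen at any moment. The private version avoids this at the cost of an allocation per call, which is the trade-off described above.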
Consequently, it isn't feasible to make the bulk of GRASS thread-safe.
We might be able to do this for limited portions, e.g. the core raster
I/O code (get/put row operations), but you would still need to ensure
that everything else was called from the main thread.
It would be more feasible to modify individual modules; e.g. a
multi-threaded version of r.mapcalc might be feasible. But unless the
raster I/O code was made thread-safe, you would still be limited by
the rate at which map data could be read and written by a single
thread, so it would only be worthwhile for cases where the bulk of the
overhead was in the actual calculations.
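
A minimal sketch of that structure follows, with made-up names throughout: `get_row` stands in for a row read that is not thread-safe, so a lock lets only one thread read at a time, while the per-cell calculation runs in a pool of worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

ROWS, COLS = 8, 5
# Stand-in for a raster map on disk (illustrative data only).
raster = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
io_lock = threading.Lock()

def get_row(r):
    """Hypothetical non-thread-safe read; the lock serialises callers."""
    with io_lock:
        return list(raster[r])

def compute(row):
    """Per-cell expression, the only part that can overlap across threads."""
    return [v * v + 1 for v in row]

def process_row(r):
    return compute(get_row(r))

with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(process_row, range(ROWS)))  # row order is preserved

print(result[0])
```

Because every read goes through the one lock, throughput is capped by the serial read rate unless `compute` dominates. In CPython the global interpreter lock would further limit the compute threads, so the sketch is about structure, not measured speed.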
Glynn Clements <glynn at gclements.plus.com>