[GRASS-dev] Re: locking on a raster

Glynn Clements glynn at gclements.plus.com
Sun Apr 6 22:51:34 EDT 2008


Hamish wrote:

> > To make concurrent access work, updating a map would need to be an
> > atomic operation, so that any module which reads the map sees either
> > the "before" version or the "after" version, and never sees an
> > "in-progress" version.
> > 
> > This means that while any module is in the process of opening the
> > various elements that make up a map, the map cannot be replaced.
> 
> [somewhat OT, somewhat not]
> In my mind, due to the upcoming prevalence of multi-core processors,
> working to convert the raster modules to take advantage of multi-processor
> systems is much more important than allowing concurrent use of a single
> mapset. (not that they are mutually exclusive goals, but it may be good
> to think of one in light of the other)

While fine-grained concurrency is likely to be more useful, it's also
significantly harder to implement, particularly for GRASS, as many of
the core library functions are not remotely thread-safe.

> Whether it is best to start with modules that use the segment memory (I
> assume this is typically for RAM limitations not CPU) or splitting serial
> G_{get|put}_raster_row() operations into n chunks of rows, I am not sure.
> And how (or is it possible) to abstract that to the library level to
> avoid a massive rewrite. (massive raster module rewrite is not off the
> table for GRASS 7 but seems like a less bang-for-buck approach) It would
> be interesting to rewrite r.example to split the operation into n
> threads. (GRASS_NPROC=n or GRASS_THREADS=n enviro var?)

As you can't call G_get_raster_row() etc from multiple threads, you
can't realistically write an N-threaded version of r.example.

There are some relatively simple things which can be done to take
advantage of multiple threads, e.g. having one thread for calling
GRASS functions and one or more threads for computation.

However, as r.example doesn't do any computation, you can't really
demonstrate that technique there.
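
A minimal sketch of the "one thread for GRASS calls, other threads for
computation" arrangement, under stated assumptions: read_row() and
write_row() are hypothetical stand-ins for the row I/O calls (here they
only copy between in-memory arrays), compute_cell() is an arbitrary
placeholder computation, and the map size and worker count are
hard-coded. Only the calling thread ever runs the I/O stand-ins; the
workers see nothing but the row buffers.

```c
#include <pthread.h>

#define NROWS    4
#define NCOLS    8
#define NWORKERS 2

/* Hypothetical in-memory stand-ins for an input and an output map. */
static int input_map[NROWS][NCOLS];
static int output_map[NROWS][NCOLS];

/* Stand-in for a non-thread-safe read: called from one thread only. */
static void read_row(int row, int *buf)
{
    for (int col = 0; col < NCOLS; col++)
        buf[col] = input_map[row][col];
}

/* Stand-in for a non-thread-safe write: called from one thread only. */
static void write_row(int row, const int *buf)
{
    for (int col = 0; col < NCOLS; col++)
        output_map[row][col] = buf[col];
}

/* Placeholder per-cell computation. */
static int compute_cell(int v)
{
    return v * v + 1;
}

/* The columns [first_col, end_col) of one row, handled by one worker. */
struct slice {
    const int *in;
    int *out;
    int first_col;
    int end_col;
};

static void *worker(void *arg)
{
    struct slice *s = arg;

    for (int col = s->first_col; col < s->end_col; col++)
        s->out[col] = compute_cell(s->in[col]);
    return NULL;
}

/* Returns 0 on success, -1 if a worker thread could not be created. */
int process_map(void)
{
    int in[NCOLS], out[NCOLS];
    pthread_t tid[NWORKERS];
    struct slice slices[NWORKERS];

    for (int row = 0; row < NROWS; row++) {
        int started = 0, failed = 0;

        read_row(row, in);              /* I/O: calling thread only */

        for (int i = 0; i < NWORKERS; i++) {
            slices[i].in = in;
            slices[i].out = out;
            slices[i].first_col = i * NCOLS / NWORKERS;
            slices[i].end_col = (i + 1) * NCOLS / NWORKERS;
            if (pthread_create(&tid[i], NULL, worker, &slices[i]) != 0) {
                failed = 1;
                break;
            }
            started++;
        }
        for (int i = 0; i < started; i++)
            pthread_join(tid[i], NULL); /* wait for every slice */
        if (failed)
            return -1;

        write_row(row, out);            /* I/O: calling thread only */
    }
    return 0;
}
```

Creating and joining threads for every row keeps the sketch short but
is wasteful; a real module would more likely keep a pool of workers
alive for the whole run. The gain also depends on compute_cell() being
expensive relative to the I/O.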

With some cleaning up, you might be able to have one thread for
reading, one or more threads for computation, and one thread for
writing. However, the number of threads which would actually be useful
for computation would be limited by the I/O speed.

Having multiple input threads would be possible, but it would require
a fair amount of work in terms of cleaning up the I/O code. It would
probably also result in a net efficiency loss[1], so it would only
make sense for single-user desktop systems (i.e. where the other
core(s) would otherwise be idle).

[1] In particular, if the region resolution is finer than the map
resolution, some of the rows will be duplicated. The I/O code keeps a
copy of the last row read, so consecutive requests for the same map
row don't decode the data repeatedly. Obviously, this requires that
rows are read sequentially; if two threads read the same map row
concurrently, you'll end up decoding the same data twice.
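
The effect of the last-row cache can be illustrated with a toy model.
This is a sketch under assumptions, not the actual I/O code: the
struct, the function names and the integer resolution factor are all
hypothetical, decoding is reduced to a counter, and the two "readers"
are simulated in a single thread, each with its own one-row cache.

```c
/* Toy model of a one-row cache; all names here are hypothetical. */
struct row_cache {
    int cached_row;   /* map row currently held, or -1 if none */
    int decode_count; /* how many times a map row had to be decoded */
};

void cache_init(struct row_cache *c)
{
    c->cached_row = -1;
    c->decode_count = 0;
}

/* Region is `factor` times finer than the map, so `factor`
 * consecutive region rows fall on the same map row. */
int map_row_for(int region_row, int factor)
{
    return region_row / factor;
}

/* Fetch one region row, decoding only on a cache miss. */
void fetch_region_row(struct row_cache *c, int region_row, int factor)
{
    int map_row = map_row_for(region_row, factor);

    if (map_row != c->cached_row) {
        c->decode_count++;        /* stands in for the costly decode */
        c->cached_row = map_row;
    }
}

/* One reader walking the region rows in order. */
int decodes_sequential(int region_rows, int factor)
{
    struct row_cache c;

    cache_init(&c);
    for (int r = 0; r < region_rows; r++)
        fetch_region_row(&c, r, factor);
    return c.decode_count;
}

/* Two readers taking alternate region rows, each with its own cache. */
int decodes_two_readers(int region_rows, int factor)
{
    struct row_cache a, b;

    cache_init(&a);
    cache_init(&b);
    for (int r = 0; r < region_rows; r++)
        fetch_region_row((r % 2 == 0) ? &a : &b, r, factor);
    return a.decode_count + b.decode_count;
}
```

With 6 region rows over 3 map rows (factor 2), decodes_sequential(6, 2)
returns 3 while decodes_two_readers(6, 2) returns 6: every map row is
decoded once per reader.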

Multiple output threads would effectively require that output maps
were uncompressed. Compressed maps require that you finish writing row
N before you can start writing row N+1.
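
One way to see the ordering constraint, as a sketch only: with
fixed-size rows the file offset of a row follows from its index alone,
whereas with compressed rows it depends on the compressed size of every
earlier row. The run-length scheme below (2 bytes per run, run-length
limits ignored) is a hypothetical toy and not the GRASS compression
format; the function names are likewise made up for illustration.

```c
#include <stddef.h>

/* Size of one row under a toy run-length encoding: 2 bytes per run. */
size_t rle_size(const unsigned char *row, size_t ncols)
{
    size_t runs = 0;

    for (size_t i = 0; i < ncols; i++)
        if (i == 0 || row[i] != row[i - 1])
            runs++;
    return 2 * runs;
}

/* Uncompressed: the offset depends only on the row index, so rows
 * could in principle be written in any order. */
size_t raw_offset(size_t row, size_t ncols)
{
    return row * ncols;
}

/* Compressed: the offset of row r is the sum of the compressed sizes
 * of rows 0..r-1, so it is unknown until those rows are encoded. */
void rle_offsets(const unsigned char *cells, size_t nrows, size_t ncols,
                 size_t *offsets)
{
    size_t pos = 0;

    for (size_t r = 0; r < nrows; r++) {
        offsets[r] = pos;
        pos += rle_size(cells + r * ncols, ncols);
    }
}
```

For three 4-cell rows {5,5,5,5}, {1,2,3,4}, {7,7,8,8} the compressed
offsets come out as 0, 2 and 10, while the uncompressed offset of the
third row is simply 2 * 4 = 8.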

Probably the most practical use of multiple threads would be in
modules which:

a) spend most of their time performing computations, rather than in
GRASS I/O, and

b) are widely used (to justify the effort).

The most obvious candidate is r.mapcalc.

-- 
Glynn Clements <glynn at gclements.plus.com>

