[GRASS-dev] Using GRASS in long running and multithreaded applications Was: Re: The tomcat shut down ...

Sat Sep 26 20:30:31 EDT 2009

Soeren Gebbert wrote:

> > I'm not suggesting making all of the functions take a pointer to the
> > state as a parameter, just making it thread-local.
> 
> Ok.
> To my shame i have to admit that i never heard of the thread-local
> mechanism before.
> After a quick look at wikipedia i understand the principle and it sounds great!
> This will speed up things a lot.
> I guess we need to use the pthread version of thread-local to support
> other compiler than gcc and windows too?

You would need to conditionalise it. The usual mechanism is like that
used for errno. In a single-threaded implementation, it's just a
variable. In a multi-threaded implementation, it's a macro which
expands to (*errno_location()), where errno_location() retrieves the
address using pthread_getspecific().
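
A minimal sketch of that errno-style conditionalisation, assuming POSIX threads. The names struct lib_state_s, lib_state, lib_state_location() and the SINGLE_THREADED switch are hypothetical and are not part of GRASS:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical library state; the names here are illustrative only. */
struct lib_state_s {
    int last_error;
};

#ifdef SINGLE_THREADED          /* hypothetical build-time switch */

/* Single-threaded build: the state is just a variable. */
static struct lib_state_s lib_state_storage;
#define lib_state lib_state_storage

#else

static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is called on each thread's copy when that thread exits */
    if (pthread_key_create(&state_key, free) != 0)
        abort();
}

/* Return the calling thread's private copy, allocating it on first use. */
struct lib_state_s *lib_state_location(void)
{
    struct lib_state_s *st;

    pthread_once(&state_once, make_key);
    st = pthread_getspecific(state_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL || pthread_setspecific(state_key, st) != 0)
            abort();
    }
    return st;
}

/* Multi-threaded build: the same name expands to a per-thread lookup. */
#define lib_state (*lib_state_location())

#endif
```

Code that reads or writes lib_state.last_error compiles unchanged in both builds; only the definition of the macro differs.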

> > However, the error handling is probably a bigger issue. Pushing error
> > handling onto the modules isn't an acceptable solution.
> 
> Indeed. This was the next issue i would like to talk about.
> 
> > Simply allowing the fatal error handler to longjmp() out then resume
> > using the GRASS libraries would be non-trivial, as you would have to
> > repair any inconsistencies in the library state.
> 
> Is there an alternative to longjmp() and setjmp()?

Not really.

> It seems to be quite complex, the man page warns about the usage. 
> And i never used it before.

longjmp() is conceptually similar to raising an exception in C++,
while setjmp() is equivalent to establishing a try/catch block.

The details are quite simple if you understand how C is implemented in
terms of machine code. setjmp() essentially saves the most important
CPU registers (including the program counter, stack pointer, and frame
pointer), while longjmp() restores them. So setjmp() records the
current execution state while longjmp() restores it (similar to
save/load in a game).

Most of the complexities and warnings relate to potential interactions
with optimisation. Primarily, local variables in the function which
calls setjmp() that are modified after the setjmp() call aren't
guaranteed to hold the correct value after longjmp(). gcc warns you if
this might occur. Declaring such variables "volatile" avoids the
problem.
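
To illustrate the "volatile" caveat, here is a small stand-alone sketch. The functions fail() and count_attempts() are made up for the example. The counter is modified between setjmp() and longjmp(), so it is declared volatile to keep its value across the jump:

```c
#include <setjmp.h>

static jmp_buf env;

/* Jump back to wherever setjmp(env) was last called. */
static void fail(void)
{
    longjmp(env, 1);
}

/* Retry until 'failures' simulated failures have occurred;
 * return the total number of attempts made. */
int count_attempts(int failures)
{
    /* Modified after setjmp(), so it must be volatile to be
     * reliable once longjmp() has brought us back. */
    volatile int attempts = 0;

    (void) setjmp(env);         /* each longjmp() resumes here */

    attempts++;
    if (attempts <= failures)
        fail();                 /* does not return */
    return attempts;
}
```

Without the volatile qualifier, an optimising compiler would be free to keep attempts in a register, and its value after the jump would be indeterminate.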

The other caveat is that you can't "wrap" setjmp(). The saved state
ceases to be valid once you leave the function which called
setjmp(), so you can only call longjmp() from within a "descendant" of
the function which calls setjmp().

In terms of using it to recover from a fatal error, the usage would be
something along the lines of:

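	#include <setjmp.h>
	#include <grass/gis.h>

	/* supplied by the application */
	void print_error(const char *msg, int fatal);
	int do_next_action(void);

	jmp_buf save;	/* state recorded at the start of each action */

	int my_handler(const char *msg, int fatal) {
	    print_error(msg, fatal);
	    longjmp(save, 1);	/* abandon the current action */
	    return 0; /* can't happen; longjmp() doesn't return */
	}

	void main_loop(void) {
	    volatile int done;	/* modified after setjmp(), hence volatile */
	    G_set_error_routine(my_handler);
	    for (done = 0; !done; ) {
	        if (setjmp(save) != 0)
	            continue;	/* fatal error happened */
	        done = do_next_action();
	    }
	    G_unset_error_routine();
	}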

A common idiom is to call setjmp() in the top-level loop, at the
beginning of each "action", and have the fatal error handler call
longjmp(). If an error occurs during the execution of an action, the
longjmp() will jump back out to the main loop which can then process
the next action.

An example can be found in lib/driver/main.c (in 6.x), where setjmp()
and longjmp() are used to recover from SIGPIPE, so that the monitor
doesn't terminate if the client terminates prematurely.

> > Allowing G_fatal_error() to return is enough work that it can probably
> > be ruled out. Apart from changing every single call (I count 520
> > references in lib/*), almost every public function would need two
> > versions: one which returns an error code and one which treats errors
> > as fatal (i.e. only returns upon success).
> 
> 520 calls are indeed a lot. The raster and gis libraries all together
> have 70 calls and the vector and db libraries have 190 calls.
> Glynn, if you can point me to a concrete implementation concept, i
> would like to start to patch the gis, raster, vector and db libraries
> in grass7.
> Maybe we can use signals to set an error variable in the resume error function?

The least invasive approach is to perform clean-up before calling
G_fatal_error(), so that subsequent operations don't crash GRASS, and
rely upon the application registering an error handler which
longjmp()s out.
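
A rough stand-alone sketch of that approach follows. Nothing in it is GRASS code: fatal_error() stands in for G_fatal_error() with a handler that longjmp()s, process_row() is a made-up library function, and open_buffers is a made-up piece of library state.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf recover;         /* filled in by the application */
static int open_buffers = 0;    /* stand-in for internal library state */

/* Stand-in for a fatal error routine whose handler longjmp()s out. */
static void fatal_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    longjmp(recover, 1);
}

/* Hypothetical library function; fails when 'fail' is non-zero. */
static void process_row(int fail)
{
    char *buf = malloc(64);

    if (buf == NULL)
        fatal_error("out of memory");
    open_buffers++;

    if (fail) {
        /* Clean up *before* reporting: control never comes back here. */
        free(buf);
        open_buffers--;
        fatal_error("unable to read row");
    }

    free(buf);
    open_buffers--;
}

/* Application side: 0 on success, -1 if a fatal error was raised. */
int run_action(int fail)
{
    if (setjmp(recover) != 0)
        return -1;
    process_row(fail);
    return 0;
}

/* Lets the caller check that the library state is still consistent. */
int leaked_buffers(void)
{
    return open_buffers;
}
```

Because the clean-up happens before the error is raised, a failed run_action(1) can be followed by a successful run_action(0) with no buffers left dangling.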

G_fatal_error() can't be made to return, as that would break all of
the modules which use it. And library functions which don't return on
error can't be changed to return.

You *could* replace existing functions with a wrapper around a version
which returns on error. The original function would be modified to
return a status indication upon error, and the wrapper would just call
the modified version and call G_fatal_error() in the event of an
error. Functions which want to handle the error themselves would call
the lower-level function.
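
As a sketch of the wrapper arrangement, here is an invented pair of functions, try_parse_rows() and parse_rows(), with fatal_error() standing in for G_fatal_error(). None of these names exist in GRASS.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for G_fatal_error(): reports the error and terminates. */
static void fatal_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Lower-level version: returns 0 on success, -1 on error. */
int try_parse_rows(const char *text, int *rows)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text || *end != '\0' || v <= 0 || v > INT_MAX)
        return -1;
    *rows = (int) v;
    return 0;
}

/* Original interface, kept for existing callers: only returns on success. */
int parse_rows(const char *text)
{
    int rows;

    if (try_parse_rows(text, &rows) != 0)
        fatal_error("invalid row count");
    return rows;
}
```

Existing modules keep calling parse_rows(), while a long-running application would call try_parse_rows() and handle the -1 itself.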

The main problems here are:

1. The sheer number of such functions.

2. The functions may rely upon other functions which currently call
G_fatal_error(). So you would have to make similar changes to the
underlying functions, then modify the calling function to allow for
the fact that these functions can fail.

3. Reporting errors where the error message includes data from local
variables. One option here would be to give the underlying function a
"fatal" parameter, and add a G_error() function which takes an extra
parameter indicating whether to terminate.
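
One possible shape for the "fatal" parameter mentioned in point 3 is sketched below. Here report_error() plays the role of the proposed G_error(), and read_row() is an invented caller; both are assumptions for illustration only.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static char last_message[256];

/* Format and report a message; terminate only if 'fatal' is non-zero.
 * Returns -1 so callers can write "return report_error(...)". */
int report_error(int fatal, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_message, sizeof(last_message), fmt, ap);
    va_end(ap);

    fprintf(stderr, "%s: %s\n", fatal ? "ERROR" : "WARNING", last_message);
    if (fatal)
        exit(EXIT_FAILURE);
    return -1;
}

/* Hypothetical underlying function: the message uses local data, and
 * the caller chooses whether a failure terminates the program. */
int read_row(int row, int nrows, int fatal)
{
    if (row < 0 || row >= nrows)
        return report_error(fatal, "row %d is outside 0..%d",
                            row, nrows - 1);
    return 0;
}

/* Most recent message, for callers that handle the error themselves. */
const char *last_error_message(void)
{
    return last_message;
}
```

The message is built where the local variables are still in scope, so the same text is produced whether or not the error terminates the program.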

All things considered, making it safe to longjmp() out of the fatal
error handler would be a lot less work.

> > The main issue for concurrent reading is that the raster library
> > caches the current row, so that upscaling doesn't read and decode each
> > row multiple times. That's problematic if you want multiple threads
> > reading the same map.
> 
> Reading single raster maps in different threads is just great. Everything else
> is like icing on the cake.

BTW, you can read the same map from multiple threads provided that you
open it once for each thread.

r.mapcalc only opens each map once, but it uses a mutex to prevent
concurrent access.
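
The general idea of guarding a single open map with a mutex can be sketched as follows, assuming POSIX threads. This is not the r.mapcalc code: struct shared_map, decode_row() and get_row() are invented, and decode_row() fills in dummy values instead of reading a real raster.

```c
#include <pthread.h>
#include <string.h>

#define NCOLS 4

/* Hypothetical stand-in for an open map with a cached current row. */
struct shared_map {
    pthread_mutex_t lock;
    int cached_row;
    int cache[NCOLS];
};

static struct shared_map map = { PTHREAD_MUTEX_INITIALIZER, -1, {0} };

/* Stand-in for reading and decoding a row into the cache. */
static void decode_row(struct shared_map *m, int row)
{
    int col;

    for (col = 0; col < NCOLS; col++)
        m->cache[col] = row * NCOLS + col;
    m->cached_row = row;
}

/* Copy one row into 'buf' (NCOLS ints). The lock is held while the
 * shared cache is updated and copied, so threads cannot interleave. */
void get_row(int row, int *buf)
{
    pthread_mutex_lock(&map.lock);
    if (map.cached_row != row)
        decode_row(&map, row);
    memcpy(buf, map.cache, sizeof(map.cache));
    pthread_mutex_unlock(&map.lock);
}
```

The alternative described above, opening the map once per thread, avoids the lock entirely because each thread then has its own row cache.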

-- 
Glynn Clements <glynn at gclements.plus.com>