[GRASS-dev] Using GRASS in long running and multithreaded applications Was: Re: The tomcat shut down ...

Thu Oct 1 12:54:49 EDT 2009

Hi Glynn,
thanks a lot for your response.
After reading some documentation and asking "silly" questions of
my poor computer science colleagues, I understand the concept of
thread-local storage and the setjmp()/longjmp() approach a bit better.

I would suggest adding a longjmp() option to G_fatal_error().
The application should choose at runtime whether longjmp() is used,
so G_fatal_error() will call either longjmp() or exit().

The setjmp() code goes into the application that calls the GRASS
library functions, except where GRASS itself needs nested
setjmp()/longjmp() calls to clean up data or to close open file
descriptors.

The errno-style thread-local definition used on Linux scared me, so I
have chosen a different approach. We add thread-local support to gis.h
and declare two extern variables there, so the application can choose
at runtime whether G_fatal_error() calls exit() or longjmp().

Here is an example that works in my test code:

/*Thread local and setjmp() exception support*/
#include <setjmp.h>
#ifdef WIN32
#define Thread   __declspec( thread )
#else
#define Thread   __thread
#endif

/* saves the most important CPU registers, one buffer per thread */
extern Thread jmp_buf G_stack_buffer;
/* set to 1 to choose the longjmp() version of G_fatal_error() */
extern int G_long_run;

G_long_run and G_stack_buffer will be defined in gisinit.c, and
G_long_run is initialized in G__gisinit():

int G_long_run;
Thread jmp_buf G_stack_buffer;
...
void G__gisinit(const char *version, const char *pgm)
{
    const char *mapset;

    if (initialized)
        return;

    G_long_run = 0;
...

The application has to set G_long_run right after calling G_gisinit(),
from a single thread (i.e. in a singleton).
Now we need to patch error.c to call either longjmp() or exit():

void G_fatal_error(const char *msg, ...)
{
    va_list ap;

    va_start(ap, msg);
    vfprint_error(ERR, msg, ap);
    va_end(ap);

    if (G_long_run == 1)
        longjmp(G_stack_buffer, 1);
    else
        exit(EXIT_FAILURE);
}

The C++ application code may look like this:

extern "C" {
#include <grass/gis.h>
}

...
// G_long_run and G_stack_buffer are defined once in libgis (gisinit.c)
// and declared extern in gis.h, so they must not be defined here again.

vtkGRASSInit::vtkGRASSInit() {

	G_gisinit("vtkGRASSBridge");

	// Enable longjmp() error recovery in the GRASS libraries
	G_long_run = 1;
}
...
/*Open a vector map*/

...
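	if (!setjmp(G_stack_buffer))
	{
		// Normal path: a fatal error inside Vect_open_new() now
		// longjmp()s back to the setjmp() above instead of calling exit()
		if (1 > Vect_open_new(&this->map, name, with_z))
		{
			fprintf(stderr, "class: %s line: %i Unable to open vector map <%s>.\n",
					   this->GetClassName(), __LINE__, name);
			return false;
		}
	} else {
		// Reached via longjmp() from G_fatal_error()
		fprintf(stderr, "class: %s line: %i Fatal error while opening vector map <%s>.\n",
				   this->GetClassName(), __LINE__, name);
		return false;
	}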
...

That's all.

Is this approach OK, or too simple, or just naive? :)

If this is OK, I would like to test this approach to identify places
in libgis, libraster and libvector that need nested setjmp()/longjmp()
calls.

Additionally, I will try to make most of the static variables thread-local.

Best regards
Soeren

2009/9/27 Glynn Clements <glynn at gclements.plus.com>:
>
> Soeren Gebbert wrote:
>
>> > I'm not suggesting making all of the functions take a pointer to the
>> > state as a parameter, just making it thread-local.
>>
>> Ok.
>> To my shame I have to admit that I had never heard of the
>> thread-local mechanism before.
>> After a quick look at Wikipedia I understand the principle, and it
>> sounds great! This will speed things up a lot.
>> I guess we need to use the pthread version of thread-local storage
>> to support compilers other than gcc, and Windows too?
>
> You would need to conditionalise it. The usual mechanism is like that
> used for errno. In a single-threaded implementation, it's just a
> variable. In a multi-threaded implementation, it's a macro which
> expands to (*errno_location()), where errno_location() retrieves the
> address using pthread_getspecific().
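A minimal sketch of the errno-style mechanism described above, using the
POSIX thread-specific data calls pthread_once(), pthread_key_create(),
pthread_getspecific() and pthread_setspecific(). The variable name
"state" and the functions state_location() and demo_state() are
hypothetical; state_location() plays the role of errno_location().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state, standing in for a static library variable. */
static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

static void make_state_key(void)
{
    /* free() releases each thread's copy when that thread exits */
    pthread_key_create(&state_key, free);
}

/* Returns the address of this thread's copy, allocating it
   (zero-initialised) on first use. */
int *state_location(void)
{
    int *p;

    pthread_once(&state_once, make_state_key);
    p = pthread_getspecific(state_key);
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        if (p == NULL)
            abort();
        pthread_setspecific(state_key, p);
    }
    return p;
}

/* Existing code keeps using "state" as if it were a plain variable. */
#define state (*state_location())

static void *worker(void *arg)
{
    state = *(int *)arg;   /* writes only the worker's own copy */
    return NULL;
}

/* Sets the calling thread's copy to 7, lets another thread set its own
   copy to 42, and returns the calling thread's copy. */
int demo_state(void)
{
    pthread_t t;
    int other = 42;

    state = 7;
    pthread_create(&t, NULL, worker, &other);
    pthread_join(t, NULL);
    return state;   /* still 7 */
}
```

In a single-threaded build the macro could simply be replaced by a plain
`static int state;`, which is the conditionalisation mentioned above.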
>
>> > However, the error handling is probably a bigger issue. Pushing error
>> > handling onto the modules isn't an acceptable solution.
>>
>> Indeed. This was the next issue i would like to talk about.
>>
>> > Simply allowing the fatal error handler to longjmp() out then resume
>> > using the GRASS libraries would be non-trivial, as you would have to
>> > repair any inconsistencies in the library state.
>>
>> Is there an alternative to longjmp() and setjmp()?
>
> Not really.
>
>> It seems to be quite complex; the man page warns about its usage.
>> And I have never used it before.
>
> longjmp() is conceptually similar to raising an exception in C++,
> while setjmp() is equivalent to establishing a try/catch block.
>
> The details are quite simple if you understand how C is implemented in
> terms of machine code. setjmp() essentially saves the most important
> CPU registers (including the program counter, stack pointer, and frame
> pointer), while longjmp() restores them. So setjmp() records the
> current execution state while longjmp() restores it (similar to
> save/load in a game).
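A small self-contained illustration of the save/load analogy, assuming
nothing beyond standard C. The functions nested_failure(), level1(),
level2() and run_with_checkpoint() are hypothetical; the point is that
one longjmp() unwinds several call levels back to the setjmp().

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf checkpoint;        /* the "saved game" */
static volatile int last_error;   /* set just before jumping back */

/* Deeply nested code that gives up: "load" the saved state. */
static void nested_failure(int code)
{
    last_error = code;
    longjmp(checkpoint, 1);       /* never returns */
}

static void level2(int fail) { if (fail) nested_failure(3); }
static void level1(int fail) { level2(fail); }

/* Returns 0 on success, or the code passed to nested_failure(). */
int run_with_checkpoint(int fail)
{
    if (setjmp(checkpoint) != 0)  /* "save": returns 0 on the direct call */
        return last_error;        /* resumed here after longjmp() */
    level1(fail);
    return 0;
}
```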
>
> Most of the complexities and warnings relate to potential interactions
> with optimisation. Primarily, local variables in the function which
> calls setjmp() aren't guaranteed to be restored to the correct value
> by longjmp(). gcc warns you if this might occur. Using the "volatile"
> qualifier can help here.
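A sketch of the "volatile" point, with hypothetical function names. The
local "done" is modified between setjmp() and longjmp(); without
volatile its value after the jump would be indeterminate under the C
standard, with volatile it is the last value stored.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

static void fail(void)
{
    longjmp(env, 1);
}

/* Counts how many steps completed before fail() jumped back. */
int steps_before_failure(int n)
{
    /* volatile: the value read after longjmp() is the last one stored,
       not a stale copy that was kept in a register */
    volatile int done = 0;

    if (setjmp(env) != 0)
        return done;

    for (int i = 0; i < n; i++)
        done++;
    fail();
    return -1;   /* not reached */
}
```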
>
> The other caveat is that you can't "wrap" setjmp(). The saved state
> ceases to be valid once you leave the function which called
> setjmp(), so you can only call longjmp() from within a "descendant" of
> the function which calls setjmp().
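A sketch of what "can't wrap" means in practice, with hypothetical names
(TRY, throw_error(), guarded()). Putting setjmp() inside a helper
function is broken because the helper's frame is gone when it returns;
a macro is fine, because setjmp() then runs in the caller's own frame.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf handler_env;

/* Broken (do not use): the saved state dies when try_begin() returns.
 *     int try_begin(void) { return setjmp(handler_env); }
 * A macro works because setjmp() expands inside the caller. */
#define TRY if (setjmp(handler_env) == 0)

static void throw_error(void)
{
    longjmp(handler_env, 1);   /* valid only while guarded() is active */
}

/* Returns 0 on normal completion, 1 if throw_error() was called. */
int guarded(int fail)
{
    TRY {
        if (fail)
            throw_error();
        return 0;
    }
    return 1;   /* reached only via longjmp() */
}
```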
>
> In terms of using it to recover from a fatal error, the usage would be
> something along the lines of:
>
>        jmp_buf save;
>
>        int my_handler(const char *msg, int fatal) {
>            print_error(msg, fatal);
>            longjmp(save, 1);
>            return 0; /* can't happen; longjmp() doesn't return */
>        }
>
>        void main_loop(void) {
>            volatile int done;
>            G_set_error_routine(my_handler);
>            for (done = 0; !done; ) {
>                if (setjmp(save) != 0)
>                    continue;   /* fatal error happened */
>                done = do_next_action();
>            }
>            G_unset_error_routine();
>        }
>
> A common idiom is to call setjmp() in the top-level loop, at the
> beginning of each "action", and have the fatal error handler call
> longjmp(). If an error occurs during the execution of an action, the
> longjmp() will jump back out to the main loop which can then process
> the next action.
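A runnable version of the top-level-loop idiom above, under the
assumption that the GRASS calls are replaced by hypothetical stand-ins:
set_error_routine() and fatal_error() mimic G_set_error_routine() and
G_fatal_error(), and do_next_action() simply fails on every odd call.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for G_set_error_routine()/G_fatal_error(). */
typedef int (*error_routine)(const char *msg, int fatal);
static error_routine current_routine;

static void set_error_routine(error_routine r) { current_routine = r; }

static void fatal_error(const char *msg)
{
    if (current_routine)
        current_routine(msg, 1);   /* may longjmp() and never return */
    exit(EXIT_FAILURE);            /* default: terminate */
}

static jmp_buf save;
static int errors_caught;

static int my_handler(const char *msg, int fatal)
{
    fprintf(stderr, "%s: %s\n", fatal ? "ERROR" : "WARNING", msg);
    errors_caught++;
    longjmp(save, 1);
    return 0;   /* not reached */
}

/* Hypothetical action: fails on odd-numbered calls, done after call 4. */
static int calls;
static int do_next_action(void)
{
    calls++;
    if (calls % 2 == 1)
        fatal_error("action failed");
    return calls >= 4;
}

void main_loop(void)
{
    volatile int done;

    set_error_routine(my_handler);
    for (done = 0; !done; ) {
        if (setjmp(save) != 0)
            continue;              /* fatal error: go on with the next action */
        done = do_next_action();
    }
    set_error_routine(NULL);
}
```

With this setup main_loop() survives two fatal errors and returns after
the fourth action.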
>
> An example can be found in lib/driver/main.c (in 6.x), where setjmp()
> and longjmp() are used to recover from SIGPIPE, so that the monitor
> doesn't terminate if the client terminates prematurely.
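A generic POSIX sketch of recovering from SIGPIPE this way; it is not the
code in lib/driver/main.c. It assumes sigsetjmp()/siglongjmp() so the
signal mask is restored after jumping out of the handler, and it
simulates a vanished client with raise(SIGPIPE). serve_client() is a
hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf pipe_env;

static void on_sigpipe(int sig)
{
    (void)sig;
    siglongjmp(pipe_env, 1);   /* abandon the current client */
}

/* Serves one simulated client; returns 0 normally, 1 if SIGPIPE cut it short. */
int serve_client(int broken)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPIPE, &sa, NULL);

    /* savemask = 1: restore the mask, so SIGPIPE is not left blocked */
    if (sigsetjmp(pipe_env, 1) != 0)
        return 1;

    if (broken)
        raise(SIGPIPE);        /* stands in for write() to a dead client */
    return 0;
}
```

Because the mask is restored, the same process can handle SIGPIPE again
for the next client.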
>
>> > Allowing G_fatal_error() to return is enough work that it can probably
>> > be ruled out. Apart from changing every single call (I count 520
>> > references in lib/*), almost every public function would need two
>> > versions: one which returns an error code and one which treats errors
>> > as fatal (i.e. only returns upon success).
>>
>> 520 calls are indeed a lot. Together, the raster and gis libraries
>> have 70 calls, and the vector and db libraries have 190 calls.
>> Glynn, if you can point me to a concrete implementation concept, I
>> would like to start patching the gis, raster, vector and db libraries
>> in grass7.
>> Maybe we can use signals to set an error variable in the resume error
>> function?
>
> The least invasive approach is to perform clean-up before calling
> G_fatal_error(), so that subsequent operations don't crash GRASS, and
> rely upon the application registering an error handler which
> longjmp()s out.
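A sketch of "clean up before calling G_fatal_error()", assuming a
hypothetical table of open maps as the library state and a hypothetical
fatal_error() that behaves like an application handler which longjmp()s
out. The function undoes its own partial changes before reporting, so
the table is consistent when control lands back in the application.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical library state: a table of open maps. */
#define MAX_MAPS 8
static void *open_maps[MAX_MAPS];
static int n_open;

static jmp_buf app_env;            /* set by the application */

static void fatal_error(const char *msg)
{
    (void)msg;
    longjmp(app_env, 1);           /* application's handler jumps out */
}

/* Registers a new map; simulate_failure makes the header read "fail". */
int open_map(int simulate_failure)
{
    void *buf;

    if (n_open == MAX_MAPS)
        fatal_error("too many open maps");
    buf = malloc(64);
    if (buf == NULL)
        fatal_error("out of memory");
    open_maps[n_open++] = buf;

    if (simulate_failure) {
        /* clean up first: release the slot and the buffer ... */
        free(open_maps[--n_open]);
        open_maps[n_open] = NULL;
        /* ... then report the error */
        fatal_error("unable to read header");
    }
    return n_open - 1;
}
```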
>
> G_fatal_error() can't be made to return, as that would break all of
> the modules which use it. And library functions which don't return on
> error can't be changed to return.
>
> You *could* replace existing functions with a wrapper around a version
> which returns on error. The original function would be modified to
> return a status indication upon error, and the wrapper would just call
> the modified version and call G_fatal_error() in the event of an
> error. Functions which want to handle the error themselves would call
> the lower-level function.
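A sketch of the wrapper scheme just described. read_header_status(),
read_header() and fatal_error() are hypothetical names (fatal_error()
stands in for G_fatal_error()), and the header parsing is a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Lower-level version: reports failure through its return value. */
int read_header_status(const char *name, int *rows)
{
    if (name == NULL || name[0] == '\0')
        return -1;               /* error: the caller decides what to do */
    *rows = 100;                 /* placeholder for real header parsing */
    return 0;
}

/* Stand-in for G_fatal_error(). */
static void fatal_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Keeps the original name and contract: returns only on success. */
int read_header(const char *name)
{
    int rows;

    if (read_header_status(name, &rows) != 0)
        fatal_error("unable to read header");
    return rows;
}
```

Existing modules keep calling read_header() unchanged; code that wants
to handle the error itself calls read_header_status().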
>
> The main problems here are:
>
> 1. The sheer number of such functions.
>
> 2. The functions may rely upon other functions which currently call
> G_fatal_error(). So you would have to make similar changes to the
> underlying functions, then modify the calling function to allow for
> the fact that these functions can fail.
>
> 3. Reporting errors where the error message includes data from local
> variables. One option here would be to give the underlying function a
> "fatal" parameter, and add a G_error() function which takes an extra
> parameter indicating whether to terminate.
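A sketch of the option in point 3, with hypothetical names: a G_error()-like
report_error() takes a "fatal" flag, and the function that owns the
local data (here the map name and row) formats the message itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Prints the message; terminates only if "fatal" is non-zero. */
static void report_error(int fatal, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fputs("ERROR: ", stderr);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
    if (fatal)
        exit(EXIT_FAILURE);
}

/* The message needs local data, so this function reports it directly;
   "fatal" decides whether it returns an error code or exits. */
int read_row(const char *name, int row, int nrows, int fatal)
{
    if (row < 0 || row >= nrows) {
        report_error(fatal, "row %d of <%s> out of range", row, name);
        return -1;               /* reached only when fatal == 0 */
    }
    return 0;
}
```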
>
> All things considered, making it safe to longjmp() out of the fatal
> error handler would be a lot less work.
>
>> > The main issue for concurrent reading is that the raster library
>> > caches the current row, so that upscaling doesn't read and decode each
>> > row multiple times. That's problematic if you want multiple threads
>> > reading the same map.
>>
>> Being able to read single raster maps in different threads would be
>> great. Everything else is icing on the cake.
>
> BTW, you can read the same map from multiple threads provided that you
> open it once for each thread.
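A sketch of why opening the map once per thread avoids the row-cache
problem: each handle carries its own cached row, so threads never
overwrite each other's cache. struct map_handle, map_open() and
map_get_row() are hypothetical stand-ins, not the raster library API;
"decoding" a row just computes values.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NROWS 100
#define NCOLS 10

/* Hypothetical open-map handle with its own one-row cache. */
struct map_handle {
    int cached_row;              /* -1: nothing cached */
    int buf[NCOLS];
};

static void map_open(struct map_handle *h) { h->cached_row = -1; }

/* Decodes a row into this handle's cache unless it is already there. */
static const int *map_get_row(struct map_handle *h, int row)
{
    if (h->cached_row != row) {
        for (int c = 0; c < NCOLS; c++)
            h->buf[c] = row * NCOLS + c;   /* stand-in for reading the file */
        h->cached_row = row;
    }
    return h->buf;
}

/* Each thread opens its own handle, so no cache is shared. */
static void *sum_rows(void *arg)
{
    struct map_handle h;
    long *sum = arg;

    map_open(&h);
    for (int r = 0; r < NROWS; r++) {
        const int *row = map_get_row(&h, r);
        for (int c = 0; c < NCOLS; c++)
            *sum += row[c];
    }
    return NULL;
}

/* Reads the same "map" from two threads; returns 1 if both sums are right. */
int parallel_read_agrees(void)
{
    pthread_t t1, t2;
    long s1 = 0, s2 = 0;

    pthread_create(&t1, NULL, sum_rows, &s1);
    pthread_create(&t2, NULL, sum_rows, &s2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return s1 == s2 && s1 == (long)NROWS * NCOLS * (NROWS * NCOLS - 1) / 2;
}
```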
>
> r.mapcalc only opens each map once, but it uses a mutex to prevent
> concurrent access.
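A sketch of the other option, sharing one opened map and serialising
access with a pthread mutex. It only illustrates the idea and is not
r.mapcalc's code; the shared cache and read_row_locked() are
hypothetical. Each reader copies the row out while holding the lock, so
no thread can overwrite the cache while another is still using it.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NROWS 100
#define NCOLS 10

/* Hypothetical single shared handle with a one-row cache. */
static struct {
    int cached_row;
    int buf[NCOLS];
} shared = { -1, {0} };
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copies row "row" into "out" while holding the lock. */
static void read_row_locked(int row, int *out)
{
    pthread_mutex_lock(&shared_lock);
    if (shared.cached_row != row) {
        for (int c = 0; c < NCOLS; c++)
            shared.buf[c] = row * NCOLS + c;   /* stand-in for decoding */
        shared.cached_row = row;
    }
    memcpy(out, shared.buf, sizeof shared.buf);
    pthread_mutex_unlock(&shared_lock);
}

static void *sum_rows_shared(void *arg)
{
    long *sum = arg;
    int row[NCOLS];

    for (int r = 0; r < NROWS; r++) {
        read_row_locked(r, row);
        for (int c = 0; c < NCOLS; c++)
            *sum += row[c];
    }
    return NULL;
}

/* Two threads read through the same handle; returns 1 if both sums are right. */
int shared_read_agrees(void)
{
    pthread_t t1, t2;
    long s1 = 0, s2 = 0;

    pthread_create(&t1, NULL, sum_rows_shared, &s1);
    pthread_create(&t2, NULL, sum_rows_shared, &s2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return s1 == 499500 && s2 == 499500;
}
```

The trade-off: the lock makes concurrent reads safe, but when threads
interleave rows the cache is refilled more often and readers wait on
each other.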
>
> --
> Glynn Clements <glynn at gclements.plus.com>
>