[GRASS5] closing monitors by click
Eric G. Miller
egm2 at jps.net
Thu Nov 30 21:22:56 EST 2000
On Thu, Nov 30, 2000 at 12:01:44AM +0100, Andreas Lange wrote:
> Hi 2 all,
>
> I have spent a lot of time investigating this problem, but I am going
> to give up on it. I'll describe my findings in the hope that someone
> with more background in Linux system programming can help us.
>
> I used ddd with a non-stripped build of the whole GRASS source to look
> at the behaviour of d.mon -L/-l, start/stop, which call the programs
> mon.status, mon.start, mon.stop etc.
>
> I can narrow the problem down to the function fifoto in
> src/libes/raster/io.c.
>
> The idea with the fifoto function is that the fifos for reading and
> writing are opened, and if they can be opened, the X driver is running.
> If a timeout occurs (i.e. the open blocks), the driver is not running or
> has lost its connection.
>
> This is done as follows:
>
> no_mon = 0; /* reset the global flag; dead() sets it to 1 when no
>                monitor is running */
> sigalarm = signal(SIGALRM, dead); /* install our own handler, dead(),
>                which does nothing but set no_mon to 1 */
> alarm(alarm_time); /* schedule the alarm to fire in alarm_time seconds */
> _wfd = open(output, O_WRONLY); /* open the fifo; if it blocks past the
>                timeout, the alarm should interrupt it */
> alarm(0); /* cancel the pending alarm */
> if (no_mon) /* monitor is not listening on this fifo */
>     return 0;
>
> The same is repeated for the reading fifo.
>
> This seems to work on IRIX (what about BSD and Solaris? I would be
> interested in hearing from users of those systems), but on Linux the
> open call hangs forever. The man page for open on IRIX explicitly
> describes this way of testing whether another process is listening on a
> fifo. If you use O_WRONLY | O_NONBLOCK or O_WRONLY | O_NDELAY, the open
> call should not block.
It's not the blocking that's the problem; it's the differing semantics of
the signal() function on BSD and SYSV. The GNU Libc documentation says
it uses the BSD semantics for signal(), and under those semantics an
interrupted system call is restarted after the handler returns, so the
open() never fails with EINTR. Those semantics can be changed depending
on whether something like __SYSV__ is defined (I don't think that's the
exact symbol). Anyway, the docs also strongly recommend the POSIX.1
standard sigaction() instead. I implemented this last night for fifoto()
and it seems to do the right thing. I need to clean up how it
saves/restores the signal state, but that shouldn't affect how it
functions now. So try it out!
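
For anyone who wants to see the shape of the idea without reading
io.c, here is a minimal sketch of an open() with a timeout built on
sigaction(). It is not the code that went into fifoto(); the names
open_with_timeout, on_alarm, alarm_fired and demo_unattended_fifo are
made up for illustration. The one point that matters is that
SA_RESTART is left out of sa_flags, so a blocked open() returns with
EINTR when the alarm handler returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Set by the handler when the alarm fires (hypothetical name). */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/*
 * Open `path` with `flags`, giving up after `seconds`.
 * Returns the descriptor, or -1 with errno set (ETIMEDOUT on timeout).
 * The race between alarm() and open() mentioned in this thread remains:
 * if the alarm fires before open() is entered, open() can still block.
 */
int open_with_timeout(const char *path, int flags, unsigned int seconds)
{
    struct sigaction sa, old_sa;
    int fd, saved_errno;

    assert(path != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: open() fails with EINTR */

    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    alarm_fired = 0;
    alarm(seconds);
    fd = open(path, flags);
    saved_errno = errno;
    alarm(0);                            /* cancel a pending alarm */
    sigaction(SIGALRM, &old_sa, NULL);   /* restore the previous handler */

    if (fd < 0 && alarm_fired)
        saved_errno = ETIMEDOUT;
    errno = saved_errno;
    return fd;
}

/*
 * Demo: create a fifo that nobody reads from and try to open it for
 * writing. Returns 1 if the open timed out, 0 if not, -1 on setup error.
 */
int demo_unattended_fifo(void)
{
    char dir[] = "/tmp/fifodemoXXXXXX";
    char path[64];
    int fd, timed_out;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) < 0) {
        rmdir(dir);
        return -1;
    }

    fd = open_with_timeout(path, O_WRONLY, 1);
    timed_out = (fd < 0 && errno == ETIMEDOUT);
    if (fd >= 0)
        close(fd);

    unlink(path);
    rmdir(dir);
    return timed_out;
}
```

With no reader on the fifo, demo_unattended_fifo() should come back
after about a second instead of hanging, regardless of which signal()
flavour the C library defaults to.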
Note: I tried using O_NONBLOCK and then using fcntl() to unset the flag,
but intermittent errors of EOF being left on the fifo pipe were causing
problems with some programs some of the time. I guess sigaction()
should be pretty widely supported, so I think we should consider using
it whenever signal handling is required.
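
For comparison, the O_NONBLOCK variant mentioned in the note looks
roughly like the sketch below. The function name
open_fifo_writer_nonblock is hypothetical, and this is the approach
that gave intermittent EOF trouble here, so take it as an illustration
of the technique rather than a recommendation. POSIX says an open() of
a fifo with O_WRONLY | O_NONBLOCK fails with ENXIO when no process has
it open for reading, which is what makes the probe possible.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a fifo for writing without blocking, then switch the descriptor
 * back to blocking mode. Returns the descriptor, or -1 with errno set
 * (ENXIO means nobody has the fifo open for reading).
 */
int open_fifo_writer_nonblock(const char *path)
{
    int fd, flags, saved_errno;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Clear O_NONBLOCK so later write() calls block as usual. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

Note the flags are combined with |, not &.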
> There are two problems with this function:
> 1) There could be a race condition between the alarm call and the open
> call. But this should not matter as the process should never be blocked
> longer than the alarm period.
> 2) If system calls are restarted automatically, the open call is not
> interrupted when the SIGALRM handler returns. So the open hangs
> forever. This is IMHO the problem here.
>
> One other problem is that the original code uses 0/1 instead of
> O_RDONLY/O_WRONLY.
I spotted that quickly and it's gone. That is definitely bad juju.
> I have already tried to implement this with setjmp/longjmp calls, but I
> could not get that to work either. The monitor starts, but the selection
> of the monitor fails altogether.
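
I have not seen the setjmp/longjmp attempt, so I can't say what went
wrong with it. For reference, a jump-based timeout would normally use
sigsetjmp()/siglongjmp() so that the signal mask is restored after
jumping out of the handler. A rough sketch, with hypothetical names
(open_with_jump_timeout, jump_on_alarm, alarm_env), and deliberately
installed with SA_RESTART to show that the jump gets out of open()
even when system calls are restarted:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf alarm_env;    /* hypothetical name */

static void jump_on_alarm(int signo)
{
    (void)signo;
    siglongjmp(alarm_env, 1);   /* leave the blocked open() */
}

/*
 * Open `path`, jumping out of open() if it blocks longer than `seconds`.
 * Returns the descriptor, or -1 with errno set (ETIMEDOUT on timeout).
 * Caveat: if the alarm fires after open() has succeeded in the kernel
 * but before its result is stored in fd, that descriptor is leaked.
 */
int open_with_jump_timeout(const char *path, int flags, unsigned int seconds)
{
    struct sigaction sa, old_sa;
    /* volatile: written between sigsetjmp() and a possible siglongjmp() */
    volatile int fd = -1;
    volatile int saved_errno = 0;

    assert(path != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* the jump works even with restarts on */

    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    if (sigsetjmp(alarm_env, 1) == 0) {
        alarm(seconds);
        fd = open(path, flags);
        saved_errno = errno;
        alarm(0);
    } else {
        saved_errno = ETIMEDOUT;    /* arrived here via siglongjmp() */
    }

    sigaction(SIGALRM, &old_sa, NULL);
    errno = saved_errno;
    return fd;
}
```

The descriptor leak noted in the comment is one reason I would still
prefer plain sigaction() without SA_RESTART over jumping out of a
handler.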
>
> Removal of the lockfile is not the culprit, as I can verify on IRIX.
> If I kill a monitor, the lockfile is not removed there either. But
> if I have a monitor running (x0) and I run "d.mon start=x0", the module
> exits with the message "monitor x0 already running", while on Linux
> the module blocks forever (in the open call in fifoto).
The above fix does not seem to change the problem with properly
responding to the destroywindow() event in X and removing the
lockfile (I don't think that's the right name for the event, but you get
the idea).
> With the IPC setup the fifoto function does not use a timeout (and never
> checks if the monitor is not open). I have no idea how this works.
This IPC thing needs to be reconsidered anyway, since it doesn't seem to
solve the portability issues like we had hoped.
> And I think we really need to merge the fifo and IPC code, so
> that we can choose between them with a compiler switch and some #ifdefs
> in the source. The way it is now (switching by copying files from
> different directories) is not usable. And I have not found a way to
> use the IPC setup on Cygwin/Win32 on the first run. Compiling the whole
> source, issuing ./README.ipc -cantrememberflags and then recompiling the
> whole thing is impractical.
Yea, that would be pretty straightforward for the IPC code if I remember
correctly (just a few changes). I wanted to try the same for a sockets
method, but function arguments/return values had to change.
--
Eric G. Miller <egm2 at jps.net>