[GRASS5] d.legend and d.out.png

Glynn Clements glynn.clements at virgin.net
Thu Aug 26 21:58:12 EDT 2004


Moritz Lennert wrote:

> >> > I've seen this too, but can't trigger it now. Any chance of running the
> >> > command line version through a debugger?
> >> >
> >> > CFLAGS="-ggdb -Wall" ./configure \
> >> > ...
> >> > make
> >> >
> >> >
> >> > GRASS:~> gdb $GISBASE/etc/bin/cmd/d.what.vect
> >>
> >> I haven't had the time to recompile with -ggdb, but my version was
> >> compiled with -g. So I ran d.what.vect through gdb and behold: it works
> >> like a charm !
> >> To make it clear:
> >>
> >> I run d.what.vect once: no problem, I quit.
> >> I run it again: no reaction on first click, command terminates on the
> >> second click.
> >> I run gdb $GISBASE/etc/bin/cmd/d.what.vect, then "run" and it works
> >> perfectly, even rerunning it several times.
> >
> > So the problem goes away if you run it under gdb. So much for
> > debugging.
> >
> > If you have strace, ltrace or equivalent, you could try using those.
> 
> I've attached the strace output.
> 
> At the end I get:
> 
> [...]
> write(6, "\0\0\0\0", 4)                 = 4
> read(7, "\332\1\0\0", 4)                = 4
> read(7, "\27\1\0\0", 4)                 = 4
> read(7, "\0\0\0\0", 4)                  = 4
> write(6, "\0\0\0\0", 4)                 = 4
> read(7, "\332\1\0\0", 4)                = 4
> read(7, "\27\1\0\0", 4)                 = 4
> read(7, "\1\0\0\0", 4)                  = 4
> write(6, "\177-", 2)                    = 2
> read(7, "\177", 1)                      = 1
> write(9, "C", 1)                        = -1 EPIPE (Broken pipe)
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> +++ killed by SIGPIPE +++

Right.

Short version: the etc/form/form program is dying (for as yet unknown
reasons).

Long version:

Descriptor 9 comes from the socketpair call (line 7030 in the strace
output):

	socketpair(PF_FILE, SOCK_STREAM, 0, [8, 9]) = 0

which corresponds to line 52 in lib/form/open.c:

	if ( G_sock_socketpair(AF_UNIX, SOCK_STREAM, 0, pipefd) < 0) 

In that file, it is accessed via the macro pfd:

	#define	pfd	pipefd[1]	/* parent's end */

BTW, descriptors 6 and 7 are the connection to the monitor:

	socket(PF_FILE, SOCK_STREAM, 0)         = 6
	connect(6, {sa_family=AF_FILE, path="/tmp/grass57-mlennert-5169/x0"}, 110) = 0
	dup(6)                                  = 7

[lines 813-815 in the strace output]

The life history of that connection is:

	fcntl64(9, F_GETFL)                     = 0x2 (flags O_RDWR)
	fstat64(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
	_llseek(9, 0, 0xbfffe8ec, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
	
	fcntl64(9, F_GETFL)                     = 0x2 (flags O_RDWR)
	fstat64(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
	_llseek(9, 0, 0xbfffe8ec, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
	
	write(9, "O8\ncommunes550\n<HTML><HEAD><TITL"..., 565) = 565

	read(9, 0x4132f000, 1024)               = ? ERESTARTSYS (To be restarted)
	--- SIGCHLD (Child exited) @ 0 (0) ---
	read(9, 0x4132f000, 1024)               = -1 ECONNRESET (Connection reset by peer)

	write(9, "C", 1)                        = -1 EPIPE (Broken pipe)

The first two blocks correspond to lines 105-106 of lib/form/open.c:

	    parent_send = fdopen (pfd, "w");
	    parent_recv = fdopen (pfd, "r");

The subsequent write() corresponds to lines 113-120:

	fprintf ( parent_send, "O" );
	length = strlen ( title );
	fprintf ( parent_send, "%d\n", length );
	fprintf ( parent_send, "%s", title );
	length = strlen ( html );
	fprintf ( parent_send, "%d\n", length );
	fprintf ( parent_send, "%s", html );
	fflush ( parent_send );
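
To make the framing explicit, here is a minimal sketch which builds the
same byte sequence in a buffer. format_open_msg() is a hypothetical
helper of mine, not something in lib/form. With an 8-byte title and 550
bytes of HTML it yields 1 + 2 + 8 + 4 + 550 = 565 bytes, which agrees
with the write() in the trace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of lib/form: build the "open" message
 * in the layout produced by the fprintf() calls above:
 *   'O' <title length> '\n' <title> <html length> '\n' <html>
 * Returns the message length, or -1 if buf is too small. */
int format_open_msg(char *buf, size_t size,
                    const char *title, const char *html)
{
    assert(buf != NULL && title != NULL && html != NULL);

    int n = snprintf(buf, size, "O%d\n%s%d\n%s",
                     (int) strlen(title), title,
                     (int) strlen(html), html);

    /* snprintf() reports the length it wanted; treat truncation as failure */
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

Note that the lengths are sent as decimal text followed by a newline,
so the receiver has to parse them before it knows how much to read.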

The read() call (which fails) corresponds to line 124:

	/* Wait for response */
	c = fgetc ( parent_recv );

The SIGCHLD which is received at that point is almost certainly caused
by the etc/form/form program dying. Also, the fact that there are two
read() calls (the first failing with ERESTARTSYS) is an artifact of
the process receiving a signal (SIGCHLD) whose disposition is not to
interrupt system calls (i.e. the call is interrupted, but the kernel
restarts it automatically).
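
For a signal which does have a handler, the same choice is made with
the SA_RESTART flag to sigaction(). A minimal sketch follows;
install_sigchld_handler() is a hypothetical name, and none of this is
taken from the GRASS sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Empty handler: it only needs to exist so that SIGCHLD is caught. */
static void on_sigchld(int signo)
{
    (void) signo;
}

/* Hypothetical helper: install a SIGCHLD handler.
 * restart == 1: a blocked read()/write() is restarted by the kernel
 *               once the handler returns.
 * restart == 0: the blocked call fails with EINTR instead.
 * Returns 0 on success, -1 on error. */
int install_sigchld_handler(int restart)
{
    struct sigaction sa;

    assert(restart == 0 || restart == 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;

    return sigaction(SIGCHLD, &sa, NULL);
}
```

Without SA_RESTART, a caller has to check for EINTR and retry the call
itself.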

The parent doesn't notice that the child has died (it ignores the
value returned from fgetc(), apart from using it in the following
G_debug() statement).
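
Something along these lines would let the parent notice. This is only
a sketch: read_reply() is a hypothetical helper, and what the caller
should do on failure (skip the later F_clear(), restart the form
program, etc.) is a separate decision.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the one-byte reply from the form process.
 * Returns the byte (0..255) on success, or -1 if nothing could be
 * read because of an error or because the other end has gone away. */
int read_reply(FILE *recv)
{
    assert(recv != NULL);

    int c = fgetc(recv);

    if (c == EOF) {
        fprintf(stderr, "form: no reply (%s)\n",
                ferror(recv) ? "read error" : "connection closed");
        return -1;
    }
    return c;
}
```

The important part is that EOF is distinguished from a valid reply
byte, which is why the result is kept in an int rather than a char.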

Finally, it calls F_clear(), which corresponds to the final write()
call, at lines 143-144:

	    fprintf ( parent_send, "C" );
	    fflush ( parent_send );

As there isn't anything on the other end of the connection, the
process receives SIGPIPE and terminates.
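
One common way to survive this is to ignore SIGPIPE, so that the
write() fails with EPIPE and the caller can deal with it. A minimal
sketch, assuming a raw descriptor rather than the stdio stream which
lib/form uses; ignore_sigpipe() and send_clear() are hypothetical
names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Ignore SIGPIPE so that writing to a dead peer fails with EPIPE
 * instead of killing the process. Returns 0 on success, -1 on error. */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical stand-in for the "C" (clear) command.
 * Returns 0 on success, -1 on failure with errno set
 * (EPIPE if the other end has gone away). */
int send_clear(int fd)
{
    ssize_t n;

    assert(fd >= 0);

    /* Retry if a signal handler interrupted the call */
    do {
        n = write(fd, "C", 1);
    } while (n < 0 && errno == EINTR);

    return n == 1 ? 0 : -1;
}
```

This only stops the parent from being killed; it doesn't explain why
the child died in the first place.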

The next step is to try to debug the child process. Try running
d.what.vect but don't interact with it. Use "ps" to check whether the
child process ($GISBASE/etc/form/form) is still running at this point.
If it is, try attaching to it with gdb, e.g.

	$ gdb $GISBASE/etc/form/form
	(gdb) attach <pid>

where <pid> is the PID of the etc/form/form process.

Or use "strace -p <pid>".

OTOH, if etc/form/form is already dead, try using "strace -f ...",
which should also trace any child processes which d.what.vect creates. 
However, this tends to be unreliable; the process usually starts
running before strace can attach to it, so you often miss the startup.

That said, I have a strong suspicion that the ultimate solution will be
to scrap the form library. It doesn't even compile for me, it doesn't
work for you and, in four days of discussing this, I've yet to hear
any comments from either of the people who've actually worked on it
(Alex and Radim).

At the very least, I think that plain-text (i.e. -x) should be the
default, and the use of the form library should require an explicit
switch.

-- 
Glynn Clements <glynn.clements at virgin.net>



