[GRASS-dev] [GRASS GIS] #2326: Command functions in grass.script.core miss a correct error reporting

Sat Jun 14 23:07:06 PDT 2014

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
 Reporter:  wenzeslaus          |       Owner:  grass-dev@…              
     Type:  enhancement         |      Status:  new                      
 Priority:  major               |   Milestone:  7.1.0                    
Component:  Python              |     Version:  svn-trunk                
 Keywords:  script, exceptions  |    Platform:  All                      
      Cpu:  Unspecified         |  
--------------------------------+-------------------------------------------
Changes (by wenzeslaus):

  * keywords:  script => script, exceptions

Comment:

 Here are my suggestions for changes in `grass.script.core`.

 The `read_command` function is similar to
 [https://docs.python.org/2/library/subprocess.html#subprocess.check_output
 subprocess.check_output] function, so I used the implementation from
 Python subprocess
 ([http://hg.python.org/cpython/file/2145593d108d/Lib/subprocess.py#l508
 cpython: Lib/subprocess.py]). It check the `stderr` parameter, uses
 `communicate()` and `poll()` and raises on non-zero return code. The main
 difference from `subprocess.check_output` (expect from GRASS parameter
 handing) is that stderr is stored to exception to provide a detailed
 message at the right place.

 {{{
 #!python
 def read_command(*args, **kwargs):
     """Run command and return its output."""
     # implemenation inspired by subprocess.check_output() function
     if 'stderr' in kwargs:
         raise ValueError('stderr argument not allowed, it would be
 overridden.')
     kwargs['stderr'] = PIPE
     process = pipe_command(*args, **kwargs)
     output, errors = process.communicate()
     returncode = process.poll()
     if returncode:
         cmd = kwargs.get("args")
         if cmd is None:
             cmd = args[0]
         raise CalledCommandError(returncode, cmd, errors=errors)
     return output
 }}}

 Note that `read_command()` is used by `parse_command()` function and that
 `pipe_command()` might be replaced by `start_command()` and `stdout=PIPE`.

 `write_command()` is similar to `read_command()` but without stdin and win
 stdout as a string:

 {{{
 #!python
 def write_command(*args, **kwargs):
     """!Passes all arguments to feed_command, with the string specified
     by the 'stdin' argument fed to the process' stdin.
     """
     # implemenation inspired by subprocess.check_output() function
     if 'stdin' not in kwargs:
         raise ValueError('stdin argument is required.')
     if 'stderr' in kwargs:
         raise ValueError('stderr argument not allowed, it would be
 overridden.')
     stdin = kwargs['stdin']  # get a string for stdin
     kwargs['stdin'] = PIPE  # to be able to send data to stdin
     kwargs['stderr'] = PIPE
     process = start_command(*args, **kwargs)
     unused, errors = process.communicate(stdin)
     returncode = process.poll()
     if returncode:
         cmd = kwargs.get("args")
         if cmd is None:
             cmd = args[0]
         raise CalledCommandError(returncode, cmd, erros=errors)
     # subprocess.check_call() function returns 0 for success
     # but this would create only confusion since we don't return for
 failure
 }}}

 `write_read_command()` is a new proposed function which is a combination
 of `write_command()` and `read_command()` functions.

 {{{
 #!python
 def write_read_command(*args, **kwargs):
     """Uses stdin string as stdin and returns stdout"""
     # implemenation inspired by subprocess.check_output() function
     if 'stdin' not in kwargs:
         raise ValueError('stdin argument is required.')
     if 'stdout' in kwargs:
         raise ValueError('stdout argument not allowed, it would be
 overridden.')
     if 'stderr' in kwargs:
         raise ValueError('stderr argument not allowed, it would be
 overridden.')
     stdin = kwargs['stdin']  # get a string for stdin
     kwargs['stdin'] = PIPE  # to be able to send data to stdin
     kwargs['stdout'] = PIPE
     kwargs['stderr'] = PIPE
     process = start_command(*args, **kwargs)
     output, errors = process.communicate(stdin)
     returncode = process.poll()
     if returncode:
         cmd = kwargs.get("args")
         if cmd is None:
             cmd = args[0]
         raise CalledCommandError(returncode, cmd, erros=errors)
     return output
 }}}

 I'm not sure with limitations of `write_command()`, `read_command()` and
 `write_read_command()` considering large inputs and outputs but they are
 using `out, err = p.communicate(stdin)` and `p.poll()` which should be the
 best way according to what Glynn is saying, library (subprocess) is using,
 and documentation is suggesting. Moreover, they are not worse that the
 existing functions and I expect that for special cases you need to use
 `start_command()` directly anyway.

 The missing function is now a function which would [comment:3 connect
 stdout and stderr] (`stdout=PIPE, stderr=STDOUT`). But this is again
 something special (e.g. used in GUI) which might not need a convenient
 function and user should use `start_command()` directly.

 Considering how `run_command()` function is used I would suggest to change
 it to raise an exception. This will break the code which was using it
 correctly and this code will have to be fixed manually and perhaps adding
 try-except will be necessary. However, most of the code will improve since
 it will at least fail with proper error message at proper place and not
 few lines further with hard-to-understand message. Adding `return 0` to
 the end of the function may provide a compatibility with previous behavior
 for cases when the return value was used. It will not create an error when
 everything is OK but it will not use the error handling at that place when
 the command fails. At the end, `return 0` should be removed anyway.

 {{{
 #!diff
 -    ps = start_command(*args, **kwargs)
 -    return ps.wait()
 +    if 'stdin' not in kwargs:
 +        raise ValueError('stdin argument is required.')
 +    else:
 +        stdin = kwargs['stdin']
 +        del kwargs['stdin']  # clean non-standard argument
 +    process = feed_command(*args, **kwargs)
 +    unused, errors = process.communicate(stdin)
 +    returncode = process.poll()
 +    if returncode:
 +        cmd = kwargs.get("args")
 +        if cmd is None:
 +            cmd = args[0]
 +        raise CalledCommandError(returncode, cmd, erros=errors)
 +    return 0
 }}}

 I don't see particular advantage in functions `pipe_command()` (sets
 `kwargs['stdout'] = PIPE`) and `feed_command()` (sets `kwargs['stdin'] =
 PIPE`). The the parameters can be set directly with the function
 `start_command()` and for me the functions does not contribute to
 readability because the ''feed'' does not tell me if the string is the
 input (as for `write_command()` function) or it justs sets `stdin=PIPE`.
 Similarly, ''pipe'' does not tell me if everything is piped (stdin, stdout
 and stderr), stdin and stdout are piped (as for  `write_read_command()`
 function) or just one of stdin and stdout is piped. It is even more
 confusing name when I consider piping in command line where I connect
 multiple processes (which is not done by PIPE constant in
 [https://docs.python.org/2/library/subprocess.html#replacing-shell-
 pipeline subprocess]).

 I wanted to change (instead of adding) the functions, especially
 `run_command()`, originally but then I was afraid of problems with
 releasing 7.0 and changing API. However, the usage statistics and the fact
 that Glynn [comment:1 suggested the same] convinced me that we should
 really do it. However, it is challenging, we have to change some code (at
 least 150 occurrences of `run_command()` and `write_command()` in trunk
 and addons but maybe also other functions) and I'm still not sure with
 some detail such as if stderr should be always captured to get the right
 error message (consider e.g. the result with `DEBUG=1`).

 Another consideration is raising exception versus calling `fatal()`
 function which can be set to raise exception but this exception would have
 to contain everything in a message while a custom exception (I used one
 derived from
 [https://docs.python.org/2/library/subprocess.html#subprocess.CalledProcessError
 subprocess.CalledProcessError]) can store return code, command (or
 command/module name) and stderr separately.

 For the 7.0 this actually should not be applied completely but the API
 should be changed to fit the improved implementation/new API, so
 `run_command()` function should have `return 0` but a waring in
 documentation. Then perhaps `pipe_command()` and `feed_command()`
 functions should be marked as depreciated.

 Adding (not tested) draft of proposed changes.

-- 
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:6>
GRASS GIS <http://grass.osgeo.org>