[GRASS-dev] [GRASS GIS] #2134: Create a general exit-safe interface to C libraries

Wed Nov 20 02:12:37 PST 2013

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
 Reporter:  wenzeslaus                            |       Owner:  grass-dev@…              
     Type:  enhancement                           |      Status:  new                      
 Priority:  normal                                |   Milestone:  7.0.0                    
Component:  Python ctypes                         |     Version:  svn-trunk                
 Keywords:  G_fatal_error, exit, multiprocessing  |    Platform:  All                      
      Cpu:  Unspecified                           |  
--------------------------------------------------+-------------------------

Comment(by huhabla):

 Replying to [comment:9 wenzeslaus]:
 > Replying to [comment:6 glynn]:
 > > Replying to [comment:4 zarch]:
 > >
 > > > Or as already suggested by Glynn (#1646) wrap the
 G_add_error_handler,
 > >
 > > Using an error handler allows you to avoid process termination. But
 once a fatal error has occurred, you cannot safely call any GRASS
 function; doing so may well result in a segfault.
 >
 > One of the issues which `G_add_error_handler` is trying to solve is to
 provide a meaningful error message to the user. For example, failing to
 open some temporary file causes `exit` with
 "`No such file /tmp/kjewbf8d38dj`". This helps neither the user nor the
 programmer to understand the error (when it happened, what the stack trace
 is, what the consequences are, and what could be done to solve it). In
 other words, sometimes the message provided by the `G_fatal_error` caller
 is too low level.
 >
 > Python `RPCServer` with wrapper functions throwing exceptions would help
 to solve this issue.  But it seems to me that #1646 remains valid for
 pyGRASS (and possibly others) and C code itself.

 I have re-designed the RPC interface. The Python function wrappers now
 return an exception object together with the result of the function call,
 so that the RPC server interface that provides the {{{call()}}} functions
 can raise these exceptions in the parent process (exceptions raised in the
 subprocess would kill the subprocess and would not be caught in the parent
 process). Hence, the Python wrapper functions transform the C-function
 return values into meaningful exceptions that are raised in the parent
 process.
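
 The return-an-exception idea can be sketched as follows. This is only a
 minimal illustration, not the code from the RPC server script: the names
 `fake_c_function`, `double`, `FUNCTIONS` and `RPCClient` are hypothetical,
 the C function is a pure-Python stand-in, and the `fork` start method is
 assumed to be available (POSIX systems).

```python
import multiprocessing as mp


def fake_c_function(x):
    """Hypothetical stand-in for a ctypes-wrapped C function.

    Like many C functions, it reports failure through a negative return value.
    """
    return -1 if x < 0 else x * 2


def double(x):
    """Wrapper: translate the C-style return code into (exception, result)."""
    ret = fake_c_function(x)
    if ret < 0:
        return ValueError("fake_c_function failed for input %r" % (x,)), None
    return None, ret


# Functions are looked up by name in a dict, so only strings cross the pipe.
FUNCTIONS = {"double": double}


def _server(conn):
    """Subprocess loop: run wrappers and send (exception, result) back."""
    while True:
        msg = conn.recv()
        if msg is None:  # shutdown request
            break
        name, args = msg
        try:
            conn.send(FUNCTIONS[name](*args))
        except Exception as exc:  # unexpected Python error in the subprocess
            conn.send((exc, None))
    conn.close()


class RPCClient(object):
    """Parent-side interface; call() re-raises exceptions sent by the child."""

    def __init__(self):
        ctx = mp.get_context("fork")  # assumption: POSIX fork is available
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=_server, args=(child_conn,))
        self._proc.daemon = True
        self._proc.start()
        child_conn.close()  # the parent keeps only its own end of the pipe

    def call(self, name, *args):
        self._conn.send((name, args))
        exc, result = self._conn.recv()  # always wait for the answer
        if exc is not None:
            raise exc  # raised in the parent, the subprocess keeps running
        return result

    def stop(self):
        self._conn.send(None)
        self._proc.join()


if __name__ == "__main__":
    client = RPCClient()
    print(client.call("double", 21))
    try:
        client.call("double", -1)
    except ValueError as err:
        print("caught in parent: %s" % err)
    client.stop()
```

 Because the exception travels through the pipe as ordinary data, the
 subprocess survives the failed call and the parent decides how to react.
 A real fatal error in C code would still terminate the subprocess, which
 this sketch does not handle.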

 While re-designing, I concluded that a no-wait function call
 {{{call_no_wait()}}} is not meaningful when mixed with calls that wait to
 receive data. There is only a limited number of C-functions that do not
 return values or return states. It is better to wait for a function call
 to finish than to risk a race condition in case a fatal error occurs in
 the meantime. An exception is the messaging interface, which should stay
 as is.

 However, maybe two RPC interfaces are meaningful: one that waits for
 functions to return (expecting return values including exceptions) and one
 that does not wait?

 > Replying to [comment:7 huhabla]:
 > > In my opinion the RPC approach is only meaningful for persistent
 applications that need fast access to C-library functions, or that need
 low level API access for data modification (like digitizing).
 >
 > And this is not only vector and raster digitizing, this is also the new
 scatter plot tool and in fact the whole `g.gui.iclass`, `nviz` (which is
 unfortunately more complicated) and of course everything temporal-related
 (everything has started to be temporal-related).
 >
 > > My intention to write the RPC server was to make the temporal
 framework usable in persistent applications and to be as fast as possible.
 >
 > I'm not sure how the speed of `RPCServer` compares to a module call, but
 speed is not the only advantage. Fine control of what is called and a
 smoother interface (possibly, depending on wrappers) is the other
 advantage. Calling a subprocess from the GUI for every single task and
 parsing its output is cumbersome.

 I have added benchmark runs to the RPC server script to get an idea of the
 performance loss and gain of the RPC interface:
 {{{
 GRASS 7.0.svn (Test XY):~ > python c_library_interface.py
 ##############################################################
 TESTS
 ERROR: A fatal error
 WARNING:root:Needed to restart the rpc server
 ERROR: A fatal error
 WARNING:root:Needed to restart the rpc server
 ##############################################################
 ##############################################################
 Raster map exists benchmark
 Time to call 1000 functions directly: 0.017043s
 Time to call 1000 functions via RPC: 0.178600s
 Time to perform 1000 g.findfile module runs: 30.343877s
 ##############################################################
 ##############################################################
 Raster map info benchmark
 Time to call 10000 functions directly: 0.856104s
 Time to call 10000 functions via RPC: 7.189188s
 Time to perform 10000 r.info module runs: 120.261683s
 ##############################################################
 }}}

 As you can see, for the two tested functions the RPC interface is about 10
 times slower than the direct Python function calls that wrap the GRASS
 C-functions. But the RPC interface is about 17 to 170 times faster than
 using the grass.script interface that calls GRASS modules (g.findfile and
 r.info).
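
 The ratios can be recomputed from the timings printed in the benchmark
 output above. The dictionary layout and the helper name `ratios` are only
 chosen for this illustration; the numbers are copied from the output.

```python
# Timings in seconds, copied from the benchmark output above.
timings = {
    "raster map exists (1000 calls)": {
        "direct": 0.017043, "rpc": 0.178600, "module": 30.343877},
    "raster map info (10000 calls)": {
        "direct": 0.856104, "rpc": 7.189188, "module": 120.261683},
}


def ratios(t):
    """Return (RPC slowdown vs. direct calls, RPC speedup vs. module runs)."""
    return t["rpc"] / t["direct"], t["module"] / t["rpc"]


for name, t in timings.items():
    slowdown, speedup = ratios(t)
    print("%s: RPC is %.1f times slower than direct calls and "
          "%.1f times faster than module runs" % (name, slowdown, speedup))
```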

 > > The pyGRASS interface is well designed for module programming, not for
 persistent applications. Otherwise each C-function call should be handled
 via RPC. From my point of view, and from some tests that I made, the RPC
 approach slows the processing down significantly.
 >
 > So, in the next step, we need a pyGRASS-like interface which is fail
 safe and a temporal library which is faster?

 So you want all C-function calls in PyGRASS to be wrapped using the RPC
 interface?

 > > Running the script will show that calling functions from a dict is 50
 times faster than using a pipe with a subprocess.
 >
 > It seems that this is something we need to take care of. And this is
 something that my factory pattern suggestion is trying to address.
 >
 > We would need to create two sets of classes with an identical interface:
 one using `RPCServer` (safe), the other calling ctypes directly (fast).
 Objects should be created by a factory, so that the factory puts the
 `RPCServer` into the objects and the user does not have to take care of
 it. Maybe in Python we can go beyond the classic factory pattern and also
 create the `RPCServer`-dependent classes from the classes calling ctypes
 directly.
 >
 > I realize that ''some code'' would be appreciated but I cannot dive into
 this more now.

 I am not sure whether I understand your approach, so code examples would
 be very helpful here. :)
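
 One possible reading of the factory proposal quoted above, as a minimal
 sketch and not as the proposed implementation: all names (`DirectRaster`,
 `InProcessServer`, `make_safe_class`, `RasterFactory`) are hypothetical,
 the ctypes call is replaced by a lookup in a set, and the server stand-in
 runs in the same process instead of forwarding the call to a subprocess.

```python
# Hypothetical data standing in for what a C library call would report.
_KNOWN_MAPS = {"elevation", "slope"}


class DirectRaster(object):
    """Fast variant: would call ctypes functions directly (stubbed here)."""

    def __init__(self, name):
        self.name = name

    def exists(self):
        return self.name in _KNOWN_MAPS


class InProcessServer(object):
    """Stand-in for an RPC server; a real one would run this in a subprocess."""

    def call(self, cls, ctor_args, method, *args):
        obj = cls(*ctor_args)
        return getattr(obj, method)(*args)


def make_safe_class(direct_cls):
    """Derive a server-backed class with the same public methods."""

    def make_method(method_name):
        def method(self, *args):
            return self._server.call(
                direct_cls, self._ctor_args, method_name, *args)
        method.__name__ = method_name
        return method

    def init(self, server, *ctor_args):
        self._server = server
        self._ctor_args = ctor_args

    namespace = {"__init__": init}
    for attr, value in vars(direct_cls).items():
        if callable(value) and not attr.startswith("_"):
            namespace[attr] = make_method(attr)
    return type("Safe" + direct_cls.__name__, (object,), namespace)


class RasterFactory(object):
    """Hands out fast or safe objects; the caller never touches the server."""

    def __init__(self, server=None):
        self._server = server
        self._safe_cls = make_safe_class(DirectRaster)

    def create(self, name):
        if self._server is None:
            return DirectRaster(name)  # fast: direct calls
        return self._safe_cls(self._server, name)  # safe: via the server


if __name__ == "__main__":
    fast = RasterFactory().create("elevation")
    safe = RasterFactory(InProcessServer()).create("elevation")
    print(type(fast).__name__, fast.exists())
    print(type(safe).__name__, safe.exists())
```

 Both objects expose the same `exists()` method, so calling code does not
 need to know which variant it received from the factory.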

-- 
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:10>
GRASS GIS <http://grass.osgeo.org>