[Gdal-dev] Changing How GDAL Reports Errors

Fri Sep 2 19:27:13 EDT 2005

I'm posting this to the gdal-dev list to see what people think of
the possibility of changing how GDAL reports errors.

Currently, GDAL uses a mixture of four error reporting mechanisms:

1.  Many functions return an error code indicating that an error
occurred.  This error code is nothing more than a flag that indicates
whether or not an error occurred.  Checking this error flag is the
primary way of determining whether an operation succeeded or failed.

2.  More detailed information about errors is stored in
thread-local global variables that are accessed with the functions
CPLGetLastErrorNo, CPLGetLastErrorType, and CPLGetLastErrorMsg.
The "error type" is usually CE_Failure.  The "error number" is one
of a small number of codes such as CPLE_OutOfMemory or CPLE_FileIO.
The "error message" contains the real information about where the
error occurred and what it means.

3.  When an error occurs--before control is returned to the
caller--an error handler function is called.  This error handler
function is set with CPLSetErrorHandler.  The default error handler
CPLDefaultErrorHandler prints the error message to stderr.  Two other
error handler functions provided by GDAL are CPLQuietErrorHandler
and CPLLoggingErrorHandler.  The actions that can be taken in the
error handler function are very limited; in particular, you can't
translate the error into a thrown exception.  So you have little
choice but to pick one of the three provided error handler functions.
The error handler mechanism requires thread-local storage.

4.  For some errors, rather than use the normal mixture of the above
three techniques, GDAL throws an exception, usually std::bad_alloc.
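
From the caller's side, mechanisms 1 and 2 combine roughly as in the
sketch below.  It is a self-contained model, not GDAL code: DemoErr,
DemoOpen, DemoGetLastErrorMsg, and g_lastErrorMsg are hypothetical
stand-ins for a GDAL function that returns an error flag and for the
thread-local state behind CPLGetLastErrorMsg.

```cpp
#include <string>

// Hypothetical stand-ins that model the current pattern; not GDAL functions.
enum DemoErr { DEMO_None = 0, DEMO_Failure = 3 };

// Models the thread-local "last error" state described in mechanism 2.
static thread_local std::string g_lastErrorMsg;

const char *DemoGetLastErrorMsg() { return g_lastErrorMsg.c_str(); }

// Models mechanism 1: the return value is only a success/failure flag.
DemoErr DemoOpen(const char *path) {
    if (path == nullptr || *path == '\0') {
        g_lastErrorMsg = "DemoOpen: empty file name";
        return DEMO_Failure;
    }
    g_lastErrorMsg.clear();
    return DEMO_None;
}

// Caller: check the flag, then make a second call to learn what went wrong.
std::string OpenOrDescribeFailure(const char *path) {
    if (DemoOpen(path) != DEMO_None)
        return std::string("failed: ") + DemoGetLastErrorMsg();
    return "ok";
}
```

The details of a failure live in a different place from the flag that
signals it, so the caller has to remember to ask for them.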

For warnings, only the error handler function is used.  So if you want
to do anything other than print warnings to stderr, send them to a
log file, or ignore them, you have to write your own error handler
function that stores warnings in a thread-local global buffer, and
check the buffer when control returns to the caller.
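
The workaround described above can be sketched as follows.  This is
a self-contained model under stated assumptions: DemoErrClass,
DemoSetErrorHandler, DemoReadBlock, and the warning text are all
hypothetical stand-ins, with the handler taking an error class, an
error number, and a message.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the error class and handler type.
enum DemoErrClass { DEMO_CE_Warning = 2, DEMO_CE_Failure = 3 };
typedef void (*DemoErrorHandler)(DemoErrClass, int, const char *);

static DemoErrorHandler g_handler = nullptr;
// The thread-local buffer the caller must maintain itself.
static thread_local std::vector<std::string> g_warnings;

void DemoSetErrorHandler(DemoErrorHandler handler) { g_handler = handler; }

// Custom handler: store warnings instead of printing them.
void CollectWarnings(DemoErrClass cls, int /*number*/, const char *msg) {
    if (cls == DEMO_CE_Warning)
        g_warnings.push_back(msg);
}

// Models a library call that succeeds but emits a warning on the way.
int DemoReadBlock() {
    if (g_handler)
        g_handler(DEMO_CE_Warning, 1, "nodata value missing; assuming 0");
    return 0;
}

// Caller: install the handler, make the call, then drain the buffer.
std::vector<std::string> ReadBlockAndTakeWarnings() {
    g_warnings.clear();
    DemoSetErrorHandler(CollectWarnings);
    DemoReadBlock();
    std::vector<std::string> taken;
    taken.swap(g_warnings);
    return taken;
}
```

Nothing connects the warnings to the call that produced them except
the caller's discipline in clearing and draining the buffer around
each call.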

GDAL also has "fatal" errors that never return control to the caller.
These errors are always reported to the error handler, but no matter
what the error handler does with them, the program is aborted.

I propose replacing the current error reporting mechanism with the
following:

The C and C++ APIs would have different error reporting mechanisms,
and the warning reporting mechanism would be different from the error
reporting mechanism.

The C++ API would throw exceptions.  Except for std::bad_alloc for
memory allocation errors, it would only throw exceptions of type
"CPLErrorRecord", which would be a data structure containing
the information currently returned by CPLGetLastErrorNo,
CPLGetLastErrorType, and CPLGetLastErrorMsg.
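
A minimal sketch of what this could look like.  Only the name
"CPLErrorRecord" comes from the proposal; the field layout, the
constants, and the functions DemoOpenDataset and TryOpen are
assumptions made for illustration.

```cpp
#include <string>

// Assumed layout: one field for each of the three values that the
// CPLGetLastError* functions return today.
struct CPLErrorRecord {
    int type;             // error class (for example, "failure")
    int number;           // error number (for example, "open failed")
    std::string message;  // human-readable description
};

// Illustrative values only.
const int kDemoFailure = 3;
const int kDemoOpenFailed = 4;

// Hypothetical C++ API function: throws instead of returning a flag.
int DemoOpenDataset(const std::string &path) {
    if (path.empty())
        throw CPLErrorRecord{kDemoFailure, kDemoOpenFailed, "empty file name"};
    return 1;  // placeholder for a real dataset handle
}

// Caller: all the error information arrives in the caught object.
std::string TryOpen(const std::string &path) {
    try {
        DemoOpenDataset(path);
        return "ok";
    } catch (const CPLErrorRecord &e) {
        return "error " + std::to_string(e.number) + ": " + e.message;
    }
}
```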

The C API would use just the "return an error code" mechanism.
Every C API function that can fail would be modified to take an extra
parameter, a pointer to a "CPLErrorRecord".  If an error occurs,
the error record would indicate it and have complete information
about the error.
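
The C side could look something like the sketch below.  It is written
so that it compiles as C++ but uses only C-compatible constructs in
the interface.  The struct layout, the fixed-size message buffer, the
numeric values, and the function DemoOpenC are all assumptions, not
part of the proposal.

```cpp
#include <cstdio>
#include <cstring>

// Assumed C-compatible layout; type == 0 means "no error".
struct CPLErrorRecord {
    int type;
    int number;
    char message[256];
};

// Hypothetical C API function.  Returns 0 on success and nonzero on
// failure; on failure *err holds the details.  err must not be null.
extern "C" int DemoOpenC(const char *path, CPLErrorRecord *err) {
    std::memset(err, 0, sizeof *err);
    if (path == nullptr || path[0] == '\0') {
        err->type = 3;    // illustrative "failure" class
        err->number = 4;  // illustrative "open failed" number
        std::snprintf(err->message, sizeof err->message,
                      "cannot open '%s'", path ? path : "(null)");
        return 1;
    }
    return 0;
}
```

The caller owns the record, so no global or thread-local state is
involved, and the return value and the details travel together.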

Warnings would be stored in a "CPLWarningRecord" data structure.
Any API function that could emit a warning would have an extra
parameter added, which would be a pointer to a "CPLWarningRecord"
object.
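
One possible shape for this, as a sketch.  Only the name
"CPLWarningRecord" comes from the proposal; representing it as a
list of messages, and the function DemoReadPixel with its clamping
behavior, are assumptions chosen to keep the example small.

```cpp
#include <string>
#include <vector>

// Assumed representation: an append-only list of warning messages.
struct CPLWarningRecord {
    std::vector<std::string> messages;
};

// Hypothetical API function that can warn without failing.  Negative
// coordinates are clamped to 0 and a warning is recorded for each.
double DemoReadPixel(int x, int y, CPLWarningRecord *warnings) {
    if (x < 0) {
        x = 0;
        warnings->messages.push_back("x clamped to 0");
    }
    if (y < 0) {
        y = 0;
        warnings->messages.push_back("y clamped to 0");
    }
    return x + 10.0 * y;  // placeholder for a real pixel lookup
}
```

A GUI program could then show the contents of the record in a dialog
after the call returns, with no handler function or global buffer.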

"Fatal" errors are bad, but unavoidable in some circumstances.
These would be converted to exceptions wherever possible.  If there
are any fatal errors that can't be converted to exceptions, they
would just call abort directly (which is what "assert" does).

(I'm not committed to the details of the above mechanism.  A different
mechanism would be acceptable to me if it had the same advantages
over the current mechanism.)

This error reporting mechanism would have the following advantages
over the existing mechanism:

1.  For languages such as C++ that have exception handling, dealing
with errors would be much easier for the standard reasons:

1.a.  Exceptions can't be silently ignored.  This makes exceptions
less error-prone than other mechanisms:  the programmer cannot,
through accident or laziness, fail to deal with an exception.

1.b.  Exceptions propagate automatically.  Exceptions need only be
caught and handled in the appropriate place, which could be just one
place for the entire program.  With error codes, every function call
that can return an error code must have that error code checked,
and possibly propagated to the caller.

1.c.  Exception handling removes error handling and recovery from the
main line of control flow.  With error codes, error handling must be
interspersed with the code that executes in the absence of errors.
With exceptions, error handling is done in separate blocks of code.
This makes the code better organized, easier to read, and simpler.

1.d.  Exceptions can be used to report errors even in functions that
can't return an error code, such as operators and constructors.

For more on why exceptions are generally better than the other known
methods of reporting errors, see "C++ Coding Standards" item 72 or
"The C++ Programming Language, special 3rd edition", section 14.1.

2.  I don't know much about GDAL's interfaces to other languages such
as Python, but I think it's likely that for those language bindings
that use SWIG, exception-based error handling is much easier to
implement and use than GDAL's current mechanism.

3.  For languages such as C that don't have exception handling,
returning error codes is the best known way of reporting errors.
Reporting errors using a "CPLErrorRecord" data structure is simpler
than the current mechanism because all information about the error is
in one place.  Also, it's easier to deal with just one mechanism (error
codes) rather than a mixture of three mechanisms.  (Errors reported
using the fourth current mechanism, exceptions, will crash a C program,
because C code has no way to catch them.)

4.  The current mechanism relies heavily on thread-local storage.
This is an advanced threading technique that has poor support on many
current OSs, including Windows.

5.  In the current mechanism, it is very difficult to do anything
with warnings other than send them to stderr or a log file.
This is unacceptable for programs with a graphical user interface.
Explicitly passing in a pointer to a record in which to store warnings
would make it easy to do anything with warnings.

6.  Using exception handling internally would simplify GDAL's guts.
GDAL would be more robust and maintainable.  The reasons are the same
as for reason 1 above.
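
Points 1.b and 1.c can be illustrated with a small self-contained
comparison.  All function names here are hypothetical, and
std::runtime_error stands in for whatever exception type is actually
thrown.

```cpp
#include <stdexcept>
#include <string>

// Error-code style: every level must check the code and pass it on.
int ReadHeaderRC(bool ok) { return ok ? 0 : 1; }
int OpenBandRC(bool ok) {
    int rc = ReadHeaderRC(ok);
    if (rc != 0) return rc;
    return 0;
}
int LoadImageRC(bool ok) {
    int rc = OpenBandRC(ok);
    if (rc != 0) return rc;
    return 0;
}

// Exception style: intermediate levels contain no error handling.
void ReadHeader(bool ok) {
    if (!ok) throw std::runtime_error("bad header");
}
void OpenBand(bool ok) { ReadHeader(ok); }
void LoadImage(bool ok) { OpenBand(ok); }

// The error is handled once, at the level that knows what to do.
std::string LoadAndReport(bool ok) {
    try {
        LoadImage(ok);
        return "loaded";
    } catch (const std::exception &e) {
        return std::string("error: ") + e.what();
    }
}
```

In the error-code version, dropping either of the two checks silently
loses the error.  In the exception version there is nothing in the
intermediate functions to forget.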

Here are the costs of making this change that I can see:

1.  Any C or C++ client code would need to be changed.  However,
the changes would be minor, and would generally tend to improve the
quality of the client code.  I don't know if client code in other
languages such as Python would need to change; I doubt it.

2.  It would need to be implemented.  I would be willing to do this
work.  I believe the implementation can be organized into a sequence
of minor changes, so that (hopefully) at no time will GDAL be in a
"broken" state.  The biggest "break" would consist of flipping a
switch to turn exceptions on.

Does anyone see any other costs worth mentioning?  Does anyone feel
that either of these costs is a show-stopper?  Does anyone see any
changes they would like to make to the proposed new mechanism?