[gdal-dev] Improving GDAL safety with respect to hostile/corrupt
files
Even Rouault
even.rouault at mines-paris.org
Mon Feb 9 19:46:41 EST 2009
Hi all,
This email follows a discussion started on the GDAL IRC channel and an effort I
have begun in my own tree to review and fix GDAL drivers to be more robust
when facing invalid data, whether corrupt files (random accidental errors) or
hostile files (crafted after analysis of flaws in the source code, or with
automated tools for that task).
This effort is somewhat a continuation of
http://trac.osgeo.org/gdal/wiki/rfc19_safememalloc
First disclaimer: this topic only tries to address GDAL drivers, as opposed
to OGR ones (but there would certainly be similar issues with OGR drivers!),
and I focus mainly on the read part of GDAL drivers.
Second disclaimer: I'm not personally a specialist in software safety, just
trying to improve GDAL's state in that area.
First observation: we currently have 96 GDAL drivers advertised in
http://gdal.org/formats_list.html (and a 97th committed just a few hours ago),
representing about 368 000 lines of C/C++ code, so an exhaustive review would
be a HUGE effort.
I'm mainly interested in fixing code used by GDALOpen() that could blow up
when fed with invalid data. After successfully opening a dataset, the
IReadBlock() method is generally the second place to check for issues. But as
GDALOpen() is the first step the user needs to run to decide whether to go on
using the dataset, the Open() method of the driver is the top priority place
to look at.
From what I have seen, security-related bugs can be classified as:
- segmentation faults = application crashes.
- denial of service = code that doesn't return to the caller in a reasonable
amount of time: endless loops, loops over a huge number of iterations, or huge
memory allocations (ones comparable to the amount of RAM, so that they can
be satisfied, but at the expense of massive disk swapping)...
- buffer overflows, in particular stack buffer overflows, which could be
exploited to make hostile code embedded in a dataset run as the user of the
process linking with GDAL. These last ones are generally the most dangerous.
Depending on how GDAL is used and deployed, this can have different
consequences.
My review of quite a few drivers has revealed several common mistakes, which
generally come down to a lack of validation of input data for a small set of
critical items. They can usually be avoided by simple tests. For example, the
nRasterXSize field of the GDALDataset object is a (signed) int, but it is
invalid to set a null or negative value into it. Such a check is currently
done in only a minority of drivers. The same applies to block sizes in
GDALRasterBand objects. The idea is to check that the dimensions declared in
the input data are valid, and if not, to return properly with an error value.
(We could also imagine changing the type of nRasterXSize to unsigned, but I
expect this would be considered too disruptive an API change --> probably
hundreds of compiler warnings to fix, and user code not ready to deal with
4 294 967 295 instead of 2 147 483 647.)
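As a sketch, such a validation helper could look like the following (the
function name GDALCheckDatasetDimensions is hypothetical here, not an
existing GDAL API):

```cpp
#include <cstdio>

/* Hypothetical helper: validate the raster dimensions declared in a file
 * before storing them into nRasterXSize / nRasterYSize. */
bool GDALCheckDatasetDimensions(int nXSize, int nYSize)
{
    if (nXSize < 1 || nYSize < 1)
    {
        fprintf(stderr, "Invalid dataset dimensions : %d x %d\n",
                nXSize, nYSize);
        return false;
    }
    return true;
}
```

A driver's Open() method would call this right after decoding the header and
return NULL if the check fails.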
For uncompressed data, we could also check that the file size is consistent
with the expected file size (roughly header size + width * height * nbands *
GDALGetDataSize(eDataType)). But I haven't really encountered a driver that
would behave badly from the security point of view because of that: we would
get I/O errors, which is fine I think. I also want to remind people that the
autotest suite currently has tests that use deliberately truncated datasets to
check that the header can be successfully decoded. This is desirable to keep
the test suite data small enough to be downloaded...
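Such a size check could be sketched as follows; HasPlausibleFileSize is a
hypothetical helper, and note that each multiplication has to be guarded
against overflow:

```cpp
#include <cstdint>

/* Hypothetical helper: check that the actual file size is at least
 * header size + width * height * bands * bytes-per-sample,
 * guarding every step against 64-bit overflow. */
bool HasPlausibleFileSize(uint64_t nActualSize, uint64_t nHeaderSize,
                          int nXSize, int nYSize, int nBands, int nDTSize)
{
    if (nXSize <= 0 || nYSize <= 0 || nBands <= 0 || nDTSize <= 0)
        return false;
    /* two positive ints multiplied fit in 64 bits (< 2^62) */
    uint64_t nExpected = (uint64_t)nXSize * (uint64_t)nYSize;
    if (nExpected > UINT64_MAX / (uint64_t)nBands)
        return false;
    nExpected *= (uint64_t)nBands;
    if (nExpected > UINT64_MAX / (uint64_t)nDTSize)
        return false;
    nExpected *= (uint64_t)nDTSize;
    if (nExpected > UINT64_MAX - nHeaderSize)
        return false;
    return nActualSize >= nHeaderSize + nExpected;
}
```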
Another common issue, which is really the main point on which I wanted to get
feedback at first, is dealing with the number of bands. Currently, when a
GDAL driver opens a dataset and has detected that it is valid, it loops over
the number of bands to instantiate them, like in:
for( int iBand = 1; iBand <= nBands; iBand++ )
{
    poDS->SetBand( iBand, new FooRasterBand(poDS, iBand) );
}
and it returns to GDALOpen() only after that. So, what happens if nBands is
really, really huge, like 2 147 483 647? It will spend a lot of time looping
and will eventually crash in an unsuccessful memory allocation after having
exhausted the memory. Many drivers read the number of bands from an int32, so
this can happen.
To solve that particular case, I'd like to introduce a GDAL_MAX_BAND_COUNT
#define (defined in some GDAL core header file, gdal_priv.h for example)
against which we would compare the number of bands read from the file, and
error out if necessary. I'd suggest GDAL_MAX_BAND_COUNT = 65536. TIFF has a
limit of 65535 for example, and after a bit of discussion on IRC, it seems to
be a reasonable limit even for hyperspectral data, where GDAL still behaves
well. Do people agree? Or do you have real-world use cases where this would
turn out to be too low a limit? Alternatively, we could safely define it at
2147483647, and people wanting to restrict it for their own needs could just
change it and recompile GDAL with the value they feel appropriate. Another
option would be to make that value a configuration option that could be
changed at runtime.
Phil Vachon suggested making the dataset dynamically allocate the band
objects. This would require changes to GDAL core, which should be designed as
additions rather than as compulsory changes to existing drivers. It would in
any case require each driver wanting to be more secure to be changed to use
that new capability, and I think checking the number of bands is the
easiest way of achieving the desired result.
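A minimal sketch of the proposed check (GDAL_MAX_BAND_COUNT and the helper
name are the suggestions above, not existing GDAL code):

```cpp
#include <cstdio>

#define GDAL_MAX_BAND_COUNT 65536  /* value suggested above */

/* Hypothetical helper: reject absurd band counts before the
 * SetBand() instantiation loop is entered. */
bool GDALCheckBandCount(int nBands)
{
    if (nBands < 1 || nBands > GDAL_MAX_BAND_COUNT)
    {
        fprintf(stderr, "Invalid band count : %d\n", nBands);
        return false;
    }
    return true;
}
```

A driver would call it right after reading the band count from the header and
fail the Open() if it returns false, instead of entering the loop above.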
Other issues can be related to integer overflow in additions, multiplications,
etc. For example, for "nBlocksPerRow = (nRasterXSize+nBlockXSize-1) /
nBlockXSize;", after having checked that nRasterXSize > 0 and nBlockXSize >
0, we should also check that nRasterXSize <= INT_MAX - (nBlockXSize-1). I've
recently added that kind of check in gdalrasterband.cpp, but the same pattern
can be found at the driver level too.
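Put together, the guarded computation could be sketched like this (the helper
name is illustrative):

```cpp
#include <climits>

/* Sketch: compute the number of blocks per row with an overflow guard.
 * Returns -1 on invalid input instead of silently overflowing. */
int ComputeBlocksPerRow(int nRasterXSize, int nBlockXSize)
{
    if (nRasterXSize <= 0 || nBlockXSize <= 0)
        return -1;
    /* the rounded-up division would overflow if
       nRasterXSize + (nBlockXSize - 1) > INT_MAX */
    if (nRasterXSize > INT_MAX - (nBlockXSize - 1))
        return -1;
    return (nRasterXSize + nBlockXSize - 1) / nBlockXSize;
}
```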
A difficult issue to solve is the case of big memory allocations, big
enough to exhaust your physical RAM but not all your virtual memory. In most
cases, those only occur in the GDAL block cache when you first begin to read
data in a band, so it can be analysed at the application level whether it is
reasonable to deal with a dataset whose block size is 100 000 x 100 000. But
in a few drivers, this allocation is also done in the constructor of the Band
object. This could probably be restructured by moving the buffer allocation to
the first call of IReadBlock()/IWriteBlock().
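The restructuring could look something like this (LazyBand and its methods are
purely illustrative, not real GDAL classes):

```cpp
#include <cstdlib>
#include <cstddef>

/* Illustrative sketch: defer the block-buffer allocation from the band
 * constructor to the first read, so that merely opening a hostile dataset
 * does not by itself trigger a huge allocation. */
class LazyBand
{
    unsigned char* pabyBlockBuf;
    size_t nBlockBufSize;

public:
    explicit LazyBand(size_t nSize)
        : pabyBlockBuf(NULL), nBlockBufSize(nSize) {}
    ~LazyBand() { free(pabyBlockBuf); }

    bool IsBufferAllocated() const { return pabyBlockBuf != NULL; }

    /* Returns false on allocation failure, reported as a read error
     * at read time rather than a crash at open time. */
    bool ReadBlock()
    {
        if (pabyBlockBuf == NULL)
        {
            pabyBlockBuf = (unsigned char*)malloc(nBlockBufSize);
            if (pabyBlockBuf == NULL)
                return false;
        }
        /* ... decode the block into pabyBlockBuf ... */
        return true;
    }
};
```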
A complementary idea is that we could define, via a configuration option, an
optional maximum size for a single allocation, and make VSIMalloc check first
that we don't ask for more. For example, we might want all single memory
allocations above 500 MB to fail: it is unlikely that a valid use of GDAL
would require allocating 500 MB at once. For example, FrankW lately wrote an
alternate GIF driver to deal specifically with GIF files whose uncompressed
data size is above 100 MB, without allocating all that memory.
(Some people could object that similar mechanisms exist for that, like ulimit
on Linux, but AFAIK they apply to the whole process, not at the library
level.)
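A capped allocator could be sketched as follows; VSIMallocCapped and the
500 MB default are assumptions for the sake of the example, not an existing
GDAL API:

```cpp
#include <cstdlib>
#include <cstddef>

/* Hypothetical cap on any single allocation; in practice this would be
 * read from a configuration option at startup. */
static size_t nMaxSingleAllocSize = (size_t)500 * 1024 * 1024;

/* Illustrative wrapper: refuse oversized requests up front, otherwise
 * behave like an ordinary allocation that may fail with NULL. */
void* VSIMallocCapped(size_t nSize)
{
    if (nSize > nMaxSingleAllocSize)
        return NULL;
    return malloc(nSize);
}
```

Callers already testing the return value of VSIMalloc against NULL would need
no change to benefit from such a cap.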
Anyway, I think all substantial memory allocations should be done with
VSIMalloc/VSICalloc, with the return value tested against NULL, instead of
using CPLMalloc/CPLCalloc, which just abort() in case of failure.
Well, I'm going to stop this long and non-exhaustive list of recommendations
here. This could be material for a dedicated page on Trac.
FrankW suggested that we could add a metadata item at the driver level to
indicate that the driver is considered safe (similarly to the metadata item
indicating that a driver makes use of VSI*L). Quoting him: "So it would be
easy for folks to only register drivers that claim to be vetted. The
reference documentation would provide all appropriate disclaimers."
Although I'm not opposed to it, I personally feel a bit shy about that
proposal and think the following points would have to be solved:
1) What would be the criteria to declare a driver vetted? Checking that some
precise, to-be-defined rules have been followed? Or just a general code
review? By one or several people? And/or a PSC decision?
2) How do we handle cases where third-party libraries are used? For example,
let's imagine that we declare the GDAL-specific code of the GeoTIFF driver to
be safe. But as it makes use of libgeotiff and libtiff, its safety also
depends on those two libraries...
3) Are we in a position to judge our own work? Doesn't it put too much
responsibility on our shoulders, even if we write all the appropriate
disclaimers? Would people really read and understand the scope of what
we "guarantee" and what we don't?
I think many drivers can easily be fixed to be safer against random data
corruption, but being robust against a targeted attack is far more
complicated. In some cases, we can even foresee the path an attacker would
follow, but fixing the driver might be a lot of work that the maintainer of
the driver is not ready to do. And as not everyone in the GDAL developer
community is necessarily interested in that area, we cannot reasonably set
strict rules for contributors.
I believe it could be interesting to have a Trac page with:
1) recommendations and common errors to avoid;
2) a summary of which drivers have been reviewed, on which aspects, at which
revision of the file, by whom... (this is a bit redundant with a careful
analysis of Subversion commit logs and diffs).
Then interested users could make their own enlightened choice of which
drivers to compile/register based on those indications and their own review.
Would we also want to indicate unresolved issues? As GDAL is open source, it
is already fairly easy for hostile people to craft malicious files, and they
probably wouldn't learn much from that information, except maybe some script
kiddies discovering a new playground...
Or should we just go on as we currently do, that is to say, fix things
incrementally without too much formalism or publicity?
For those who have managed to reach the end of this lengthy email, I'd be
happy to hear your feedback...
Best regards,
Even