[gdal-dev] Intermittent memory-corruption crashes in GRIB2 import

Even Rouault even.rouault at spatialys.com
Tue Nov 16 15:27:22 PST 2021


On 17/11/2021 at 00:21, Simon Eves wrote:
> Hi Even,
>
> Thank you very much for the quick and plausible fix. It certainly 
> seems to resolve the issue on my machine. I'm sending a build to my 
> colleague to test it on his Threadripper, as he was the first one to 
> hit it.
>
> Looks like you have merged this into master and into the 3.4 release 
> branch. Can we assume, therefore, that it will be in 3.4.1, which 
> appears to be scheduled for the end of December?
Your inference is correct :-)
>
> Simon
>
> On Tue, Nov 16, 2021 at 12:57 PM Simon Eves <simon.eves at omnisci.com>
> wrote:
>
>     Hi Even,
>
>     Sorry for the slow response. I was out yesterday and this morning.
>     Testing your fix now.
>
>     Simon
>
>     On Mon, Nov 15, 2021 at 4:59 AM Even Rouault
>     <even.rouault at spatialys.com> wrote:
>
>         Simon,
>
>         unfortunately there are a number of places in the degrib
>         library which aren't thread-safe, and you just spotted that
>         the errSprintf() routine was one of them. I've queued a fix
>         for that in <https://github.com/OSGeo/gdal/pull/4830>.
>         Could you try it?
>
>         Regarding gdal_translate performance, this is related to
>         something I mentioned recently (not sure to whom), but you'll
>         get much better performance if you add -co INTERLEAVE=BAND, so
>         that the output GeoTIFF is written band after band, to match
>         the most performant access pattern for reading GRIB files.
>
>         Even
>
>         On 15/11/2021 at 01:32, Simon Eves wrote:
>>         We have recently implemented a geo raster importer, and all
>>         seems fine, except that we hit an issue with a particular
>>         GRIB2 file from the NOAA website, where we get an
>>         inconsistent crash inside GDAL after a few hundred scanlines.
>>
>>         We have seen two different crashes inside GDAL, and a third
>>         in one of our code paths, but given that there is a memory
>>         corruption, the latter is perhaps unsurprising.
>>
>>         All crashes report "double free or corruption (fasttop)".
>>
>>         We are multi-threading the reading, but using an OGRDataSource
>>         per thread. The child threads are only
>>         calling GetRasterBand(), GetRasterDataType() and RasterIO()
>>         and only on one band at a time.
>>
>>         The GRIB2 file is 103MB with 73 Float64 bands, but only
>>         2345x1597 "pixels".
>>
>>         We tried converting the file to GeoTIFF with gdal_translate
>>         (no options, just in and out) and it took 28 minutes (on a
>>         ~2017 i7 quad 4.2), which is surprising, as we have other
>>         GRIB2 files (between 2 and 12MB) which convert "instantly".
>>         The resulting GeoTIFF is much bigger (21x) but seems to
>>         import reliably, has basically the same schema (as reported
>>         by gdalinfo) and results in the same data when imported into
>>         our system.
>>
>>         We only get the crash occasionally, and have only been able
>>         to trap it in the Debugger a couple of times, with nothing
>>         obviously wrong.
>>
>>         Here is a link to the GRIB2 file in question:
>>
>>         https://drive.google.com/file/d/12Fo6jnIhxzCvnSsup9n0kHVKy9lrHD2l/view?usp=sharing
>>
>>         Attached is the most common stack trace.
>>
>>         With a DEBUG build of GDAL, looks like it's crashing trying
>>         to do a realloc() on "buffer" which is NULL, although that is
>>         supposedly a copy of "&errBuffer" at the frame above which
>>         seems fine.
>>
>>         Gonna try robustifying that code and see what happens...
>>
>>         This is all with GDAL 3.2.2 on Ubuntu 20.04 LTS with GCC 9.
>>
>>         -- 
>>         Simon Eves
>>         Senior Graphics Engineer, Rendering Group
>>         100 Montgomery St (5th Floor), San Francisco, CA 94104, USA
>>         Email: simon.eves at omnisci.com | Cell: +1 (415) 902-1996
>>         <http://www.omnisci.com/>
>>
>>         _______________________________________________
>>         gdal-dev mailing list
>>         gdal-dev at lists.osgeo.org
>>         https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
>         -- 
>         http://www.spatialys.com
>         My software is free, but my time generally not.
>
>
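
To illustrate the kind of thread-safety problem discussed above: a routine
that accumulates messages in one process-wide buffer can be entered by
several reader threads at once, and concurrent reallocation of that buffer
is a classic source of "double free or corruption" aborts. The sketch below
is not the degrib code and not the change in the pull request. The names
appendError(), takeErrors() and runWorkers() are hypothetical, and it only
shows one common remedy, assuming C++11: give each thread its own buffer
with thread_local.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an error-message accumulator (not degrib code).
// thread_local gives every thread its own buffer, so concurrent appends
// never touch the same memory.
static thread_local std::string tlsErrBuffer;

// Append a message to the calling thread's buffer.
void appendError(const std::string &msg) {
    tlsErrBuffer += msg;
}

// Return the accumulated messages and reset the calling thread's buffer.
std::string takeErrors() {
    std::string out;
    out.swap(tlsErrBuffer);
    return out;
}

// Start nThreads workers; each appends nMsgs messages and then stores its
// own accumulated text in its own slot of the result vector.
std::vector<std::string> runWorkers(int nThreads, int nMsgs) {
    std::vector<std::string> results(nThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([t, nMsgs, &results] {
            for (int i = 0; i < nMsgs; ++i)
                appendError("t" + std::to_string(t) + ";");
            results[t] = takeErrors();  // distinct element per thread
        });
    }
    for (auto &w : workers)
        w.join();
    return results;
}
```

Calling runWorkers(4, 100) should give four strings, each containing only
the messages of its own thread. With a single shared (non-thread_local)
buffer and no locking, the same calls would be a data race.
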
-- 
http://www.spatialys.com
My software is free, but my time generally not.

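On the INTERLEAVE=BAND remark above, a toy model may help show why the
write order matters. It assumes (an assumption of this sketch, not a
description of the GRIB driver internals) that decoding a band is the
expensive step and that only the most recently decoded band stays cached.
ToyBandReader and both counting functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Toy reader (hypothetical, not GDAL code): reading from a band that is not
// the cached one costs one full "decode"; only the last band stays cached.
struct ToyBandReader {
    int cachedBand = -1;
    std::size_t decodeCount = 0;

    void readStrip(int band) {
        if (band != cachedBand) {
            ++decodeCount;
            cachedBand = band;
        }
    }
};

// Pixel-interleaved style order: for every strip, visit every band.
std::size_t countDecodesStripMajor(int nBands, int nStrips) {
    ToyBandReader reader;
    for (int s = 0; s < nStrips; ++s)
        for (int b = 0; b < nBands; ++b)
            reader.readStrip(b);
    return reader.decodeCount;
}

// Band-interleaved style order: finish all strips of a band, then move on.
std::size_t countDecodesBandMajor(int nBands, int nStrips) {
    ToyBandReader reader;
    for (int b = 0; b < nBands; ++b)
        for (int s = 0; s < nStrips; ++s)
            reader.readStrip(b);
    return reader.decodeCount;
}
```

With 73 bands and, say, 100 strips, the strip-major order performs 7300
decodes in this model, against 73 for the band-major order.
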