[gdal-dev] memory leak in GRIB reader (with Python bindings)

Even Rouault even.rouault at mines-paris.org
Thu May 24 17:40:37 EDT 2012


Ok, see http://trac.osgeo.org/gdal/ticket/4682 for a fix. Basically, the 
current caching strategy is maintained (cache all bands that have been 
accessed), until a threshold is reached (arbitrarily set to 100 MB by default). 
When the threshold is reached, then we only cache one band at a time. That 
could be made smarter, but I think this is good enough for now.
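
To illustrate the idea, here is a rough Python sketch. It is not the actual
driver code (see the ticket for that); the names BandCache and decode_band are
made up for the example, and the exact switching rule is my simplification of
"cache everything until the threshold, then one band at a time".

```python
class BandCache:
    """Hypothetical sketch of the caching strategy, not the real GRIB driver."""

    def __init__(self, decode_band, threshold_bytes=100 * 1024 * 1024):
        self.decode_band = decode_band        # callable: band number -> bytes
        self.threshold_bytes = threshold_bytes
        self.cache = {}                       # band number -> decoded data
        self.cached_bytes = 0
        self.single_band_mode = False         # set once the threshold is hit

    def get(self, band):
        if band in self.cache:
            return self.cache[band]           # already decoded, no extra work

        data = self.decode_band(band)
        if self.cached_bytes + len(data) > self.threshold_bytes:
            # Assumption: once the threshold is reached we stay in
            # single-band mode for the rest of the dataset's lifetime.
            self.single_band_mode = True

        if self.single_band_mode:
            # Keep only the band being read right now.
            self.cache.clear()
            self.cached_bytes = 0

        self.cache[band] = data
        self.cached_bytes += len(data)
        return data
```

With a small threshold, reading bands 1, 2, 3 in order keeps 1 and 2 cached
until band 3 pushes the total over the limit; from then on only the most
recently read band stays in memory.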

> maybe -- but what is GDAL policy usually? It doesn't read the data
> until you ask for it, and I would have expected to keep copy myself if
> want to use it again.

I'm not a specialist of the GRIB API, but from what I see, it only returns the 
data for a full band, and not for partial reads. So, for example, if you 
accessed one line at a time and GDAL didn't do any caching, GDAL would have 
to decode the whole band again for each line. Pretty slow!
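
A toy Python sketch of that cost, under the assumption that the decoder can
only hand back a whole band (count_decodes and decode_full_band are
hypothetical names, and the "band" is just a list of zero-filled lines):

```python
def count_decodes(n_lines, use_cache):
    """Count full-band decodes needed to read a band one line at a time."""
    decodes = 0
    cached = None
    width = 4  # arbitrary line width for the toy band

    def decode_full_band():
        # Stand-in for a decoder that can only return a complete band.
        nonlocal decodes
        decodes += 1
        return [[0.0] * width for _ in range(n_lines)]

    for y in range(n_lines):
        if use_cache:
            if cached is None:
                cached = decode_full_band()   # decode once, reuse afterwards
            band = cached
        else:
            band = decode_full_band()         # decode again for every line
        _line = band[y]                       # the single line actually wanted

    return decodes
```

Without the cache the number of full decodes equals the number of lines read;
with it, the band is decoded once.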

> 
> That would be great -- I can see caching a full band, so you could
> pull out pieces efficiently, but caching the entire thing seems like a
> bad idea. Even in the single band case, I'd expect the user to pull the
> whole thing if s/he wanted that.

The previous strategy didn't cache all the bands, but rather each band once 
you had read it. Obviously, if you read all the bands, then at the end, it 
has cached all the bands ;-) The rationale behind this was that it was the 
best strategy to minimize the number of lines of code in the GRIB driver ;-)

