[Gdal-dev] using GDALDataset* in "point" mode and memory allocation

Tue Feb 3 14:03:29 EST 2004

Gregory, Matthew wrote:
> I am working on an app which runs a model at point, list and window modes, which gets determined at run-time.  I have
> a class which has a GDALDataset* as a private member and this dataset is opened at run initialization through
> GDALOpen, eg.
> 
> _varLayer = (GDALDataset *) GDALOpen( _varFileName.c_str(), GA_ReadOnly );
> 
> This class exists throughout the entire model run.  What I've noticed when running the point and list (set of points)
> runs is that I am ramping up memory pretty quickly through numerous calls to RasterIO, eg.
> 
> GDALRasterBand* tempBand = _varLayer->GetRasterBand( 1 );
> tempBand->RasterIO(GF_Read, col, row, 1, 1, ptr, 1, 1, GDT_Float64, 0, 0);
> 
> I'm guessing that this is NOT the intended way to run RasterIO, ie. a pixel at a time.  It looks like each call to
> RasterIO is allocating memory within GetBlockRef() either for cache raster blocks or doing actual reads on the data
> (I'm a bit fuzzy on this).

Matt,

It shouldn't be a problem to access the data one pixel at a time, from
a memory point of view.  However, there is significant CPU overhead
for each pixel request, so I don't advise this approach.  It is better to
request data in some chunk size that is convenient for you, if possible,
whether that is whole scanlines, tiles or whatever.
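
A minimal sketch of the chunked approach, assuming whole scanlines are the
convenient chunk.  ScanlineCache and its RowReader callback are hypothetical
names, not part of GDAL; the callback stands in for whatever actually reads a
row, so the buffering logic can be shown on its own.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of GDAL): serves single-pixel requests from
// one buffered scanline, so the underlying reader runs once per row touched
// rather than once per pixel.
class ScanlineCache {
public:
    // Fills `buf` with one full row; returns false on failure.
    using RowReader = std::function<bool(int row, std::vector<double>& buf)>;

    ScanlineCache(int width, RowReader reader)
        : width_(width), reader_(std::move(reader)), buf_(width) {
        assert(width > 0);
    }

    // Returns false if the request is out of range or the read fails.
    bool getPixel(int col, int row, double& value) {
        if (col < 0 || col >= width_ || row < 0) return false;
        if (row != cachedRow_) {
            cachedRow_ = -1;  // invalidate before refilling the buffer
            if (!reader_(row, buf_)) return false;
            cachedRow_ = row;
        }
        value = buf_[col];
        return true;
    }

private:
    int width_;
    RowReader reader_;
    std::vector<double> buf_;
    int cachedRow_ = -1;
};
```

With GDAL, the reader would presumably wrap a single call such as
band->RasterIO(GF_Read, 0, row, width, 1, buf.data(), width, 1, GDT_Float64, 0, 0)
and return whether the result was CE_None.  This only pays off when
consecutive points tend to fall on the same row.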

As to the memory issues, I would guess that what you are seeing is the
tile cache filling up.  By default GDAL should be using a 5MB tile cache,
which means you would see memory use grow on some calls to RasterIO()
until the tile cache is full.  At that point GDAL discards the least
recently used tile whenever a new one is allocated.
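
To illustrate why memory climbs and then levels off, here is a toy model of
a byte-limited, least-recently-used block cache.  It is not GDAL's actual
implementation; LruBlockCache and its members are hypothetical names, and
"loading" a block just allocates a zero-filled buffer.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Toy model of a byte-limited LRU block cache.  This is NOT GDAL's code.
class LruBlockCache {
public:
    explicit LruBlockCache(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Returns the block for `id`, "loading" it (zero-filled here) on a miss.
    std::vector<double>& get(int id, std::size_t pixelsPerBlock) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            // Hit: move the block to the most-recently-used position.
            blocks_.splice(blocks_.begin(), blocks_, it->second);
            return it->second->data;
        }
        // Miss: allocate a new block, then evict old ones if over budget.
        blocks_.push_front(Block{id, std::vector<double>(pixelsPerBlock)});
        index_[id] = blocks_.begin();
        usedBytes_ += pixelsPerBlock * sizeof(double);
        while (usedBytes_ > maxBytes_ && blocks_.size() > 1) {
            const Block& victim = blocks_.back();  // least recently used
            usedBytes_ -= victim.data.size() * sizeof(double);
            index_.erase(victim.id);
            blocks_.pop_back();
        }
        return blocks_.front().data;
    }

    bool contains(int id) const { return index_.count(id) != 0; }
    std::size_t usedBytes() const { return usedBytes_; }

private:
    struct Block { int id; std::vector<double> data; };
    std::list<Block> blocks_;  // front = most recently used
    std::unordered_map<int, std::list<Block>::iterator> index_;
    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
};
```

In this model usedBytes() grows on every miss until the budget is reached and
then stays flat, which is the pattern described above.  For the real cache,
GDALGetCacheMax() and GDALGetCacheUsed() in gdal.h report the limit and the
current usage.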

I am not aware of any reason you should see memory use going up within GDAL
itself unless there is a problem with a particular format driver.  Even the
cache, with its 5MB limit, shouldn't be significant.  If you can provide a
simple GDAL sample application and a data file that demonstrates unreasonable
memory use (with GDAL from CVS) then pass it on and I will look into the
problem.

> Note that everything cleans up well when my class goes out of scope, but I'm a bit worried that users may run out of
> dynamic memory if they run a huge set of points.
> 
> 1.  Is there a better way to free memory after each point is run?  I'm imagining that means taking my GDALDataset*
> out of scope each time, which obviously wouldn't be beneficial in terms of speed.

It depends where the memory is.  You can call FlushCache() on a dataset or
rasterband, but if you are going to access things one pixel at a time you *really*
need the caching to prevent disk IO for each request.

> 2.  Is it more economical to front load a number of tiles at initialization, rather than potentially one at a time?

As noted above, it is better to do your reading in chunks.  The chunk size doesn't
need to be particularly large, and it can be a shape that is handy for you.
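
For the list-of-points case, one way to read in chunks is to bucket the
points by the chunk that contains them, so each chunk is read only once.
This is only a sketch: Point and groupByChunk are hypothetical names, and
pixel coordinates are assumed to be non-negative.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Point { int col; int row; };

// Hypothetical helper: buckets points by the chunk (tile) that contains them,
// keyed by (chunkRow, chunkCol).  Assumes non-negative pixel coordinates.
std::map<std::pair<int, int>, std::vector<Point>>
groupByChunk(const std::vector<Point>& points, int chunkWidth, int chunkHeight) {
    assert(chunkWidth > 0 && chunkHeight > 0);
    std::map<std::pair<int, int>, std::vector<Point>> groups;
    for (const Point& p : points) {
        const std::pair<int, int> key =
            std::make_pair(p.row / chunkHeight, p.col / chunkWidth);
        groups[key].push_back(p);
    }
    return groups;
}
```

Each group can then be served by one RasterIO() request covering that chunk's
window, with every point in the group looked up in the returned buffer.  The
chunk dimensions are free parameters here; matching them to the band's block
size is a reasonable starting point.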

Best regards,

-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent