[Gdal-dev] Re: GDAL raster block caching issues

Wed Aug 31 15:55:40 EDT 2005

Frank Warmerdam wrote:
> On 8/26/05, Steve Soule <Steve.Soule at vexcel.com> wrote:
> 
>>Issue 1:  Global vs. dataset LRUL
>>
>>Currently, the LRUL is global, that is, it contains blocks from all
>>open datasets.  I think it would be better if each dataset had its
>>own LRUL (or possibly each raster band).  This would have the following
>>advantages:

(What followed was a long discussion of global vs. dataset LRUL, in 
which I gave five arguments why dataset was better, and Frank refuted them.)

Frank, you've convinced me that dataset-level LRUL would not give any 
significant performance benefit, and my arguments two through five 
aren't good arguments.  But argument one (thread-safety) still stands.

> Back to your overall point, I am interested in offering per-dataset
> caching rather than global caching as an option (presumably
> non-default).  Would you be interested in trying to implement this?
> 
> My hope is that this could be mostly done in gdalrasterblock.cpp, likely
> with a block list stored on the GDALDataset.  Actually this may be
> a bit messy, since there are such things as free standing 
> GDALRasterBand objects not associated with any GDALDataset.  I'm 
> not sure how you would want to address that issue.  
> 
> Ideally the policy could be set at runtime (checked via CPLGetConfigOption()).
> If you are interested in doing that, then go ahead, but please let me know
> when you commit the changes. 

Implementing a dual LRUL mechanism where you could select global or 
dataset LRUL at run-time would not address the thread-safety issue. 
Since that's the one argument for dataset-level LRUL that I still 
believe in, I'm not interested in implementing the 
switch-between-global-and-dataset-LRUL approach.

However, I have a new idea along these lines that could fix the biggest 
thread-unsafety problem with the current caching mechanism:  the dirty 
blocks.  My idea is to make a raster-band-level LRUL just for dirty 
blocks; dirty blocks would not be listed in the global LRUL.  When a 
dirty block was written out, it would be moved from the band-level LRUL 
to the global LRUL.  The default size limit on each band's LRUL would be 
just big enough for one row's worth of blocks.  For datasets whose 
blocks each span a full row of the raster, this band-level LRUL would 
therefore be limited to just one block by default.
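
To make the proposal concrete, here is a rough sketch of the dirty-block 
movement described above.  It is not GDAL code: Block, GlobalLRU, 
BandDirtyList, MarkDirty and FlushOldest are all hypothetical names made 
up for illustration, the disk write is only simulated by a counter, and 
locking is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for a cached raster block (not GDALRasterBlock).
struct Block {
    int id;
    std::size_t bytes;
    bool dirty;
};

// Hypothetical global list of clean blocks, most recently used at the front.
struct GlobalLRU {
    std::list<Block> blocks;
    void Touch(const Block& b) { blocks.push_front(b); }
};

// Hypothetical per-band list that holds only dirty blocks.
class BandDirtyList {
public:
    BandDirtyList(std::size_t maxDirty, GlobalLRU& global)
        : maxDirty_(maxDirty), global_(global) {}

    // Record a freshly written block, then flush the oldest dirty blocks
    // until the band is back within its limit.
    void MarkDirty(Block b) {
        b.dirty = true;
        dirty_.push_front(b);
        while (dirty_.size() > maxDirty_) FlushOldest();
    }

    std::size_t DirtyCount() const { return dirty_.size(); }

    int flushed = 0;  // number of simulated disk writes

private:
    void FlushOldest() {
        assert(!dirty_.empty());
        Block b = dirty_.back();
        dirty_.pop_back();
        ++flushed;         // a real implementation would write the block here
        b.dirty = false;
        global_.Touch(b);  // the now-clean block joins the global list
    }

    std::size_t maxDirty_;
    GlobalLRU& global_;
    std::list<Block> dirty_;
};
```

With a limit of one block, marking a second block dirty flushes the first 
one and hands it to the global list, so only the band's own list is 
touched while a block is dirty.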

The global cache use variable "nCacheUsed" would keep track of the sum 
of the sizes of blocks in both the global and band LRULs.  So this 
mechanism would still have the same memory-use characteristics; dirty 
blocks would just have a smaller limit.  Applications that need to work 
on gigantic datasets where a single row doesn't fit into memory can just 
set the dirty block cache limit to zero blocks, effectively turning off 
write caching.
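
The accounting could look roughly like the sketch below.  Only the name 
nCacheUsed comes from the existing code; CacheAccounting, AddDirty, 
FlushOne, EvictClean and WriteBlock are hypothetical names, and the model 
counts blocks rather than tracking real lists.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accounting model: one shared byte counter for both kinds
// of list (global clean list and band-level dirty lists).
struct CacheAccounting {
    std::size_t nCacheUsed = 0;   // bytes held in global + band lists
    std::size_t dirtyBlocks = 0;  // blocks on a band's dirty list
    std::size_t cleanBlocks = 0;  // blocks on the global list

    void AddDirty(std::size_t bytes) {
        nCacheUsed += bytes;
        ++dirtyBlocks;
    }

    // Flushing moves a block between lists; memory in use is unchanged.
    void FlushOne() {
        assert(dirtyBlocks > 0);
        --dirtyBlocks;
        ++cleanBlocks;
    }

    // Evicting from the global list is the only step that frees memory.
    void EvictClean(std::size_t bytes) {
        assert(cleanBlocks > 0 && nCacheUsed >= bytes);
        --cleanBlocks;
        nCacheUsed -= bytes;
    }
};

// Write one block under a given dirty-block limit (hypothetical helper).
inline void WriteBlock(CacheAccounting& c, std::size_t bytes,
                       std::size_t dirtyLimit) {
    c.AddDirty(bytes);
    while (c.dirtyBlocks > dirtyLimit) c.FlushOne();
}
```

With dirtyLimit set to zero, every WriteBlock call flushes immediately, 
so no block is ever left dirty, while nCacheUsed still reflects the block 
until it is evicted from the global list.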

If you like this idea, I can go ahead and implement it.