[Gdal-dev] Re: GDAL raster block caching issues
Steve Soule
stsyo3lwdia4 at vexcel.com
Wed Aug 31 15:55:40 EDT 2005
Frank Warmerdam wrote:
> On 8/26/05, Steve Soule <Steve.Soule at vexcel.com> wrote:
>
>>Issue 1: Global vs. dataset LRUL
>>
>>Currently, the LRUL is global, that is, it contains blocks from all
>>open datasets. I think it would be better if each dataset had its
>>own LRUL (or possibly each raster band). This would have the following
>>advantages:
(What followed was a long discussion of global vs. dataset LRUL, in
which I gave five arguments why dataset was better, and Frank refuted them.)
Frank, you've convinced me that dataset-level LRUL would not give any
significant performance benefit, and my arguments two through five
aren't good arguments. But argument one (thread-safety) still stands.
> Back to your overall point, I am interested in offering per-dataset
> caching rather than global caching as an option (presumably
> non-default). Would you be interested in trying to implement this?
>
> My hope is that this could be mostly done in gdalrasterblock.cpp, likely
> with a block list stored on the GDALDataset. Actually this may be
> a bit messy, since there are such things as free standing
> GDALRasterBand objects not associated with any GDALDataset. I'm
> not sure how you would want to address that issue.
>
> Ideally the policy could be set at runtime (checked via CPLGetConfigOption()).
> If you are interested in doing that, then go ahead, but please let me know
> when you commit the changes.
Implementing a dual LRUL mechanism where you could select global or
dataset LRUL at run-time would not address the thread-safety issue.
Since that's the one argument for dataset-level LRUL that I still
believe in, I'm not interested in implementing the
switch-between-global-and-dataset-LRUL approach.
However, I have a new idea along these lines that could fix the biggest
thread-unsafety problem with the current caching mechanism: the dirty
blocks. My idea is to make a raster-band-level LRUL just for dirty
blocks; dirty blocks would not be listed in the global LRUL. When a
dirty block was written out, it would be moved from the band-level LRUL
to the global LRUL. The default size limit on each band's LRUL would be
just big enough for one row's worth of blocks. For datasets that are
organized into rows, this band-level LRUL would be limited to just one
block by default.
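To make the default limit concrete, here is a rough sketch of how "one row's worth of blocks" might be computed. The function name DefaultDirtyBlockLimit and its parameters are hypothetical, not existing GDAL code; the sketch assumes that in a dataset organized into rows, each block spans the full raster width.

```cpp
#include <cassert>

// Hypothetical helper, not part of GDAL: the number of blocks needed to
// cover one full row of the raster, i.e. ceil(rasterXSize / blockXSize).
int DefaultDirtyBlockLimit(int rasterXSize, int blockXSize)
{
    assert(rasterXSize > 0 && blockXSize > 0);
    // Integer ceiling division; a partial block at the right edge still counts.
    return (rasterXSize + blockXSize - 1) / blockXSize;
}
```

Under that assumption, a tiled raster 1000 pixels wide with 256-pixel-wide blocks would get a limit of 4 blocks, while a raster whose blocks are as wide as the raster itself would get a limit of 1.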
The global cache use variable "nCacheUsed" would keep track of the sum
of the sizes of blocks in both the global and band LRULs. So this
mechanism would still have the same memory-use characteristics; dirty
blocks would just have a smaller limit. Applications that need to work
on gigantic datasets where a single row doesn't fit into memory can just
set the dirty block cache limit to zero blocks, effectively turning off
write caching.
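The mechanism described above could be sketched roughly as follows. This is only an illustration of the bookkeeping, not a proposed patch: apart from nCacheUsed, every name here (Block, Band, BlockCache, dirtyLimit, and so on) is made up for the example, the write to disk is reduced to a counter, and no locking is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical types for illustration only; not GDAL's actual classes.
struct Block {
    int id;
    std::size_t size;
    bool dirty;
};

struct Band {
    std::list<Block> dirtyLRU;   // dirty blocks only, most recently used first
    std::size_t dirtyLimit = 1;  // max dirty blocks; 0 turns off write caching
    int flushCount = 0;          // stands in for actual writes to disk
};

struct BlockCache {
    std::list<Block> globalLRU;  // clean blocks from all bands
    std::size_t nCacheUsed = 0;  // bytes held in the global AND band lists
    std::size_t nCacheMax = 0;   // overall memory limit in bytes

    // Discard least recently used clean blocks until under the memory limit.
    void Trim() {
        while (nCacheUsed > nCacheMax && !globalLRU.empty()) {
            nCacheUsed -= globalLRU.back().size;
            globalLRU.pop_back();
        }
    }

    // Write out the band's oldest dirty block and move it to the global list.
    void FlushOldest(Band& band) {
        assert(!band.dirtyLRU.empty());
        Block b = band.dirtyLRU.back();
        band.dirtyLRU.pop_back();
        ++band.flushCount;        // the real write would happen here
        b.dirty = false;
        globalLRU.push_front(b);  // still cached, so nCacheUsed is unchanged
    }

    // Cache a block that was read but not modified.
    void AddClean(Block b) {
        b.dirty = false;
        nCacheUsed += b.size;
        globalLRU.push_front(b);
        Trim();
    }

    // Cache a modified block on its band's own list.
    void AddDirty(Band& band, Block b) {
        b.dirty = true;
        nCacheUsed += b.size;
        band.dirtyLRU.push_front(b);
        while (band.dirtyLRU.size() > band.dirtyLimit)
            FlushOldest(band);
        Trim();
    }
};
```

With dirtyLimit set to 0, AddDirty writes each block out immediately and the block ends up on the global list as a clean block, which is the "write caching off" case. Only the band's own list is touched when a block is dirtied; the global list is touched when a block is flushed or trimmed.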
If you like this idea, I can go ahead and implement it.