[gdal-dev] Call for discussion on RFC 26: GDAL Block Cache Improvements

Even Rouault even.rouault at spatialys.com
Thu Jun 4 07:05:00 PDT 2015


Hi,

I've updated an old RFC initiated by Tamas. The main idea, having a hash-set-
based implementation as an alternative to the array-based one, remains. The
changes mainly consist of code restructuring, performance improvements to
reduce lock contention, and porting to the latest code base.

This is an RFC for GDAL 2.1.

Details at https://trac.osgeo.org/gdal/wiki/rfc26_blockcache

== Summary and rationale ==

GDAL maintains an in-memory cache for the raster blocks fetched from the
drivers, so that a second access to the same block is served from the cache
instead of the driver. This cache is maintained per band, and an array holding
one pointer per block (or sub-block) is allocated for each band. This approach
does not scale to large raster dimensions (or large virtual rasters, e.g. with
the WMS/TMS driver), where it may cause out-of-memory errors in
GDALRasterBand::InitBlockInfo, as reported in #3224.
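
To make the scaling problem concrete, here is a minimal C++ sketch of a
per-band, array-based block index of the kind described above. All names
(`ArrayBlockIndex`, `CachedBlock`, `TopLevelSlots`) are hypothetical and this
is not GDAL's actual code; the 64x64 sub-block grouping is an assumption meant
to mirror the sub-blocking mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a cached raster block (not GDALRasterBlock).
struct CachedBlock {
    int xOff;
    int yOff;
};

// Sketch of a per-band, array-based block index. Blocks are grouped into
// 64x64 sub-blocks (assumed); the top-level table holds one pointer per
// sub-block and is sized from the raster's block grid at creation time,
// however few blocks are ever cached. No bounds checks in this sketch.
class ArrayBlockIndex {
public:
    static constexpr std::int64_t kSubBlockSize = 64;

    // Number of top-level pointer slots needed for a given block grid.
    static std::int64_t TopLevelSlots(std::int64_t blocksPerRow,
                                      std::int64_t blocksPerColumn) {
        assert(blocksPerRow > 0 && blocksPerColumn > 0);
        return DivRoundUp(blocksPerRow) * DivRoundUp(blocksPerColumn);
    }

    ArrayBlockIndex(std::int64_t blocksPerRow, std::int64_t blocksPerColumn)
        : subBlocksPerRow_(DivRoundUp(blocksPerRow)),
          // Up-front allocation: the step that fails for huge rasters.
          table_(static_cast<std::size_t>(
              TopLevelSlots(blocksPerRow, blocksPerColumn))) {}

    CachedBlock* Get(std::int64_t bx, std::int64_t by) const {
        const auto& sub = table_[SlotOf(bx, by)];
        // Untouched sub-blocks have no second-level array yet.
        return sub ? (*sub)[InnerOf(bx, by)] : nullptr;
    }

    void Put(std::int64_t bx, std::int64_t by, CachedBlock* block) {
        auto& sub = table_[SlotOf(bx, by)];
        if (!sub) {
            // Second level allocated lazily: 64*64 null pointers.
            sub = std::make_unique<std::vector<CachedBlock*>>(
                static_cast<std::size_t>(kSubBlockSize * kSubBlockSize),
                nullptr);
        }
        (*sub)[InnerOf(bx, by)] = block;
    }

private:
    static std::int64_t DivRoundUp(std::int64_t n) {
        return (n + kSubBlockSize - 1) / kSubBlockSize;
    }
    std::size_t SlotOf(std::int64_t bx, std::int64_t by) const {
        return static_cast<std::size_t>(
            (by / kSubBlockSize) * subBlocksPerRow_ + bx / kSubBlockSize);
    }
    static std::size_t InnerOf(std::int64_t bx, std::int64_t by) {
        return static_cast<std::size_t>(
            (by % kSubBlockSize) * kSubBlockSize + bx % kSubBlockSize);
    }

    std::int64_t subBlocksPerRow_;
    std::vector<std::unique_ptr<std::vector<CachedBlock*>>> table_;
};
```

The point to notice is the constructor: the top-level table is sized from the
block grid, so a band of 2097152x2097152 blocks needs about 1 billion slots
before a single block has been read.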

For example, a band of a dataset at zoom level 21 with a GoogleMaps tiling
requires 2097152x2097152 tiles of 256x256 pixels. This means that GDAL will
try to allocate an array of 32768x32768, i.e. about 1 billion, elements
(32768 = 2097152 / 64, since blocks are grouped into 64x64 sub-blocks). With
4-byte pointers on a 32-bit build, this array would take 4 GB, so it cannot be
allocated at all. On a 64-bit build it takes 8 GB (although this is generally
only a reservation of virtual memory rather than an allocation of physical
pages, thanks to the over-commit mechanism of the operating system). When the
dataset is closed, all those 1 billion cells have to be scanned to find the
remaining cached blocks. In reality, all the above figures must be multiplied
by 3 for an RGB dataset (or 4 for an RGBA one), since each band has its own
array.
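
The same arithmetic written out step by step, assuming (as the division by 64
implies) one pointer per sub-block of 64x64 blocks:

```latex
\begin{aligned}
\text{blocks per side} &= 2^{21} = 2097152 \\
\text{sub-blocks per side} &= 2^{21} / 64 = 2^{15} = 32768 \\
\text{array elements} &= \left(2^{15}\right)^2 = 2^{30} = 1\,073\,741\,824 \approx 10^{9} \\
\text{32-bit build (4-byte pointers)} &= 2^{30} \times 4\ \text{B} = 4\ \text{GiB} \\
\text{64-bit build (8-byte pointers)} &= 2^{30} \times 8\ \text{B} = 8\ \text{GiB} \\
\text{RGB / RGBA dataset} &= 3 \times \text{or}\ 4 \times \text{the per-band figure}
\end{aligned}
```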

In the hash-set implementation, memory allocation depends directly on the
number of cached blocks. Typically, with the default GDAL_CACHEMAX of 40 MB,
only 640 blocks of 256x256 single-byte pixels can be cached simultaneously
(across all datasets).
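
For contrast, a minimal sketch of the hash-based idea, using
`std::unordered_map` keyed by packed block coordinates as the hash container.
`HashBlockIndex`, `CachedBlock` and `MaxCachedBlocks` are hypothetical names,
not the RFC's implementation. The 640 figure assumes 1 byte per pixel (e.g.
GDT_Byte) and 40 MB = 40 x 1024 x 1024 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a cached raster block (not GDALRasterBlock).
struct CachedBlock {
    int xOff;
    int yOff;
};

// Sketch of a hash-based per-band block index: entries exist only for blocks
// that are actually cached, so memory grows with cache usage rather than
// with raster dimensions.
class HashBlockIndex {
public:
    CachedBlock* Get(std::uint32_t bx, std::uint32_t by) const {
        auto it = map_.find(Key(bx, by));
        return it == map_.end() ? nullptr : it->second;
    }
    void Put(std::uint32_t bx, std::uint32_t by, CachedBlock* block) {
        map_[Key(bx, by)] = block;
    }
    void Remove(std::uint32_t bx, std::uint32_t by) {
        map_.erase(Key(bx, by));
    }
    std::size_t Size() const { return map_.size(); }

private:
    // Pack the two block coordinates into a single 64-bit key.
    static std::uint64_t Key(std::uint32_t bx, std::uint32_t by) {
        return (static_cast<std::uint64_t>(by) << 32) | bx;
    }
    std::unordered_map<std::uint64_t, CachedBlock*> map_;
};

// Upper bound on simultaneously cached blocks for a given cache budget.
std::int64_t MaxCachedBlocks(std::int64_t cacheBytes, std::int64_t blockWidth,
                             std::int64_t blockHeight,
                             std::int64_t bytesPerPixel) {
    const std::int64_t blockBytes = blockWidth * blockHeight * bytesPerPixel;
    assert(blockBytes > 0);
    return cacheBytes / blockBytes;
}
```

Caching a block far out in a level-21 grid, such as (2097151, 2097151), costs a
single hash-table entry, whereas an up-front array needs a slot for every
(sub-)block whether or not it is ever read.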

Best regards,

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

