[gdal-dev] Re: Problems with large raster sizes (WMS/TMS)

Tamas Szekeres szekerest at gmail.com
Fri Nov 13 10:32:54 EST 2009


Hi Even,

Yes, I thought it was a rather complicated issue. I would also support
replacing the array of pointers with a hash table to eliminate the
unnecessary memory requirement. I've been looking at the code and
found a hashset implementation in CPL which could probably be utilized
here; however, it doesn't support auto-growing, and it would also have
to store all the items with the same hash value (along with the key) in a
linked list instead of replacing each other. I don't think this would
significantly slow down the lookup, since only the list of items with the
same hash would have to be iterated.

The nRasterXSize limitation is the more significant problem IMO;
however, I think it would be enough to change the type from integer to
float or double and add some type casts in the related expressions.
That would indeed be a significant change in GDAL, but I think we
should start thinking about it in order to support (virtually) large
raster dimensions.


Best regards,

Tamas



2009/11/13 Even Rouault <even.rouault at mines-paris.org>:
> Hi Tamas,
>
> 2 issues directly related to the big raster dimensions :
>
> * Currently GDAL allocates an array of pointer to raster blocks of size
> nBlocksPerRow x nBlocksPerColumn where nBlocksPerRow =
> (nRasterXSize+nBlockXSize-1) / nBlockXSize and nBlocksPerColumn =
> (nRasterYSize+nBlockYSize-1) / nBlockYSize.
> When TileLevel = 20, nBlocksPerRow = nBlocksPerColumn = 1 048 576. This is too
> big. Currently, we then try to use a second level of block array
> ("sub-blocking") as soon as nBlocksPerRow > SUBBLOCK_SIZE/2 (SUBBLOCK_SIZE=64).
> This leads to allocating nSubBlocksPerRow x nSubBlocksPerColumn where
> nSubBlocksPerRow = (nBlocksPerRow + SUBBLOCK_SIZE - 1)/SUBBLOCK_SIZE and
> nSubBlocksPerColumn = (nBlocksPerColumn + SUBBLOCK_SIZE - 1)/SUBBLOCK_SIZE.
> In that case, nSubBlocksPerRow = nSubBlocksPerColumn = 16384, and
> nSubBlocksPerRow x nSubBlocksPerColumn = 268 435 456 pointers, so 1 GB on a
> 32bit machine or 2 GB on a 64bit machine.
>
> --> So we would need a three-level block cache. This would be feasible
> (an implementation "detail" of gdalrasterband.cpp).
>
> Or I'm thinking that we could completely remove the array and use instead a hash
> map that would map a block number to its raster block. Could cause a (small?)
> slowdown of course, but would be simpler to implement and maintain than a
> three-level array.
>
> * At TileLevel = 23, nRasterXSize = 2 147 483 648, which is ahem just one
> more than the largest signed int, so it wraps to -2147483648
> --> The proper solution would be to promote nRasterXSize/nRasterYSize to be of
> type GIntBig (64bit). We would also need to update the IReadBlock(),
> IWriteBlock(), IRasterIO() interfaces (and probably many others) to use GIntBig.
> So this is both a large API and ABI breakage, that IMHO looks more like a GDAL
> 2.0 project... An alternative would be to keep the existing 'int'
> variables/interfaces and introduce new GIntBig variables/interfaces, and migrate
> GDAL internals (gcore/* alg/*) and the relevant drivers such as WMS to use the
> GIntBig version. Could easily turn out to be very messy in the end...
>
> I cannot think of an easy workaround. An idea would be to split the WMS extent
> into 4 pieces to go under the 2 147 483 647 pixel limit, but this is not really
> doable without a small hack in the TMS minidriver as you would need to add an
> offset to the x and y tile number requested to the server.
>
> Best regards,
>
> Even
>

