[gdal-dev] GDAL WKT Raster cache: Problem, options to solve and doubts

Fri Aug 14 11:06:01 EDT 2009

Hello,

2009/8/14 Frank Warmerdam <warmerdam at pobox.com>:
> 2009/8/14 Jorge Arévalo <jorge.arevalo at gmail.com>:
>> Problem: In the basic GDAL WKT Raster driver, each row of a raster
>> table (one block, in regularly blocked rasters) means one server
>> round trip. This is slow and sub-optimal.
>
> Jorge,
>
> I would note that one round trip per block is not necessarily so terrible
> if the block size is reasonably large (128x128 or larger for instance).
>

Yes, but don't you think it would be a good idea to avoid extra server
round trips in general?
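
To put a number on this, here is a rough sketch (plain C++, not driver
code; BlockRange and BlocksCoveringWindow are made-up names) of how many
blocks a window request touches on a regularly blocked raster, which is
also the number of queries if every block is fetched on its own:

```cpp
#include <cassert>

// Hypothetical helper, not part of GDAL: inclusive range of block
// indices touched by a pixel window on a regularly blocked raster.
struct BlockRange {
    int nFirstX, nFirstY;  // first block column / row
    int nLastX, nLastY;    // last block column / row (inclusive)

    // Number of blocks in the range, i.e. one query each if the
    // driver fetches block by block.
    int Count() const {
        return (nLastX - nFirstX + 1) * (nLastY - nFirstY + 1);
    }
};

BlockRange BlocksCoveringWindow(int nXOff, int nYOff,
                                int nXSize, int nYSize,
                                int nBlockXSize, int nBlockYSize)
{
    assert(nXOff >= 0 && nYOff >= 0 && nXSize > 0 && nYSize > 0);
    assert(nBlockXSize > 0 && nBlockYSize > 0);

    BlockRange r;
    r.nFirstX = nXOff / nBlockXSize;
    r.nFirstY = nYOff / nBlockYSize;
    // The last pixel of the window decides the last block touched.
    r.nLastX = (nXOff + nXSize - 1) / nBlockXSize;
    r.nLastY = (nYOff + nYSize - 1) / nBlockYSize;
    return r;
}
```

For a 1024x1024 window on 128x128 blocks, Count() is 64: 64 round trips
block by block, against a single spatial query for the whole window.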

>> How to solve it?: IReadBlock executes a spatial query to get all the
>> rows of a table that fit in a natural block, defined by nXOff, nYOff
>> and the values fetched by GetBlockSize. In many raster formats, one
>> natural block is a scanline, of size (nXRasterSize, 1). In WKT Raster
>> format, if we have a regularly blocked raster,  "natural" block size
>> will be equal to RASTER_COLUMNS-defined block size, and the query will
>> return one block. To avoid one spatial query for each block request,
>> we should "force" the driver to get all the blocks covering the area
>> requested in a IRasterIO call.
>>
>> How to implement it? My approach is based on implementing
>> WKTRasterDataset::IRasterIO method (overriding GDALDataset::IRasterIO
>> method). This method executes a spatial query that returns all the
>> raster rows covering the area requested. Now, I have all the data of
>> an image region. If the requested region dimensions match the buffer
>> dimensions, I can copy all the pixels fetched in this way:
>>
>> pImage = {b1b1b1b1b1b1b1b1b1b1...|b2b2b2b2b2b2b2b2b2...|...|bnbnbnbnbnbnbnbnbn}
>>
>> where bi are the bytes of band i. A WKT Raster image has a
>> non-interleaved format: all the bands are consecutive. Is it correct
>> to copy the data into pImage in this format?
>
> Some of the arguments to IRasterIO() are the values indicating
> how the imagery should be interleaved into the target buffer.
> You might want to check them and only implement the direct
> copy if the interleaving matches what is convenient for you.

Ok, nPixelSpace and nLineSpace, right. Stupid me. -1
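
A minimal sketch of that check, assuming the fetched data is band
sequential as described above. IsBandSequentialLayout is a made-up name
and the function is plain C++, not GDAL code; nBandSpace is the
band-stride argument that GDALDataset::IRasterIO receives next to
nPixelSpace and nLineSpace:

```cpp
#include <cassert>

// Hypothetical helper, not part of GDAL: true when the caller's buffer
// has the same layout as the fetched data (all of band 1, then all of
// band 2, ...), so the bytes can be copied without rearranging them.
bool IsBandSequentialLayout(int nXSize, int nYSize,
                            int nBufXSize, int nBufYSize,
                            int nWordSize,
                            long long nPixelSpace,
                            long long nLineSpace,
                            long long nBandSpace)
{
    assert(nWordSize > 0);

    // Different buffer dimensions mean the caller wants resampling.
    if (nBufXSize != nXSize || nBufYSize != nYSize)
        return false;
    // Pixels of one band must be packed back to back ...
    if (nPixelSpace != nWordSize)
        return false;
    // ... lines must follow each other without padding ...
    if (nLineSpace != nPixelSpace * nBufXSize)
        return false;
    // ... and each band must start right after the previous one.
    if (nBandSpace != nLineSpace * nBufYSize)
        return false;
    return true;
}
```

In the driver, nWordSize would presumably come from
GDALGetDataTypeSize(eBufType) / 8, and the buffer type would also have
to equal the band data type. Any request that fails the check goes to
the base class implementation.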

>
>> And if the region dimensions don't match the buffer dimensions,
>> should I raise an error and finish, or delegate to the base
>> GDALDataset::IRasterIO implementation?
>
> Definitely do not raise an error.  If for any reason it is not
> convenient to process a request in an optimized fashion then
> just call the underlying IRasterIO() method (on whatever your
> base class is, possibly GDALDataset).

OK, that is how my code works right now. Thanks.

>
>> Another question: After copying the data from fetched rows in pImage
>> buffer, should I do anything more?
>
> You need to ensure it is in local machine byte order if it is not
> GDT_Byte.
>

Yes, I check this. I forgot to mention it.
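
For reference, the check amounts to something like the sketch below.
It is plain C++ with made-up names (HostIsLittleEndian,
SwapWordsIfNeeded), and bDataIsLittleEndian is an assumed flag standing
for however the driver learns the byte order of the fetched data. In
the driver itself, GDALSwapWords() could do the swapping loop:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Detect the byte order of the machine running the code.
bool HostIsLittleEndian()
{
    const std::uint16_t nProbe = 1;
    return *reinterpret_cast<const unsigned char *>(&nProbe) == 1;
}

// Hypothetical helper, not part of GDAL: reverse the bytes of every
// word in place when the data byte order differs from the host's.
// Single-byte types (GDT_Byte) never need swapping.
void SwapWordsIfNeeded(unsigned char *pabyData, std::size_t nWordSize,
                       std::size_t nWordCount, bool bDataIsLittleEndian)
{
    assert(pabyData != nullptr || nWordCount == 0);

    if (nWordSize < 2 || bDataIsLittleEndian == HostIsLittleEndian())
        return;

    for (std::size_t i = 0; i < nWordCount; ++i) {
        unsigned char *pabyWord = pabyData + i * nWordSize;
        std::reverse(pabyWord, pabyWord + nWordSize);
    }
}
```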

>> Another part of my implementation overrides the
>> GDALRasterBand::IRasterIO method. My method will simply call the
>> WKTRasterDataset::IRasterIO method with only one band to read. Is it
>> correct?
>
> This is acceptable I think.  My only concern is that there may be
> situations in which WKTRasterDataset::IRasterIO() will not
> process the request in an optimized fashion and will fall back to
> GDALDataset::IRasterIO() which in turn might call IRasterIO
> on the band again.  This may become obvious in use. :-)
>

Mmmm... I was studying the WCSDataset.cpp code and I thought that this
might happen:

- WCSDataset::IRasterIO calls GDALRasterBand::RasterIO
- GDALRasterBand::RasterIO calls WCSRasterBand::IRasterIO
- WCSRasterBand::IRasterIO calls WCSDataset::IRasterIO
....

But I'm sure I'm misunderstanding something, because the WCS driver is
a solid, well-tested driver. Anyway, my idea is to call IReadBlock only
as a last resort. For this reason, I'd like to take advantage of the
GDAL Data Model with this simple algorithm:

WKTRasterDataset::IRasterIO(...)
{
    // Fetch all raster rows covered by the area requested

    if I have rows:
        // Get the data of these rows and copy it into the pImage
        // buffer (byte swapping if needed)
        // Anything more? return CE_None? add blocks to rasterband
        // cache? delegate to GDALDataset::RasterIO?
    else:
        // Delegate to GDALDataset::RasterIO? Will this end up in
        // WKTRasterRasterBand::IReadBlock or not??
}

WKTRasterRasterBand::IRasterIO(...)
{
    poDS->IRasterIO(current band);
}

WKTRasterRasterBand::IReadBlock(...)
{
    // Fetch all raster rows that the block covers (in regularly
    // blocked rasters, 1 row (= 1 tile))

    if only 1 block:
        // copy data into pImage buffer
        // return
    else:
        // non-regularly blocked rasters. Raise an "under development"
        // error, for now.
}
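
One way to avoid the loop Frank describes, sketched with stand-in
classes. SketchDataset, SketchBand, Request and DirectRasterIO are all
made-up names, and the Base* methods only mimic my understanding of what
the GDAL base classes do; this is not the real API. The idea is to keep
the optimized copy in a separate method that reports "not handled"
instead of delegating, so that each IRasterIO override falls back to its
own base class and the band never calls the dataset's IRasterIO:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the IRasterIO arguments: can the request be served by
// a direct copy (matching dimensions and interleaving) or not?
struct Request {
    bool bDirectCopyPossible;
};

class SketchDataset;

class SketchBand {
public:
    explicit SketchBand(SketchDataset *poDSIn) : poDS(poDSIn) {}
    bool IRasterIO(const Request &req);      // the driver's override
private:
    bool BaseIRasterIO(const Request &req);  // mimics the base band class
    bool IReadBlock();
    SketchDataset *poDS;
};

class SketchDataset {
public:
    explicit SketchDataset(int nBands) {
        assert(nBands > 0);
        for (int i = 0; i < nBands; ++i)
            aoBands.emplace_back(this);
    }
    SketchDataset(const SketchDataset &) = delete;
    SketchDataset &operator=(const SketchDataset &) = delete;

    bool IRasterIO(const Request &req);       // the driver's override
    bool DirectRasterIO(const Request &req);  // true only if handled

    std::vector<std::string> aosTrace;        // records the path taken
private:
    bool BaseIRasterIO(const Request &req);   // mimics the base dataset
    std::vector<SketchBand> aoBands;
};

// Optimized path: one query for the whole window. It never delegates.
bool SketchDataset::DirectRasterIO(const Request &req) {
    if (!req.bDirectCopyPossible)
        return false;
    aosTrace.push_back("direct");
    return true;
}

bool SketchDataset::IRasterIO(const Request &req) {
    if (DirectRasterIO(req))
        return true;
    return BaseIRasterIO(req);  // fall back to the generic path
}

// Generic dataset path: hand the request to each band in turn.
bool SketchDataset::BaseIRasterIO(const Request &req) {
    for (std::size_t i = 0; i < aoBands.size(); ++i)
        if (!aoBands[i].IRasterIO(req))
            return false;
    return true;
}

bool SketchBand::IRasterIO(const Request &req) {
    if (poDS->DirectRasterIO(req))
        return true;
    return BaseIRasterIO(req);  // NOT poDS->IRasterIO(): no cycle
}

// Generic band path: block-based reading, which ends in IReadBlock.
bool SketchBand::BaseIRasterIO(const Request &) {
    return IReadBlock();
}

bool SketchBand::IReadBlock() {
    poDS->aosTrace.push_back("block");
    return true;
}
```

With three bands, a request that allows the direct copy leaves a single
"direct" entry in aosTrace. One that does not allow it leaves three
"block" entries, one per band, and the call chain terminates.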

But I don't know whether this approach works as I want. IReadBlock works
fine, but I have problems with the rest of the system (the IRasterIO
implementations), and I don't know what the best approach is for this
driver :-(

What's the normal way for a program to call the GDAL I/O system? Is it
usual to call RasterIO directly? Or maybe IReadBlock directly? Is there
any "usual" call pattern that a driver should expect?

I suspect that depends on the driver format...

> I would note that the optimized IRasterIO() implementation is
> not really necessary for a successful project though it would
> certainly be a crowning achievement for the summer.
>

OK. I have quite a lot of code related to this concept (cache,
IRasterIO...), but for some reason it doesn't seem to work properly.
Maybe it needs more discussion and/or thinking. I could put it off...

Best regards,
Jorge
