[gdal-dev] GDAL WKT Raster cache: Problem, options to solve and doubts

Fri Aug 14 10:28:48 EDT 2009

Jorge Arévalo wrote:
> Hello,
> 
> I've asked several questions about the RasterIO-related methods in
> Dataset and RasterBand. Thanks to your responses, I have a better
> understanding of the GDAL drivers' I/O methods. But I have a couple of
> doubts I need to resolve to finish the GSoC, although I'd like to
> continue developing the driver after it.
> 
> Problem: In the basic GDAL WKT Raster driver, each row of a raster table
> (one block, in regularly blocked rasters) means one server round. This
> is slow and sub-optimal.

Jorge,

What do you mean by one server round? Is it one SQL query per tile?

> How to solve it?: IReadBlock executes a spatial query to get all the
> rows of a table that fit in a natural block, defined by nXOff, nYOff
> and the values fetched by GetBlockSize. In many raster formats, one
> natural block is a scanline, of size (nXRasterSize, 1).

Yes, I found the scanline as natural block (the default for many
datasets) inconvenient for querying regularly blocked tables.
Instead, I'd recommend using the tile (block) size as the size of
the natural block, so you always query and process
a whole tile or a whole number of tiles.
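
To illustrate what "a whole number of tiles" means for a requested
window, here is a minimal sketch in plain Python. It assumes a regularly
blocked raster whose blocks start at pixel (0, 0); the function name and
its parameters are made up for illustration and are not part of GDAL or
the WKT Raster driver.

```python
def blocks_covering_window(xoff, yoff, xsize, ysize, block_w, block_h):
    """Return (col, row) indices of every regular block touching a pixel window.

    Assumes blocks are aligned to pixel (0, 0) and all share the same size.
    """
    if xsize <= 0 or ysize <= 0:
        return []
    first_col = xoff // block_w
    first_row = yoff // block_h
    # The last pixel inside the window is at offset + size - 1.
    last_col = (xoff + xsize - 1) // block_w
    last_row = (yoff + ysize - 1) // block_h
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

With 128x128 blocks, a 200x100 window at offset (100, 50) touches three
block columns and two block rows, so six tiles would be fetched.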

> In the WKT Raster
> format, if we have a regularly blocked raster, the "natural" block size
> will be equal to the RASTER_COLUMNS-defined block size, and the query will
> return one block.

Ah, perfect. This is what I mentioned above.
By the way, my thinking about WKT Raster in general may be a bit biased
because I'm personally focused on regular blocking only.

> To avoid one spatial query for each block request,
> we should "force" the driver to get all the blocks covering the area
> requested in an IRasterIO call.

Yes, this is the crux.
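
A single query for the whole window needs the window's footprint in
georeferenced coordinates. The sketch below converts a pixel window into
a WKT polygon using the standard GDAL geotransform convention (six
coefficients, as returned by GetGeoTransform). The function name is
hypothetical, and the SQL that would use this polygon as a spatial filter
is deliberately left out, since it depends on the WKT Raster schema.

```python
def window_to_wkt(gt, xoff, yoff, xsize, ysize):
    """Return the footprint of a pixel window as a WKT POLYGON string.

    gt is a 6-element GDAL-style geotransform:
      x_geo = gt[0] + pixel * gt[1] + line * gt[2]
      y_geo = gt[3] + pixel * gt[4] + line * gt[5]
    """
    def pixel_to_geo(px, py):
        x = gt[0] + px * gt[1] + py * gt[2]
        y = gt[3] + px * gt[4] + py * gt[5]
        return x, y

    corners = [
        pixel_to_geo(xoff, yoff),                  # upper left
        pixel_to_geo(xoff + xsize, yoff),          # upper right
        pixel_to_geo(xoff + xsize, yoff + ysize),  # lower right
        pixel_to_geo(xoff, yoff + ysize),          # lower left
    ]
    ring = corners + corners[:1]  # close the ring
    coords = ", ".join(f"{x!r} {y!r}" for x, y in ring)
    return f"POLYGON(({coords}))"
```

The resulting polygon could then be used once per IRasterIO call to
select every tile intersecting the window, instead of once per block.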

> How to implement it? My approach is based on implementing the
> WKTRasterDataset::IRasterIO method (overriding the GDALDataset::IRasterIO
> method). This method executes a spatial query that returns all the
> raster rows covering the area requested. Now, I have all the data of
> an image region. If the requested region dimensions match the buffer
> dimensions, I can copy all the pixels fetched in this way:
> 
> pImage = {b1b1b1b1b1b1b1b1b1b1...|b2b2b2b2b2b2b2b2b2...|...|bnbnbnbnbnbnbnbnbn}
> 
> Where bi are the bytes of band i. A WKT Raster image has a
> non-interleaved format: all the bands are consecutive. Is it correct? I
> mean, to copy the data into pImage with this format.
> 
> And if the region dimensions don't match the buffer dimensions,
> should I raise an error and finish, or delegate to the base
> GDALDataset::IRasterIO implementation?

What does "region dimensions don't match the buffer" mean?

I'd imagine something like this:

1. Query the tiles that match the requested region (window) of the raster
coverage (table).
2. Fetch them.
3. Merge the tiles into an in-memory or on-disk file.

The merge step could be based on RasterIO calls, following the approach
used in gdal_merge.py:

http://trac.osgeo.org/gdal/browser/trunk/gdal/swig/python/scripts/gdal_merge.py

Also, step 3 should be constrained by a maximum amount of data pushed
into memory. It should fail if, let's say, a user queries 12000 tiles :-)
and there is no overview level available that could be returned
instead of such a huge amount of data.
Perhaps the limit could be configurable.
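
Steps 1 to 3 with a memory cap could be sketched as below. This is only
an illustration in plain Python, not driver code: it assumes one byte per
pixel, tiles already fetched as band-sequential byte strings (all of
band 1, then all of band 2, and so on, as in the pImage layout quoted
above), and a hypothetical max_bytes parameter standing in for the
configurable limit.

```python
def merge_tiles(tiles, window, nbands, max_bytes=64 * 1024 * 1024):
    """Copy fetched tiles into one band-sequential buffer for a pixel window.

    tiles  : iterable of (xoff, yoff, width, height, data), where data holds
             nbands * width * height bytes, band after band.
    window : (xoff, yoff, width, height) of the requested region.
    Raises ValueError if the output would exceed max_bytes.
    """
    wx, wy, ww, wh = window
    needed = ww * wh * nbands
    if needed > max_bytes:
        raise ValueError(f"window needs {needed} bytes, limit is {max_bytes}")

    out = bytearray(needed)  # pixels not covered by any tile stay 0
    for tx, ty, tw, th, data in tiles:
        # Intersection of the tile with the requested window.
        x0, x1 = max(wx, tx), min(wx + ww, tx + tw)
        y0, y1 = max(wy, ty), min(wy + wh, ty + th)
        if x0 >= x1 or y0 >= y1:
            continue  # tile does not touch the window
        run = x1 - x0
        for band in range(nbands):
            for y in range(y0, y1):
                src = band * tw * th + (y - ty) * tw + (x0 - tx)
                dst = band * ww * wh + (y - wy) * ww + (x0 - wx)
                out[dst:dst + run] = data[src:src + run]
    return bytes(out)
```

Checking the size before allocating anything means an oversized request
fails early, before any tile data is copied.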

> Another part of my implementation is overriding the
> GDALRasterBand::IRasterIO method. My method will simply call the
> WKTRasterDataset::IRasterIO method with only one band to read. Is it
> correct?

Looks OK.
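
In other words, the band-level read is a thin wrapper around the
dataset-level one. A toy sketch of that delegation, with made-up class
and method names that only mimic the shape of the GDAL classes:

```python
class SketchDataset:
    """Stand-in for a dataset whose raster_io reads any list of bands."""

    def __init__(self, nbands):
        self.nbands = nbands
        self.calls = []  # record of (window, bands) requests, for inspection

    def raster_io(self, window, band_list):
        # A real driver would run one spatial query here for the window.
        self.calls.append((window, tuple(band_list)))
        return {band: f"band {band} pixels for {window}" for band in band_list}


class SketchBand:
    """Stand-in for a band that delegates reads to its dataset."""

    def __init__(self, dataset, band_number):
        self.dataset = dataset
        self.band_number = band_number

    def raster_io(self, window):
        # Same code path as a dataset read, restricted to a single band.
        result = self.dataset.raster_io(window, [self.band_number])
        return result[self.band_number]
```

This keeps all the query logic in one place, so the band and dataset
paths cannot drift apart.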

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org