[gdal-dev] GDAL WKT Raster cache: Problem, options to solve and doubts

Jorge Arévalo jorge.arevalo at gmail.com
Fri Aug 14 11:30:10 EDT 2009


Hello,

2009/8/14 Mateusz Loskot <mateusz at loskot.net>:
> Jorge Arévalo wrote:
>> Hello,
>>
>> I've asked several questions about the RasterIO-related methods in
>> Dataset and RasterBand. Thanks to your responses, I have a better
>> understanding of the GDAL drivers' I/O methods. But I have a couple of
>> doubts I need to resolve to finish the GSoC, although I'd like to
>> continue developing the driver after it.
>>
>> Problem: In basic GDAL WKT Raster driver, each row of a raster table
>> (one block, in regularly blocked rasters) means one server round. This
>> is slow, and "sub-optimal".
>
> Jorge,
>
> What do you mean by one server round? Is it 1 SQL query per tile?

Yes, 1 SQL SELECT per IReadBlock call (1 tile in regularly blocked rasters).


>
>> How to solve it?: IReadBlock executes a spatial query to get all the
>> rows of a table that fit in a natural block, defined by nXOff, nYOff
>> and the values fetched by GetBlockSize. In many raster formats, one
>> natural block is a scanline, of size (nXRasterSize, 1).
>
> Yes, I found the natural block (= scanline for many datasets)
> inconvenient for querying regularly blocked tables.
> Instead, I'd recommend using the tile (block) size as the size of
> the natural block, so you always query and process
> a whole tile or a number of tiles.
>

Yes, I use the block size read from RASTER_COLUMNS as the block size
when creating the RasterBands.
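
To make the "one query per block" cost concrete, here is a minimal
sketch in Python (not driver code) that computes which regular blocks a
requested pixel window touches. The function names are hypothetical, and
it assumes blocks are aligned to the raster origin with a fixed size, as
in a regularly blocked raster.

```python
def blocks_covering(x_off, y_off, x_size, y_size, block_w, block_h):
    """Return (first_col, first_row, last_col, last_row) of the blocks
    touched by a pixel window, assuming origin-aligned regular blocks."""
    if x_size <= 0 or y_size <= 0:
        raise ValueError("window must have a positive size")
    first_col = x_off // block_w
    first_row = y_off // block_h
    # The last pixel of the window is at offset + size - 1.
    last_col = (x_off + x_size - 1) // block_w
    last_row = (y_off + y_size - 1) // block_h
    return first_col, first_row, last_col, last_row


def count_blocks(block_range):
    """Number of blocks (and so per-block queries) in a block range."""
    first_col, first_row, last_col, last_row = block_range
    return (last_col - first_col + 1) * (last_row - first_row + 1)
```

With per-block reads, count_blocks() is the number of SELECTs a window
costs; a single spatial query over the same window would replace them all.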


>> In WKT Raster
>> format, if we have a regularly blocked raster,  "natural" block size
>> will be equal to RASTER_COLUMNS-defined block size, and the query will
>> return one block.
>
> Ah! Perfect. This is what I've mentioned above.
> By the way, my thinking of WKT Raster in general may be a bit biased
> because I'm personally focused on regular blocking only.

Just like me... I've identified the points in the code where the
"non-regular blocking rasters" situation arises and, normally, I
raise an "Under development" error there. This will be the biggest TODO
in this driver, I think.


>
>> To avoid one spatial query for each block request,
>> we should "force" the driver to get all the blocks covering the area
>> requested in an IRasterIO call.
>
> Yes, this is the crux.
>
>> How to implement it? My approach is based on implementing
>> WKTRasterDataset::IRasterIO method (overriding GDALDataset::IRasterIO
>> method). This method executes a spatial query that returns all the
>> raster rows covering the area requested. Now, I have all the data of
>> an image region. If the requested region dimensions match the buffer
>> dimensions, I can copy all the pixels fetched in this way:
>>
>> pImage = {b1b1b1b1b1b1b1b1b1b1...|b2b2b2b2b2b2b2b2b2...|...|bnbnbnbnbnbnbnbnbn}
>>
>> Where bi are the bytes of band i. A WKT Raster image has a
>> non-interleaved format: all the bands are consecutive. Is it correct
>> to copy the data into pImage in this format?
>>
>> And if the region dimensions don't match the buffer dimensions,
>> should I raise an error and finish, or delegate to the base
>> GDALDataset::IRasterIO implementation?
>
> What does "region dimensions don't match the buffer" mean?
>
> I'd imagine something like this:
>
> 1. Query tiles that match the requested region (window) of raster
> coverage (table).
> 2. Fetch
> 3. Merge tiles into an in-memory or on-disk file.


Yes, that's the point. By "region dimensions don't match the buffer"
I mean that nXSize, nYSize are different from nBufXSize, nBufYSize.
And maybe eBufType is different from the band data type (for each band;
I'm talking about the WKTRasterDataset::RasterIO method).
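
As an illustration of that mismatch, here is a minimal sketch in Python
of one possible way to fill a buffer whose size differs from the region
read: a plain nearest-neighbour mapping from buffer pixels back to
source pixels. The function name is hypothetical, the integer-division
mapping is my own simplification, and data type conversion (eBufType) is
left out.

```python
def resample_nearest(src, x_size, y_size, buf_x_size, buf_y_size):
    """Resample a row-major x_size * y_size pixel list into a
    buf_x_size * buf_y_size list using nearest-neighbour sampling."""
    if len(src) != x_size * y_size:
        raise ValueError("source length does not match its dimensions")
    out = []
    for buf_y in range(buf_y_size):
        # Map the buffer row back to a source row (clamped to the last row).
        src_y = min(buf_y * y_size // buf_y_size, y_size - 1)
        for buf_x in range(buf_x_size):
            src_x = min(buf_x * x_size // buf_x_size, x_size - 1)
            out.append(src[src_y * x_size + src_x])
    return out
```

When the two sizes are equal the mapping is the identity, so the fast
path (a straight copy) and this path agree.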

And what should be the criterion for choosing between in-memory and
on-disk storage? If the memory buffer is big enough to store the
fetched data, in-memory storage should be enough.
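
One simple criterion would be a byte budget. The sketch below, in
Python, estimates the size of the fetched data and compares it with a
limit. The function name and the max_bytes limit are hypothetical (no
such setting exists in the driver today); the per-pixel sizes are those
of the GDAL data types with the same names.

```python
# Bytes per pixel for some GDAL data types (GDT_Byte, GDT_UInt16, ...).
BYTES_PER_PIXEL = {
    "Byte": 1, "UInt16": 2, "Int16": 2,
    "UInt32": 4, "Int32": 4, "Float32": 4, "Float64": 8,
}


def fits_in_memory(x_size, y_size, band_types, max_bytes):
    """Return (fits, needed_bytes) for a window read over the given bands.

    band_types lists the data type name of each requested band;
    max_bytes is an assumed, user-configurable budget.
    """
    needed = sum(x_size * y_size * BYTES_PER_PIXEL[t] for t in band_types)
    return needed <= max_bytes, needed
```

If the check fails, the driver could fall back to an on-disk file or
refuse the request.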

>
> The merge step could be based on RasterIO calls, following this approach:
>
> http://trac.osgeo.org/gdal/browser/trunk/gdal/swig/python/scripts/gdal_merge.py
>
> Also, the step 3 should be constrained by max limit of data pushed into
> memory, and fail if let's say a user queries 12000 tiles :-)
> and there is no overview level available that could be returned
> instead of such a huge amount of data.
> Perhaps it could be configurable.

Thanks! I've used gdal_merge, but I didn't study the code. I'm going
to study it just now :-)
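
Before reading the script, this is how I picture the merge step: a
minimal sketch in Python, not taken from gdal_merge.py. It pastes the
fetched tiles into a single window buffer, clipping each tile to the
requested window. The function name, the tile tuple layout and the
nodata fill value are all assumptions made for the illustration.

```python
def paste_tiles(tiles, x_off, y_off, x_size, y_size, nodata=0):
    """Merge tiles into one row-major window of x_size * y_size pixels.

    Each tile is (tile_x, tile_y, tile_w, tile_h, pixels), where the
    position is in raster pixel coordinates and pixels is a row-major
    list. Pixels not covered by any tile keep the nodata value.
    """
    window = [nodata] * (x_size * y_size)
    for tile_x, tile_y, tile_w, tile_h, pixels in tiles:
        # Intersection of the tile with the requested window.
        x0 = max(tile_x, x_off)
        y0 = max(tile_y, y_off)
        x1 = min(tile_x + tile_w, x_off + x_size)
        y1 = min(tile_y + tile_h, y_off + y_size)
        if x0 >= x1 or y0 >= y1:
            continue  # the tile does not touch the window
        run = x1 - x0
        for y in range(y0, y1):
            src = (y - tile_y) * tile_w + (x0 - tile_x)
            dst = (y - y_off) * x_size + (x0 - x_off)
            window[dst:dst + run] = pixels[src:src + run]
    return window
```

For example, two 2x2 tiles side by side and a 2x2 window straddling
their shared edge give one column from each tile.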

>
>> The other part of my implementation is overriding the
>> GDALRasterBand::IRasterIO method. My method will simply call the
>> WKTRasterDataset::IRasterIO method with only one band to read. Is it
>> correct?
>
> Looks OK.

Thanks for confirmation.
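
For the record, the delegation I have in mind looks like the sketch
below, written in Python rather than C++ to keep it short. SketchDataset
and SketchBand are hypothetical stand-ins for the driver classes, and
the data lives in plain lists instead of a database. The dataset read
returns the bands one after another (band-sequential); in GDAL itself
the caller's pixel, line and band spacing arguments decide the real
buffer layout, so that ordering is an assumption of the sketch.

```python
class SketchDataset:
    """Stand-in for the dataset: each band is a row-major pixel list."""

    def __init__(self, width, height, band_pixels):
        self.width = width
        self.height = height
        self.band_pixels = band_pixels  # band_pixels[i] is band i + 1
        self.queries = 0  # counts simulated spatial queries

    def raster_io(self, x_off, y_off, x_size, y_size, band_list):
        """Read a window for the given 1-based bands, band after band."""
        self.queries += 1  # one query per request, whatever the band count
        out = []
        for band in band_list:
            pixels = self.band_pixels[band - 1]
            for y in range(y_off, y_off + y_size):
                start = y * self.width + x_off
                out.extend(pixels[start:start + x_size])
        return out


class SketchBand:
    """Stand-in for a raster band that delegates to its dataset."""

    def __init__(self, dataset, band_number):
        self.dataset = dataset
        self.band_number = band_number

    def raster_io(self, x_off, y_off, x_size, y_size):
        # Same read as the dataset's, restricted to this single band.
        return self.dataset.raster_io(
            x_off, y_off, x_size, y_size, [self.band_number])
```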


Best regards,
Jorge


>
> Best regards,
> --
> Mateusz Loskot, http://mateusz.loskot.net
> Charter Member of OSGeo, http://osgeo.org
>

