[Gdal-dev] A questions about IRasterIO and overlays

Tue Dec 9 13:19:56 EST 2003

James Gallagher wrote:
> I'm improving a GDAL driver for the DODS/OPeNDAP protocol. The driver
> can read rasters from DODS/OPeNDAP servers. One problem with the current
> implementation is that it always reads the entire raster. 
> 
> Before I dive into IRasterIO() I'd like to make sure providing my own
> implementation of that method is really the best way to go. OPeNDAP
> provides ways to read parts of rasters using the indexes combined with
> an optional 'stride' value (which allows for sub-sampling).
> 
> I was also wondering about overlays and the comment that a driver that
> provides its own implementation of IRasterIO() would normally implement
> overlays. What exactly are 'overlays?'
> 
> As you can probably tell, I'm not that familiar with GIS software so my
> apologies if these are really basic questions.

James,

I'm not sure where you read about overlays.  I suspect you are referring to
overviews; is that correct?  An overview is a reduced-resolution copy of the
image, used for faster reads when the full resolution is not needed.

You can use stride values to simulate overviews.  So I think it would
be wise to return synthetic power-of-2 overviews for your dataset.  You might
want to skim the MrSID example for how this can be done.
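
As a rough illustration of the stride idea (a sketch only, not code from the
MrSID or DODS drivers), the sizes of synthetic power-of-2 overviews can be
derived directly from the stride.  The names OverviewLevel, StridedCount,
SyntheticOverviews and Hyperslab are hypothetical, the 256-pixel cutoff is
an arbitrary assumption, and the "[start:stride:stop]" form is the DAP2-style
array subset syntax as I understand it, so check it against your server.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one synthetic overview level.
struct OverviewLevel {
    int stride;  // sub-sampling step applied on both axes (a power of 2)
    int xsize;   // pixels per line at this level
    int ysize;   // number of lines at this level
};

// Number of samples obtained by reading indexes 0, stride, 2*stride, ...
// that are still below size.
int StridedCount(int size, int stride) {
    assert(size > 0 && stride > 0);
    return (size - 1) / stride + 1;
}

// Keep doubling the stride until both dimensions are at most min_size.
// min_size is an assumed cutoff, not a GDAL requirement.
std::vector<OverviewLevel> SyntheticOverviews(int xsize, int ysize,
                                              int min_size = 256) {
    std::vector<OverviewLevel> levels;
    int stride = 1;
    int cur_x = xsize;
    int cur_y = ysize;
    while (cur_x > min_size || cur_y > min_size) {
        stride *= 2;
        cur_x = StridedCount(xsize, stride);
        cur_y = StridedCount(ysize, stride);
        levels.push_back({stride, cur_x, cur_y});
    }
    return levels;
}

// Format one dimension of an assumed DAP2-style subset, with an
// inclusive stop index: [start:stride:stop].
std::string Hyperslab(int start, int stride, int stop) {
    return "[" + std::to_string(start) + ":" + std::to_string(stride) + ":" +
           std::to_string(stop) + "]";
}
```

For a 4000 x 3000 raster this yields strides 2, 4, 8 and 16, the last level
being 250 x 188 pixels, and a request for that level would subset each axis
with a stride of 16.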

However, the deeper problem is how to handle IO requests.  Right now I gather
you read the whole remote dataset on open, and pull it into local memory?
For modest-sized datasets that applications read a bit at a time, this
is a very good approach.  I would presume there is a non-trivial overhead
(network latency, etc.) to each OPeNDAP request?

If network latency and other request overhead were zero, the ideal approach
would be to satisfy each IRasterIO() request with an OPeNDAP request.  This
would minimize the amount of memory used in the driver, and ensure only required
data would be read.  However, there is latency, and for some sorts of requests,
such as the common case of the application asking for the entire dataset, but
one scanline at a time, the latency would likely kill you.

On the other hand if you are working with a big dataset, and a client like
MapServer that might only want a small area or want the dataset at a much
reduced resolution, then reading the whole thing will kill you.
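
To make the two failure modes concrete, here is a back-of-the-envelope cost
model (a sketch under stated assumptions, not a measurement).  LinkModel and
TransferSeconds are hypothetical names, and the latency and throughput
figures used below are invented purely for illustration.

```cpp
#include <cassert>

// Hypothetical link description: every request pays a fixed overhead,
// then the payload moves at a constant rate.
struct LinkModel {
    double latency_s;    // assumed per-request overhead, in seconds
    double bytes_per_s;  // assumed sustained throughput
};

// Estimated wall-clock time to move total_bytes using the given number
// of separate requests.
double TransferSeconds(const LinkModel& link, long long requests,
                       long long total_bytes) {
    assert(requests >= 0 && total_bytes >= 0 && link.bytes_per_s > 0);
    return requests * link.latency_s + total_bytes / link.bytes_per_s;
}
```

With an assumed 0.1 s per request and 1 MB/s, a 4000 x 3000 one-byte raster
costs about 12 s as a single request but about 312 s as 3000 scanline
requests, while a 400 x 300 window costs about 0.2 s on its own versus the
12 s needed to pull everything first.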

What I have done in some drivers is have two basic cases:

  1) Attempt to satisfy each RasterIO() request via a lower level request.

  2) Treat the dataset as "scanline organized" for blocksize/caching
     purposes, but read ahead lots of scanlines in response to any request
     that isn't satisfied out of the cache.  Implement this logic in the
     IReadBlock() method.

Then I check the nature of the request in IRasterIO() to decide how to
handle it.  If the request is fairly large, I stick with case (1); otherwise
I call the default GDALRasterBand::IRasterIO(), which eventually calls
IReadBlock() and takes care of any required resampling.
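
The two cases and the dispatch between them might be sketched as follows.
This is a self-contained stand-in, not GDAL code: FakeRemote, SketchBand,
ReadScanline and ReadWindow are hypothetical names playing the roles of the
server, the band, IReadBlock() and IRasterIO() respectively, the pixel-count
threshold is an assumed tuning knob, and resampling is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the remote server: counts requests and returns synthetic
// pixels.  A real driver would issue an OPeNDAP request here instead.
struct FakeRemote {
    int xsize;
    int ysize;
    int requests;

    FakeRemote(int xs, int ys) : xsize(xs), ysize(ys), requests(0) {}

    // Read a window of xcount by ycount pixels in a single request.
    std::vector<uint8_t> Fetch(int xoff, int yoff, int xcount, int ycount) {
        assert(xoff >= 0 && yoff >= 0 && xcount > 0 && ycount > 0);
        assert(xoff + xcount <= xsize && yoff + ycount <= ysize);
        ++requests;
        std::vector<uint8_t> out(static_cast<std::size_t>(xcount) * ycount);
        for (int y = 0; y < ycount; ++y)
            for (int x = 0; x < xcount; ++x)
                out[static_cast<std::size_t>(y) * xcount + x] =
                    static_cast<uint8_t>((xoff + x + yoff + y) % 251);
        return out;
    }
};

// Hypothetical band: one block per scanline, read-ahead on a cache miss.
class SketchBand {
public:
    SketchBand(FakeRemote& remote, int read_ahead_lines,
               std::size_t direct_threshold)
        : remote_(remote), read_ahead_(read_ahead_lines),
          threshold_(direct_threshold), cache_first_(0), cache_lines_(0) {
        assert(read_ahead_lines > 0);
    }

    // Role of IReadBlock(): return one full scanline, fetching up to
    // read_ahead_ lines in one request when the line is not cached.
    std::vector<uint8_t> ReadScanline(int line) {
        if (line < cache_first_ || line >= cache_first_ + cache_lines_) {
            cache_lines_ = std::min(read_ahead_, remote_.ysize - line);
            cache_first_ = line;
            cache_ = remote_.Fetch(0, line, remote_.xsize, cache_lines_);
        }
        std::vector<uint8_t>::const_iterator begin =
            cache_.begin() +
            static_cast<std::ptrdiff_t>(line - cache_first_) * remote_.xsize;
        return std::vector<uint8_t>(begin, begin + remote_.xsize);
    }

    // Role of IRasterIO() for a full-resolution read of one window.
    std::vector<uint8_t> ReadWindow(int xoff, int yoff, int xcount,
                                    int ycount) {
        // Case (1): a fairly large request goes straight to the server.
        if (static_cast<std::size_t>(xcount) * ycount >= threshold_)
            return remote_.Fetch(xoff, yoff, xcount, ycount);
        // Case (2): a small request is assembled from cached scanlines.
        std::vector<uint8_t> out;
        for (int y = yoff; y < yoff + ycount; ++y) {
            std::vector<uint8_t> row = ReadScanline(y);
            out.insert(out.end(), row.begin() + xoff,
                       row.begin() + xoff + xcount);
        }
        return out;
    }

private:
    FakeRemote& remote_;
    int read_ahead_;
    std::size_t threshold_;
    int cache_first_;   // first line held in cache_
    int cache_lines_;   // number of lines held in cache_ (0 when empty)
    std::vector<uint8_t> cache_;
};
```

With a 16-line read-ahead, reading a 50-line raster one scanline at a time
costs 4 server requests instead of 50, while a single large window still
maps to exactly one request.  The right threshold and read-ahead depth
depend on the per-request overhead, so treat both as values to tune.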

I hope this helps, though I am proposing some complications. :-)

Best regards,

-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent