[Gdal-dev] Implementing GDALRasterBand::IRasterIO

Wed Jan 14 16:53:52 EST 2004

On Wed, 2004-01-14 at 12:01, Frank Warmerdam wrote:
> James Gallagher wrote:
> > Hi,
> > 
> > I would like to add sub-sampling capabilities to the OPeNDAP/GDAL
> > driver. To do this I plan on specializing GDALRasterBand::IRasterIO(). 
> > 
> > First question: is that method the correct place to implement
> > sub-sampling?
> 
> James,
> 
> It depends a bit what you mean by sub-sampling. If you want a client
> application to be able to request a sub-area at full resolution and
> the driver would be able to fetch just the subarea

That's part of what I mean (the most important part). I can read reduced
resolution rasters over the net, but if a caller asks for the same
raster at higher resolution I'll have to read the whole thing unless I
write more code.
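
A rough sketch of the window-to-request mapping discussed above, assuming the
RasterIO-style arguments (nXOff, nYOff, nXSize, nYSize, nBufXSize, nBufYSize)
are translated into DAP array hyperslabs of the form [start:stride:stop].
MakeSlab() and BuildConstraint() are hypothetical helpers, not part of GDAL or
the OPeNDAP library, and the [row][column] dimension order and the variable
name "sst" are assumptions made only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: one dimension of a DAP-style hyperslab.
struct Hyperslab {
    int start;
    int stride;
    int stop;  // inclusive index of the last element considered
};

// Map one axis of a RasterIO-style request onto a hyperslab.
// nOff/nSize describe the window in full-resolution pixels, nBufSize is the
// length of the caller's buffer along the same axis.
Hyperslab MakeSlab(int nOff, int nSize, int nBufSize) {
    assert(nOff >= 0 && nSize > 0 && nBufSize > 0);
    int stride = nSize / nBufSize;  // integer decimation factor
    if (stride < 1) {
        stride = 1;  // caller wants oversampling: fetch at full resolution
    }
    return Hyperslab{nOff, stride, nOff + nSize - 1};
}

// Build a constraint expression for a 2-D variable, assuming the array is
// stored as [row][column] (an assumption, check the dataset's dimensions).
std::string BuildConstraint(const std::string& var,
                            int nXOff, int nYOff, int nXSize, int nYSize,
                            int nBufXSize, int nBufYSize) {
    const Hyperslab y = MakeSlab(nYOff, nYSize, nBufYSize);
    const Hyperslab x = MakeSlab(nXOff, nXSize, nBufXSize);
    auto fmt = [](const Hyperslab& s) {
        return "[" + std::to_string(s.start) + ":" + std::to_string(s.stride) +
               ":" + std::to_string(s.stop) + "]";
    };
    return var + fmt(y) + fmt(x);
}
```

With an integer stride the server returns at least as many samples as the
buffer needs when downsampling, so the driver would still have to resample the
result to the exact buffer size.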

> then overriding
> IRasterIO() would be one approach.  Another would be to pretend that
> the data is tiled, and just implement the IReadBlock() method.

Right now my implementation of IReadBlock() can only read the whole
raster. That is, in Open() I set the block size to the raster size.
There might be a better size like 1024^2 blocks, but it's so dependent
on the connection's bandwidth that you'd really want to calculate it on a
per-connection basis. I think that's a bit much for the first version...
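
For reference, a minimal sketch of the arithmetic a tiled IReadBlock() would
need if the block size were smaller than the raster. The block coordinates
correspond to the nBlockXOff/nBlockYOff arguments that IReadBlock() receives;
BlocksAcross() and WindowForBlock() are hypothetical helper names, and the
1024-pixel block size is only the example figure mentioned above.

```cpp
#include <algorithm>
#include <cassert>

// Pixel window covered by one block (hypothetical helper type).
struct BlockWindow {
    int xOff;
    int yOff;
    int xSize;
    int ySize;
};

// Number of blocks needed to cover nRasterSize pixels (rounds up so that a
// partial block at the edge is counted).
int BlocksAcross(int nRasterSize, int nBlockSize) {
    assert(nRasterSize > 0 && nBlockSize > 0);
    return (nRasterSize + nBlockSize - 1) / nBlockSize;
}

// Translate block coordinates into the pixel window to request from the
// server. Edge blocks are clipped to the raster extent.
BlockWindow WindowForBlock(int nBlockXOff, int nBlockYOff,
                           int nRasterXSize, int nRasterYSize,
                           int nBlockXSize, int nBlockYSize) {
    assert(nBlockXOff >= 0 && nBlockXOff < BlocksAcross(nRasterXSize, nBlockXSize));
    assert(nBlockYOff >= 0 && nBlockYOff < BlocksAcross(nRasterYSize, nBlockYSize));
    BlockWindow w;
    w.xOff = nBlockXOff * nBlockXSize;
    w.yOff = nBlockYOff * nBlockYSize;
    w.xSize = std::min(nBlockXSize, nRasterXSize - w.xOff);
    w.ySize = std::min(nBlockYSize, nRasterYSize - w.yOff);
    return w;
}
```

Each block would then become one remote request, so the block size directly
trades request count against bytes transferred per request.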

> If you want to offer an efficient way to access reduced-resolution
> images, then you could produce pseudo-overview layers.
> 
> However, given that for the OPeNDAP driver you want a lot of control
> over how many individual requests actually go to the remote server,
> I would say overriding IRasterIO() makes sense.

OK. That's what I'll do first.

> Be aware that:
>   o applications normally call RasterIO(), so you will now very rarely
>     get block accesses - however, it can still happen.  For instance, I
>     think the min/max computation goes through the block API.

OK. I also noticed that GDAL's caching uses it too.

>   o It is very common for applications to make many small RasterIO()
>     calls, often for one scanline at a time.  I often try to recognise
>     the scanline request case, and pre-read a bunch of scanlines at once
>     and cache them.

Where would I look to find out more about GDAL's data caching system?
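
The scanline pre-read idea in the quoted list could be sketched roughly as
follows. This is not GDAL's block cache; ScanlineCache and its Fetcher callback
are hypothetical names, and the callback stands in for whatever remote request
the driver would issue for a run of whole scanlines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Serves single-scanline requests from a chunk of pre-read scanlines, so
// that many one-line reads turn into a few larger fetches.
class ScanlineCache {
public:
    // Fetcher returns lineCount full scanlines starting at firstLine,
    // stored row after row (lineCount * width values).
    using Fetcher = std::function<std::vector<float>(int firstLine, int lineCount)>;

    ScanlineCache(int width, int height, int chunkLines, Fetcher fetch)
        : width_(width), height_(height), chunkLines_(chunkLines),
          fetch_(std::move(fetch)) {
        assert(width > 0 && height > 0 && chunkLines > 0);
    }

    // Pointer to `width` values for the requested line. The pointer stays
    // valid only until the next call that triggers a fetch.
    const float* GetLine(int line) {
        assert(line >= 0 && line < height_);
        if (line < first_ || line >= first_ + count_) {
            // Cache miss: read the whole aligned chunk containing this line.
            first_ = (line / chunkLines_) * chunkLines_;
            count_ = std::min(chunkLines_, height_ - first_);
            data_ = fetch_(first_, count_);
            assert(data_.size() == static_cast<std::size_t>(count_) * width_);
            ++fetches_;
        }
        return data_.data() + static_cast<std::size_t>(line - first_) * width_;
    }

    int FetchCount() const { return fetches_; }

private:
    int width_;
    int height_;
    int chunkLines_;
    Fetcher fetch_;
    std::vector<float> data_;
    int first_ = 0;
    int count_ = 0;  // zero lines cached initially, so the first read misses
    int fetches_ = 0;
};
```

An IRasterIO() override could route requests with nYSize == 1 through something
like GetLine() and pass other window shapes straight to the server.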

> > Second question: I noticed that the OGDI driver implemented only the
> > GDALRasterBand::IRasterIO method while the ECW driver implemented its
> > own version of both that and GDALDataset::IRasterIO. What are the
> > implications of specializing the second method?
> 
> You would implement a custom GDALDataset::IRasterIO() if you want a more
> efficient access to multi-band datasets as a single request.  For instance,
> ECW implements GDALDataset::IRasterIO() because it is much cheaper to pass
> one request on to the ECW API for all the bands of an image at once,
> instead of requesting them one at a time.

OK. Sounds like I should look into this a bit once I get the basic
access (IRasterIO) working.

> By the way, one of the applications that we want the GDAL/OPeNDAP driver to
> be good with is MapServer.  MapServer currently always makes one big
> RasterIO() request for whatever it needs for each band being read.  Eventually
> I hope to change this to utilize the GDALDataset::RasterIO() entry point for
> greater efficiency where that is specialized.  So, overriding
> GDALRasterBand::IRasterIO() or GDALDataset::IRasterIO() will give a big win
> for MapServer.

Sounds great.

> In fact, you might consider just overriding the GDALDataset::IRasterIO(), and
> doing an implementation of GDALRasterBand::IRasterIO() that calls the dataset
> level one with a single band requested.  That would set you up optimally for
> future improvements in MapServer.

OK. I think I see how this would be done. DODSDataset::IRasterIO() would
be in charge of actually reading the data (maybe using GDAL's caching
sub-system) and DODSRasterBand::IRasterIO() would make calls to it. In
the default implementation it's the other way around (Dataset calls
RasterBand).
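
The delegation pattern described here might look roughly like the sketch below.
It deliberately avoids the real GDAL base classes and signatures: SketchDataset
and SketchBand are hypothetical stand-ins, the pixel values are fabricated by a
formula so the example runs without a server, and the band-sequential output
layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a dataset whose dataset-level read does the real work.
class SketchDataset {
public:
    // One request for any number of bands (1-based band numbers, as in GDAL).
    // Output is band-sequential: all pixels of the first band, then the next.
    std::vector<float> ReadBands(int xOff, int yOff, int xSize, int ySize,
                                 const std::vector<int>& bands) {
        assert(xOff >= 0 && yOff >= 0 && xSize > 0 && ySize > 0);
        ++requests_;  // a real driver would issue one remote request here
        std::vector<float> out;
        out.reserve(bands.size() * static_cast<std::size_t>(xSize) * ySize);
        for (int band : bands) {
            for (int y = 0; y < ySize; ++y) {
                for (int x = 0; x < xSize; ++x) {
                    out.push_back(FakePixel(band, xOff + x, yOff + y));
                }
            }
        }
        return out;
    }

    int RequestCount() const { return requests_; }

private:
    // Fabricated data so the sketch is runnable without a data source.
    static float FakePixel(int band, int x, int y) {
        return static_cast<float>(band * 1000 + y * 10 + x);
    }

    int requests_ = 0;
};

// Stand-in for a band whose read simply forwards to the dataset with a
// single-band list.
class SketchBand {
public:
    SketchBand(SketchDataset& ds, int band) : ds_(ds), band_(band) {}

    std::vector<float> Read(int xOff, int yOff, int xSize, int ySize) {
        return ds_.ReadBands(xOff, yOff, xSize, ySize, {band_});
    }

private:
    SketchDataset& ds_;
    int band_;
};
```

A three-band read through the dataset costs one request, while three separate
band reads cost three, which is the saving a multi-band caller would see.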

> Applications like OpenEV are going to make lots of "tile by tile" RasterIO()
> requests to GDAL.  This would presumably turn into a remote request for each
> tile which is sensible, but will add a real latency drag into OpenEV renders.
> 
> Batch applications are usually scanline based, via RasterIO(), and as mentioned
> before it would be a wise idea to recognise this case, and force some sort of
> chunking.
> 
> One final note, when you implement IRasterIO() you avoid data going through
> the GDAL cache.  That is good if you are effectively caching things yourself
> somehow, but can completely hammer you in some situations if you are not.

Well the cache I implemented is a basic HTTP/1.1 cache. It doesn't yet
know about the stuff inside a data object. So it's not a very good cache
for data. I think the GDAL cache is going to be important.

Thanks for the info,

James
> 
> Best regards,
-- 
__________________________________________________________________________
James Gallagher		         The Distributed Oceanographic Data System
jgallagher at gso.uri.edu               http://unidata.ucar.edu/packages/dods
Voice/Fax: 406.723.8663