[Gdal-dev] Efficient BIP Access?

Frank Warmerdam warmerdam at pobox.com
Mon Apr 23 14:10:59 EDT 2007


Simon Perkins wrote:
> Hi,
> 
> I have a need to access large hyperspectral (hundreds of bands) data 
> files efficiently. The files are stored in BIP format (band interleaved 
> by pixel, i.e. "packed" pixel format), and hyperspectral processing is 
> typically carried out a pixel at a time using data from all bands.
> 
> The GDAL data access functions on the other hand are very band oriented, 
> and cache blocks are inherently 2D, and in the absence of tiling, 
> default to single rows of the image. So, if I want to access all bands 
> in a tile of an image, this will often blow GDAL's cache, since multiple 
> whole rows will be loaded for every band. It's also inherently 
> inefficient since hundreds of separate random-access reads have to be 
> made to the same portion of the file for each pixel.
> 
> Now, GDALDataset does define a RasterIO() method that allows 3D windows 
> to be retrieved, but the default implementation reverts to accessing the 
> data band by band. So my question is: which data formats implement this 
> method efficiently for BIP images? Second question: do these 
> implementations use GDAL's cache?

Simon,

Good questions!

Generally speaking, when a format overrides the dataset-level IRasterIO()
(the implementation behind GDALDataset::RasterIO()) in order to provide
efficient pixel interleaved access, it ends up skipping the block cache,
though that is not strictly necessary.

On inspection of gdal/frmts/raw/rawdataset.cpp it appears that none of the
"raw raster" based drivers has any special support for handling BIP rasters
efficiently, and they could benefit from some very careful improvements.
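
To make the layout concrete, here is a small Python sketch of the offset
arithmetic for a headerless BIP file. It is only an illustration, not code
from rawdataset.cpp: the names bip_offset and read_spectrum are made up,
and the image is a tiny in-memory buffer of little-endian 16-bit samples.

```python
import struct

def bip_offset(row, col, band, width, nbands, itemsize):
    """Byte offset of one sample in a headerless pixel-interleaved buffer."""
    return ((row * width + col) * nbands + band) * itemsize

def read_spectrum(buf, row, col, width, nbands, fmt="<H"):
    """Return all bands of one pixel from a single contiguous run of bytes."""
    itemsize = struct.calcsize(fmt)
    start = bip_offset(row, col, 0, width, nbands, itemsize)
    count_fmt = fmt[0] + str(nbands) + fmt[1]   # e.g. "<4H"
    return list(struct.unpack_from(count_fmt, buf, start))

# Tiny 3x2 image with 4 bands; each sample value encodes (row, col, band).
width, height, nbands = 3, 2, 4
samples = [r * 100 + c * 10 + b
           for r in range(height) for c in range(width) for b in range(nbands)]
image = struct.pack("<%dH" % len(samples), *samples)

spectrum = read_spectrum(image, row=1, col=2, width=width, nbands=nbands)
print(spectrum)  # [120, 121, 122, 123]
```

All bands of a pixel sit next to each other, so the full spectrum is one
contiguous read. A band-by-band reader instead has to touch the same
stretch of the file once per band.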

There are formats like JPEG that have special logic to "push" extra bands
into the cache when one band is read.  So if you read an RGB JPEG file one
scanline at a time via IReadBlock() (directly or indirectly), it will only
decode the image once when reading the first band, at which point it will
force the data into the cache for the second and third bands.
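
The "push into cache" idea can be sketched with a toy model in Python.
This is not the JPEG driver's code: ToyInterleavedDataset, read_block and
the dictionary used as a cache are hypothetical stand-ins, and "decoding"
is just de-interleaving a list.

```python
class ToyInterleavedDataset:
    """Toy model: one block per (band, scanline), shared dictionary cache."""

    def __init__(self, scanlines, nbands):
        self.scanlines = scanlines  # each scanline is a flat, pixel-interleaved list
        self.nbands = nbands
        self.cache = {}             # stands in for the global block cache
        self.decode_count = 0

    def _decode_scanline(self, row):
        # Stands in for the expensive step (e.g. decompressing one scanline).
        self.decode_count += 1
        line = self.scanlines[row]
        return [line[b::self.nbands] for b in range(self.nbands)]

    def read_block(self, band, row):
        key = (band, row)
        if key not in self.cache:
            # Decode once, then push the blocks of *all* bands into the cache.
            for b, data in enumerate(self._decode_scanline(row)):
                self.cache[(b, row)] = data
        return self.cache[key]

# One scanline holding two RGB pixels: R0 G0 B0 R1 G1 B1
ds = ToyInterleavedDataset([[10, 20, 30, 11, 21, 31]], nbands=3)
red = ds.read_block(0, 0)
green = ds.read_block(1, 0)
blue = ds.read_block(2, 0)
print(red, green, blue, ds.decode_count)  # [10, 11] [20, 21] [30, 31] 1
```

Reading the first band pays for the decode; the other two bands are then
served from the cache.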

Other formats, like MrSID and JP2KAK, override IRasterIO() on the dataset
and implement reading all requested bands in one pass, but at the cost of
bypassing the block cache.

For pixel interleaved data, the TIFF driver keeps the last tile or strip
around and can satisfy requests for other bands very efficiently.  But this
only works if you request all the bands of a given block before moving on
to the next.  It might make sense to apply the same "push into global cache"
logic in the TIFF driver as is currently used in the JPEG driver.
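
The effect of access order on a "keep only the last block" scheme can be
shown with a toy counter in Python. This is a sketch under simplifying
assumptions, not the TIFF driver's logic: LastBlockReader and the two
ordering helpers are hypothetical, and a "decode" is just a counter bump.

```python
class LastBlockReader:
    """Toy reader that remembers only the most recently decoded block."""

    def __init__(self):
        self.last_block = None
        self.decodes = 0

    def read(self, band, block):
        if block != self.last_block:
            self.decodes += 1        # stands in for reading a strip or tile
            self.last_block = block
        return (band, block)

def block_major(nblocks, nbands):
    # All bands of a block before moving on to the next block.
    return [(band, block) for block in range(nblocks) for band in range(nbands)]

def band_major(nblocks, nbands):
    # All blocks of a band before moving on to the next band.
    return [(band, block) for band in range(nbands) for block in range(nblocks)]

def count_decodes(order, nblocks, nbands):
    reader = LastBlockReader()
    for band, block in order(nblocks, nbands):
        reader.read(band, block)
    return reader.decodes

print(count_decodes(block_major, 4, 100))  # 4
print(count_decodes(band_major, 4, 100))   # 400
```

With 4 blocks and 100 bands, visiting every band of a block before moving
on costs one decode per block. Walking the image band by band decodes
every block again for every band.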

I think the most important step we could take toward more efficient
hyperspectral data access would be to work on the "raw" infrastructure so
that it handles pixel interleaved data efficiently, possibly using the
"push into cache" approach of the JPEG driver.

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org



