[gdal-dev] Bigtiff question

Even Rouault even.rouault at mines-paris.org
Thu Mar 5 13:45:59 EST 2009


Ivan,

for the very poor performance when dealing with pixel interleaved GTiffs with 
a large number of bands, I think you've hit ticket #2838, which was fixed 
3 weeks ago in trunk and branches/1.6 (*). The performance issue was about 
*reading* such files, but sometimes when you write and the block cache 
size is not big enough, you also end up reading back partially written 
scanlines. Hopefully this fix will correct the problem you observe.

But as Frank suggested, I would not recommend using pixel interleaved GTiffs 
with a large number of bands anyway. This is fine for RGB or RGBA datasets, 
but for more bands, it could cause cache thrashing problems as we prefill the 
blocks for all bands in pixel-interleaved mode (#2838 was about the fact that 
this technique wasn't correctly implemented).

However I'm not sure why you would get black files.

I've tested the following with trunk and it produces the expected result.

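#!/usr/bin/python
# The bindings live in the osgeo package (the bare 'import gdal' form is deprecated)
from osgeo import gdal

driver_tif = gdal.GetDriverByName("GTiff")
# 20x20 pixels, 320 Byte bands, striped (not tiled) and pixel interleaved
output_dst = driver_tif.Create('test.tif', 20, 20, 320, gdal.GDT_Byte,
                               ['TILED=NO', 'INTERLEAVE=PIXEL'])
for i in range(320):
    # Fill band i + 1 with the constant value i + 1
    output_band = output_dst.GetRasterBand(i + 1)
    output_band.Fill(i + 1)
    output_band.FlushCache()
# Dereference the dataset to flush and close the file
output_dst = None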
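
For comparison, here is a minimal sketch of the write order Frank recommends in the message quoted below: all bands of a scanline are written together, and the block cache is raised first. The function name write_scanline_major, the 64 MB cache size and the 255 clamp for Byte data are assumptions made for illustration and were not tested against the reported problem. The import guard is there only so the sketch loads cleanly where the GDAL bindings are not installed.

```python
try:
    from osgeo import gdal  # GDAL Python bindings; may not be installed
except ImportError:
    gdal = None


def write_scanline_major(path, xsize=20, ysize=20, nbands=320):
    """Write a pixel-interleaved GTiff one scanline at a time, all bands together."""
    if gdal is None:
        raise RuntimeError("the GDAL Python bindings (osgeo) are required")
    # Assumption: 64 MB of block cache is ample for an image this small.
    gdal.SetCacheMax(64 * 1024 * 1024)
    driver = gdal.GetDriverByName("GTiff")
    ds = driver.Create(path, xsize, ysize, nbands, gdal.GDT_Byte,
                       ["TILED=NO", "INTERLEAVE=PIXEL"])
    for y in range(ysize):
        for b in range(nbands):
            # Band b + 1 holds the constant b + 1, clamped to the Byte range.
            row = bytes([min(b + 1, 255)]) * xsize
            ds.GetRasterBand(b + 1).WriteRaster(0, y, xsize, 1, row)
    ds = None  # dereference to flush and close the file
```

Calling write_scanline_major('test.tif') should produce the same kind of file as the script above, but with every band of a scanline touched before moving on to the next one.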

Even

(*) :  Currently there's an outstanding issue in the 1.6 branch related to the 
GTiff driver (#2820) and color table handling, so I wouldn't recommend 
grabbing it. Or maybe just apply the patch from r16298 to the 1.6.0 release.

On Thursday 05 March 2009 18:07:43, Frank Warmerdam wrote:
> Lucena, Ivan wrote:
> > Yes, that runs a lot of seeks to write just a few bytes here and there.
>
> Ivan,
>
> I would note that for pixel interleaved data, access is still a whole
> strip/tile at a time which in your case likely means a whole scanline.
> In no case does GDAL's GTiff driver seek along to update individual
> bytes in a pixel interleaved scanline or tile.
>
> > I am wondering what the geotiff driver could do to improve that; keeping
> > tiles in memory until they are filled up for writing at once for example
> > (?)
>
> GDAL will cache the blocks on a band-by-band basis (at a level
> where it doesn't realize the underlying datastore is pixel
> interleaved).  The actual block flushing code in the GTIFF driver
> does ensure that all the cache data for all bands is assembled
> and written at once if available.  So if you had a big enough
> block cache - or if you wrote all bands for a given scanline at
> approximately the same time - then only one write to disk would
> take place for each block.
>
> But because you write "all of the first band", then all of the second
> band and so on, you are basically triggering cache writes often and
> preventing GDAL from doing things intelligently.
>
> > BTW, would it make any difference if I tile the GeoTIFF? In that case what
> > blockxsize, blockysize would be recommended for 320 bands interleaved
> > by PIXEL?
>
> I do not anticipate this would make much difference.  As noted,
> the key factors affecting performance are block cache size, and
> the order you write data.
>
> >>  OK, it sounds like the pixels all being zero is a bug, and
> >>  it would be good to file a ticket demonstrating this problem.
> >>  Hopefully a somewhat minimalist example of the problem.
> >
> > I think it would be very hard to send data samples, so I would suggest
> > running a script that creates fake raster bands with all pixels as 1 on
> > band 1, 2 on band 2, etc. Something like this perhaps:
> >
> > --
> >     driver_tif = gdal.GetDriverByName("GTIFF")
> >     output_dst = driver_tif.Create( output_tif, x_size, y_size,
> > serie_count, data_type, [ 'TILED=NO', 'INTERLEAVE=PIXEL' ])
> >     for i in range(320):
> >         output_band = output_dst.GetRasterBand( i + 1 )
> >         output_band.Fill(i + 1)
> >         output_band.FlushCache()
> > --
>
> Well, please develop such a script, confirm it reproduces the
> bug (ideally with a well known binary version of GDAL like the
> OSGeo4W package) and then file the bug accordingly.  You might want
> to check if it really needs a lot of bands to trigger the issue.
>
> The position I *hate* to be in is doing a lot of guessing trying
> to reproduce a bug.
>
> Best regards,
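
To make Frank's point about write order and cache size concrete, here is a toy model in plain Python. It is not GDAL's actual block cache: the function count_block_writes, the LRU policy and the "one scanline = one on-disk block" layout are simplifying assumptions, and it ignores the read-back of partially written scanlines mentioned above. It only counts how often a shared block has to be written out when the per-band cache entries arrive band by band versus scanline by scanline.

```python
from collections import OrderedDict


def count_block_writes(nbands, nrows, cache_slots, order):
    """Count block writes in a toy LRU cache of (row, band) entries.

    Each scanline is one on-disk block shared by all bands. Evicting the
    oldest entry writes its whole block and drops every cached entry that
    belongs to the same block.
    """
    if order == "band":
        # All of band 0, then all of band 1, and so on.
        cells = [(r, b) for b in range(nbands) for r in range(nrows)]
    else:
        # All bands of scanline 0, then all bands of scanline 1, and so on.
        cells = [(r, b) for r in range(nrows) for b in range(nbands)]

    cache = OrderedDict()
    writes = 0

    def flush_block(row):
        nonlocal writes
        writes += 1
        for key in [k for k in cache if k[0] == row]:
            del cache[key]

    for cell in cells:
        if len(cache) >= cache_slots:
            flush_block(next(iter(cache))[0])  # evict the oldest entry's block
        cache[cell] = True

    while cache:  # final flush when the dataset is closed
        flush_block(next(iter(cache))[0])
    return writes


if __name__ == "__main__":
    for order in ("band", "scanline"):
        print(order, count_block_writes(4, 3, 4, order))
```

In this model, scanline order needs one write per block once the cache can hold a full scanline, while band order writes the same blocks repeatedly unless the cache is large enough to hold every entry.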



