[gdal-dev] Bigtiff question

Frank Warmerdam warmerdam at pobox.com
Thu Mar 5 12:07:43 EST 2009


Lucena, Ivan wrote:
> Yes, that runs a lot of seeks to write just a few bytes here and there.

Ivan,

I would note that for pixel interleaved data, access is still a whole
strip/tile at a time which in your case likely means a whole scanline.
In no case does GDAL's GTiff driver seek along to update individual
bytes in a pixel interleaved scanline or tile.

> I am wondering what the geotiff driver could do to improve that; keeping tiles in memory until they are filled up for 
> writing at once for example (?)

GDAL will cache the blocks on a band-by-band basis (at a level
where it doesn't realize the underlying datastore is pixel
interleaved).  The actual block flushing code in the GTiff driver
does ensure that the cached data for all bands is assembled
and written at once if available.  So if you had a big enough
block cache - or if you wrote all bands for a given scanline at
approximately the same time - then only one write to disk would
take place for each block.

But because you write all of the first band, then all of the second
band and so on, each block is likely to be pushed out of the cache
before the other bands for it arrive.  So you are triggering cache
writes often and preventing GDAL from doing things intelligently.
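
To make the effect of write order concrete, here is a toy model in
Python.  It is not GDAL's block cache (the real cache is sized in
bytes, and the function name, the slot-based capacity and the LRU
policy here are all assumptions made for illustration).  It only
counts how many times a pixel interleaved block, one per scanline,
would go to disk when per-band cache entries get evicted.

```python
from collections import OrderedDict


def count_block_writes(n_bands, n_lines, cache_slots, order):
    """Count disk writes in a toy per-band LRU cache (hypothetical model)."""
    cache = OrderedDict()  # (band, line) -> dirty flag, oldest entry first
    writes = 0

    def flush_line(line):
        nonlocal writes
        # One disk write covers every band of this line still in the cache,
        # so those entries become clean.
        for key in cache:
            if key[1] == line:
                cache[key] = False
        writes += 1

    if order == "band":      # all of band 0, then all of band 1, ...
        sequence = [(b, y) for b in range(n_bands) for y in range(n_lines)]
    elif order == "line":    # all bands of line 0, then line 1, ...
        sequence = [(b, y) for y in range(n_lines) for b in range(n_bands)]
    else:
        raise ValueError("order must be 'band' or 'line'")

    for key in sequence:
        cache[key] = True
        if len(cache) > cache_slots:
            (_, old_line), dirty = cache.popitem(last=False)
            if dirty:
                flush_line(old_line)

    # Final flush of whatever is still dirty.
    while cache:
        (_, line), dirty = cache.popitem(last=False)
        if dirty:
            flush_line(line)
    return writes
```

With 3 bands, 4 scanlines and room for 3 entries, writing in "line"
order gives one write per scanline, while "band" order gives one
write per band per scanline.  Making the cache big enough to hold
everything brings "band" order back down to one write per scanline.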

> BTW, would it make any difference if I tiled the GeoTIFF? In that case, what blockxsize and blockysize would be recommended for 320 bands interleaved by PIXEL?

I do not anticipate this would make much difference.  As noted,
the key factors affecting performance are block cache size, and
the order you write data.
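
A rough sketch of acting on both factors from Python might look like
the following.  The function name, the 256MB cache figure and the
constant test data are made up for illustration; the GDAL calls
(gdal.SetCacheMax, Driver.Create, Band.WriteRaster) are the standard
Python bindings.  It raises the block cache limit and writes every
band of a scanline before moving to the next scanline.

```python
def write_scanline_order(path, x_size, y_size, n_bands,
                         cache_bytes=256 * 1024 * 1024):
    """Write a pixel interleaved GeoTIFF one scanline at a time (sketch)."""
    # Imported here so the module still loads where the GDAL Python
    # bindings are not installed.
    from osgeo import gdal

    gdal.UseExceptions()
    gdal.SetCacheMax(cache_bytes)  # block cache limit, in bytes

    driver = gdal.GetDriverByName("GTiff")
    ds = driver.Create(path, x_size, y_size, n_bands, gdal.GDT_Byte,
                       ["TILED=NO", "INTERLEAVE=PIXEL"])

    for y in range(y_size):
        # All bands for this scanline are written close together, so the
        # driver can assemble them into a single block write.
        for b in range(1, n_bands + 1):
            line = bytes([b % 256]) * x_size  # illustrative constant data
            ds.GetRasterBand(b).WriteRaster(0, y, x_size, 1, line)

    ds.FlushCache()
    ds = None  # dropping the reference closes the file
```

The same cache limit can also be set without code changes through the
GDAL_CACHEMAX configuration option.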

>>  OK, it sounds like the pixels all being zero is a bug, and
>>  it would be good to file a ticket demonstrating this problem.
>>  Hopefully a somewhat minimalist example of the problem.
> 
> I think it would be very hard to send data samples, so I would suggest running a script that creates fake raster bands
> with all pixels as 1 on band 1, 2 on band 2, etc. Something like that perhaps:
> 
> --
>     driver_tif = gdal.GetDriverByName("GTIFF")
>     output_dst = driver_tif.Create( output_tif, x_size, y_size, serie_count, data_type,
>         [ 'TILED=NO', 'INTERLEAVE=PIXEL' ])
>     for i in range(320):
>         output_band = output_dst.GetRasterBand( i + 1 )
>         output_band.Fill(i + 1)
>         output_band.FlushCache()
> --

Well, please develop such a script, confirm it reproduces the
bug (ideally with a well known binary version of GDAL like the
OSGeo4W package) and then file the bug accordingly.  You might want
to check if it really needs a lot of bands to trigger the issue.
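
As a starting point, something along these lines might do.  It is
only a sketch: the function name, the UInt16 data type and checking
just the last pixel of each band are my assumptions, not details from
your setup.  It fills each band with its own band number, closes and
reopens the file, and reports which bands do not read back correctly.
The band count is a parameter so it is easy to see whether a small
number of bands also triggers the problem.

```python
import struct


def bands_with_wrong_value(path, x_size, y_size, n_bands):
    """Fill band i with the value i, reopen, and list bands that differ."""
    # Imported here so the module still loads where the GDAL Python
    # bindings are not installed.
    from osgeo import gdal

    gdal.UseExceptions()
    driver = gdal.GetDriverByName("GTiff")
    ds = driver.Create(path, x_size, y_size, n_bands, gdal.GDT_UInt16,
                       ["TILED=NO", "INTERLEAVE=PIXEL"])
    for i in range(n_bands):
        band = ds.GetRasterBand(i + 1)
        band.Fill(i + 1)      # band 1 is all 1s, band 2 all 2s, ...
        band.FlushCache()
    ds = None  # close the file so everything is written out

    ds = gdal.Open(path)
    wrong = []
    for i in range(n_bands):
        # Read the last pixel of the band as one native-order UInt16.
        raw = ds.GetRasterBand(i + 1).ReadRaster(x_size - 1, y_size - 1, 1, 1)
        (value,) = struct.unpack("H", bytes(raw))
        if value != i + 1:
            wrong.append(i + 1)
    ds = None
    return wrong
```

An empty list means every band read back as written.  Running it with
a few different values of n_bands and image sizes should show where,
if anywhere, the zero pixels start to appear.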

The position I *hate* to be in is doing a lot of guessing trying
to reproduce a bug.

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent


