[gdal-dev] Writing descriptions to GeoTiff bands

Even Rouault even.rouault at spatialys.com
Sun Sep 18 03:49:08 PDT 2016


On Sunday 18 September 2016 11:02:20, Sean Gillies wrote:
> Hi Andrew, Even,
> 
> On Sat, Sep 17, 2016 at 9:52 PM, Even Rouault <even.rouault at spatialys.com> wrote:
> > On Friday 16 September 2016 23:11:07, Even Rouault wrote:
> > > On Friday 16 September 2016 22:57:13, Andrew Bell wrote:
> > > > Hi,
> > > > 
> > > > My code for creating a Tiff raster looks something like this:
> > > > 
> > > > int nBands = 5;
> > > > dataset->Create(filename, width, height, nBands, ...);
> > > > 
> > > > for (int i = 1; i <= nBands; ++i)
> > > > {
> > > > 
> > > >     GDALRasterBand *band = dataset->GetRasterBand(i);
> > > >     band->SetDescription(someString);
> > > >     band->WriteBlock(someData);
> > > > 
> > > > }
> > > > 
> > > > It appears that only the description of band 1 is written (it's the
> > > > only one reported by gdalinfo).  A little debugging leads me to
> > > > believe that what's happening is that WriteBlock() invokes
> > > > Crystalize() -> WriteMetadata(), which takes care of setting the band
> > > > description.  But once Crystalize() is called, it sets a flag so as
> > > > to be a NOOP in future calls.  I'm not using streaming.
> > > > 
> > > > I'm trying to understand if this behavior is by design, a limitation
> > > > that I can't find in the documentation, or a bug.
> > > 
> > > It's a limitation due mostly to how libtiff works, and partly to how we
> > > use it and to how the TIFF format itself makes this hard. Basically,
> > > for GTiff you need to do all operations that affect metadata in a broad
> > > sense (i.e. georeferencing, description, offsets, color table, TIFF &
> > > GDAL metadata, etc.) before writing any imagery. If we allowed metadata
> > > to be changed after crystallization, this would require rewriting the
> > > whole set of TIFF tags at the end of the file each time their
> > > serialized form grows.
> > > 
> > > So split your loop in two: one to set all descriptions, and another
> > > one to write blocks.
> > > 
> > > Other formats may have similar limitations, so it is generally safe to
> > > proceed this way.
> > 
> > Actually the above is partly true and partly wrong. It is indeed
> > discouraged to change metadata after having started writing imagery,
> > but in the case of the band description you can still do it. As I said,
> > this will cause the TIFF directory to be rewritten, so a bit of storage
> > is lost, but the descriptions are nevertheless correctly retrieved.
> > 
> > I used the following Python test script:
> > 
> > {{{
> > from osgeo import gdal
> > gdal.SetCacheMax(0)  # to force Fill() to commit to file immediately
> > ds = gdal.GetDriverByName('GTiff').Create('test.tif', 1000, 1000, 5)
> > 
> > for i in range(5):
> >     ds.GetRasterBand(i+1).SetDescription('foo%d' % i)
> >     ds.GetRasterBand(i+1).Fill(100)
> > 
> > ds = None  # close the dataset so that everything is flushed to disk
> > }}}
> > 
> > This works on the latest state of trunk and on the 2.1 and 2.0 branches.
> 
> I am so grateful you asked this question, Andrew.

UPDATE: Andrew, after digging, I believe you have hit 
https://trac.osgeo.org/gdal/ticket/6592 whose fix hasn't yet reached any 
released version.

> 
> Even, two follow-up questions, one concrete, one more abstract. Is
> "crystallized" a state of all raster datasets, no matter the driver,

No. First, it is only relevant to drivers that have Create() capabilities (as 
opposed to the ones that support CreateCopy() only). And some of them do not 
have restrictions. For example, in the case of drivers that use a dedicated 
file for imagery and a (text) header for metadata, you can use the API without 
any particular constraints, since all the metadata updates are stored in the 
state variables of the dataset and flushed to the header file at dataset 
closing.

In the case of TIFF, you can (bugs aside) update metadata after imagery, with 
the drawback that the TIFF tags are rewritten at the end of the file. In 
fact, TIFF crystallization could (probably?) be avoided if we were OK with the 
TIFF tags always being written at the end of the file, but that would make the 
file incompatible with streaming for readers (the seek to the end of the file 
could be particularly costly if you run gdalinfo on a huge TIFF in a zip). 
This concept of crystallization was added a long time ago ( 
https://trac.osgeo.org/gdal/changeset/1977 ), apparently to avoid an early 
crystallization at dataset creation, which meant you always ended up with 
duplicated tags.

With Erdas Imagine, from what I can see, you can update at any time, and the 
metadata is written only once, at the end of the file, at dataset closing.

In other drivers like MBTiles/GPKG, you cannot modify the geotransform once 
you have set it (that would require shifting imagery within tiles and updating 
tile coordinates), and for that reason you have to set it before being allowed 
to write any imagery. But you can still set user metadata at any time, so it 
is a partial crystallization.

> and is
> there a method of determining whether the dataset is crystallized?

No, apart from knowing that crystallization will occur after the first Fill() / 
WriteBlock() / RasterIO(GF_Write, ...) call.
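
Since there is no way to query that state, the safe ordering is to finish all 
metadata calls before the first imagery write. A minimal sketch of that 
ordering, assuming the GDAL Python bindings (osgeo.gdal) are installed; the 
function name, its parameters and the default sizes are illustrative and not 
part of GDAL:

```python
def create_with_descriptions(path, descriptions, width=64, height=64,
                             fill_value=100):
    """Create a GTiff, set every band description, then write imagery.

    Hypothetical helper for illustration only.
    Returns the descriptions read back after reopening the file.
    """
    from osgeo import gdal  # imported here so the sketch loads without GDAL

    gdal.UseExceptions()
    driver = gdal.GetDriverByName("GTiff")
    ds = driver.Create(path, width, height, len(descriptions))

    # Pass 1: metadata only, before any imagery has been written.
    for i, text in enumerate(descriptions, start=1):
        ds.GetRasterBand(i).SetDescription(text)

    # Pass 2: imagery. Writing pixels is what can trigger crystallization.
    for i in range(1, ds.RasterCount + 1):
        ds.GetRasterBand(i).Fill(fill_value)

    ds = None  # close the dataset so that everything is flushed

    # Reopen read-only and report what was actually stored.
    ds = gdal.Open(path)
    return [ds.GetRasterBand(i).GetDescription()
            for i in range(1, ds.RasterCount + 1)]
```

Calling create_with_descriptions('test.tif', ['red', 'green', 'blue']) should 
then return the three strings unchanged, which is also what gdalinfo would 
report per band.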

> Could
> this situation be made less complicated or be made more safe for developers
> by splitting the existing update access mode into update-metadata and
> update-imagery access modes?

I'm not sure if you are talking about the update access mode that you provide 
to GDALOpen() or the implicit update mode you get from calling Create().

In the Create() situation, do you mean that there would be a state in the 
dataset object that would first allow only metadata updates and then, from the 
first imagery write, allow only imagery updates? This is certainly a best 
practice, which utilities like gdalwarp apply. I can't think of a driver that 
would only accept setting one of these metadata items after the imagery has 
been written; that wouldn't make sense.
But enforcing this best practice would potentially break existing working 
code, and it would require all drivers to be modified to implement the 
policy (it is not possible to implement in the core, at least for folks using 
the C++ API).

And regarding update-open situations, we have the gdal_edit.py utility, whose 
purpose is to update metadata on a fully defined dataset. What is possible 
strongly depends on the capabilities of the driver and the underlying format. 
You can sometimes even update some metadata on drivers that support 
CreateCopy() only (e.g. ECW).
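
As a rough illustration of the update-open case, here is a sketch that reopens 
an existing file with GA_Update and changes band descriptions in place. It 
assumes the GDAL Python bindings are installed; the function name is 
illustrative and not part of GDAL, and this is not a description of how 
gdal_edit.py is implemented. Whether the change is persisted, and where, 
depends on the driver:

```python
def update_band_descriptions(path, descriptions):
    """Set band descriptions on an already written raster.

    Hypothetical helper for illustration only.
    Returns the descriptions read back after reopening the file.
    """
    from osgeo import gdal  # imported here so the sketch loads without GDAL

    gdal.UseExceptions()

    # Open in update mode; a read-only handle would not persist changes
    # into the file itself.
    ds = gdal.Open(path, gdal.GA_Update)
    for i, text in enumerate(descriptions, start=1):
        ds.GetRasterBand(i).SetDescription(text)
    ds = None  # closing the dataset flushes the metadata change

    # Reopen read-only to check what the driver actually kept.
    ds = gdal.Open(path)
    return [ds.GetRasterBand(i).GetDescription()
            for i in range(1, ds.RasterCount + 1)]
```

With GTiff, this is the situation described above where the TIFF directory is 
rewritten at the end of the file, so the file may grow slightly.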


-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
