[gdal-dev] Upcoming Cloud Optimized Geotiff (COG) related enhancements

Vincent Sarago vincent.sarago at gmail.com
Wed May 29 06:56:21 PDT 2019


Thanks for the hard work, Even.

> On 29 May 2019 at 09:48, Even Rouault <even.rouault at spatialys.com> wrote:
> 
> Hi,
> 
> I've submitted a PR at https://github.com/OSGeo/gdal/pull/1600 which 
> implements the low-level work of points 4) and 5) below. To get all benefits, 
> GDAL must be built against internal libtiff, or against libtiff master 
> after https://gitlab.com/libtiff/libtiff/merge_requests/81 and 
> https://gitlab.com/libtiff/libtiff/merge_requests/82 have been merged. 
> Starting with libtiff 4.0.11, in which these changes will appear, building 
> against internal libtiff will no longer give any specific behaviour 
> (currently it is required to avoid loading the whole tile indexes).
> 
> As I anticipated, reading a single tile from a COG, generated and read with 
> this PR, now requires just 3 GET range requests: one to get the header and 
> IFDs (without the tile index arrays), one to get the offset of the tile and 
> its successor, and one to get the tile data. This holds for images with or 
> without a transparency mask.
> 
> To describe the specific layout of those COG files, I've decided to include a 
> description of the features used at the beginning of the file, so that 
> optimized readers (like GDAL) can use them and take shortcuts. I've decided to 
> include them as ASCII strings "hidden" just after the first 8 bytes of a 
> ClassicTIFF (or after the first 16 bytes of a BigTIFF). That is, the first IFD 
> starts just after those strings. It is completely valid to have 'ghost' 
> areas like this in a TIFF file, and readers will normally skip over them. So 
> for a COG file with a transparency mask, those strings will be:
> GDAL_STRUCTURAL_METADATA_SIZE=000177 bytes\n
> LAYOUT=IFDS_BEFORE_DATA\n
> STRILE_ORDER=ROW_MAJOR\n
> STRILE_LEADER=SIZE_AS_UINT4\n
> STRILE_TRAILER=LAST_4_BYTES_REPEATED\n
> KNOWN_INCOMPATIBLE_EDITION=NO\n
> MASK_INTERLEAVED_WITH_IMAGERY=YES\n
> 
> For a COG without mask, the last item will not be present of course.
> 
> So it starts with GDAL_STRUCTURAL_METADATA_SIZE=XXXXXX bytes\n where XXXXXX 
> describes the size of this whole section (starting at the beginning of 
> GDAL_STRUCTURAL_METADATA_SIZE).
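
To make the KEY=VALUE structure above concrete, here is a minimal Python sketch of how a reader might turn those strings into a lookup table. The function names are hypothetical (they are not part of GDAL or libtiff), and the sketch assumes the ASCII block has already been isolated from the file; how the declared size is used to delimit the block is deliberately left out.

```python
def parse_structural_metadata(text: str) -> dict:
    """Parse the KEY=VALUE lines of the 'ghost' description into a dict."""
    items = {}
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip the empty piece after the final "\n"
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        items[key] = value
    return items


def declared_size(items: dict) -> int:
    """Return the integer in 'GDAL_STRUCTURAL_METADATA_SIZE=XXXXXX bytes'."""
    return int(items["GDAL_STRUCTURAL_METADATA_SIZE"].split()[0])


# The strings quoted in the message, for a COG with a transparency mask.
sample = (
    "GDAL_STRUCTURAL_METADATA_SIZE=000177 bytes\n"
    "LAYOUT=IFDS_BEFORE_DATA\n"
    "STRILE_ORDER=ROW_MAJOR\n"
    "STRILE_LEADER=SIZE_AS_UINT4\n"
    "STRILE_TRAILER=LAST_4_BYTES_REPEATED\n"
    "KNOWN_INCOMPATIBLE_EDITION=NO\n"
    "MASK_INTERLEAVED_WITH_IMAGERY=YES\n"
)
features = parse_structural_metadata(sample)
has_mask = features.get("MASK_INTERLEAVED_WITH_IMAGERY") == "YES"
```

A reader would then only take the shortcuts for the items it recognises, and ignore the rest.
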
> 
> - LAYOUT=IFDS_BEFORE_DATA: the IFDs are located at the beginning of the file. 
> GDAL with this PR will also make sure that the tile index arrays are written 
> just after the IFDs and before the imagery, so that a first range request of 
> 16 KB will always get all the IFDs.
> 
> - STRILE_ORDER=ROW_MAJOR: (strile is a contraction of 'strip or tile') the 
> data for tiles is written in increasing tile id order. Future enhancements 
> could possibly implement other layouts, like Z_ORDER or HILBERT_CURVE
> 
> - STRILE_LEADER=SIZE_AS_UINT4: each tile data is preceded by 4 bytes, in a 
> 'ghost' area as well, indicating the real tile size (in little endian order). 
> TileOffset[i] points to the real tile data, that is, just after those 4 bytes. 
> An optimized reader seeing this metadata item will thus look for TileOffset[i] 
> and TileOffset[i+1] to deduce it must fetch the data starting at 
> offset=TileOffset[i] - 4 and of size=TileOffset[i+1]-TileOffset[i]+4. It then 
> checks the first 4 bytes to see if the size in this leader marker is 
> consistent with TileOffset[i+1]-TileOffset[i]. When there is no mask, they 
> should normally be equal (modulo the size taken by STRILE_LEADER and 
> STRILE_TRAILER). In the case where there is a mask and 
> MASK_INTERLEAVED_WITH_IMAGERY=YES, then the tile size indicated in the leader 
> will be < TileOffset[i+1]-TileOffset[i] since the data for the mask will 
> follow the imagery data (see MASK_INTERLEAVED_WITH_IMAGERY=YES)
> 
> - STRILE_TRAILER=LAST_4_BYTES_REPEATED: just after the tile data, the last 4 
> bytes of the tile data are repeated. This is a way for optimized readers to 
> check whether TIFF writers not aware of those optimizations have modified the 
> TIFF file in a way that breaks the optimizations. If an optimized reader 
> detects an inconsistency, it then falls back to the regular/slow method of 
> using TileOffset[i] + TileByteCount[i]. I hesitated about using something 
> like a CRC32, but checking the last 4 bytes is probably sufficient with 
> compression schemes to detect if STRILE_LEADER is valid.
> 
> - KNOWN_INCOMPATIBLE_EDITION=NO: when a COG is generated this is always 
> written. If GDAL is then used to modify the COG file, as most of the changes 
> done on an existing COG file will break the optimized structure, GDAL will 
> change this metadata item to KNOWN_INCOMPATIBLE_EDITION=YES, and issue a 
> warning on writing, and when reopening such a file, so that users know they 
> have 'broken' their COG file.
> 
> - MASK_INTERLEAVED_WITH_IMAGERY=YES: signals that mask data immediately 
> follows imagery data. So when reading data at offset=TileOffset[i] - 4 and 
> size=TileOffset[i+1]-TileOffset[i]+4, you'll get a buffer with:
>   * leader with imagery tile size (4 bytes)
>   * imagery data (starting at TileOffset[i] and of size TileByteCount[i])
>   * trailer of imagery (4 bytes)
>   * leader with mask tile size (4 bytes)
>   * mask data (starting at mask.TileOffset[i] and of size 
>     mask.TileByteCount[i], but none of them actually need to be read)
>   * trailer of mask data (4 bytes)
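
The leader/trailer items above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how GDAL implements it: the function names are hypothetical, every block is assumed to hold at least 4 bytes of data, and the last tile of the file (which has no TileOffset[i+1]) is not handled. Note that the requested range, as defined above, ends with the 4-byte leader of the next tile, which the sketch simply ignores.

```python
import struct

LEADER_SIZE = 4   # STRILE_LEADER=SIZE_AS_UINT4
TRAILER_SIZE = 4  # STRILE_TRAILER=LAST_4_BYTES_REPEATED


def fetch_range(tile_offsets, i):
    """Return (offset, size) of the range request for tile i."""
    start = tile_offsets[i] - LEADER_SIZE
    size = tile_offsets[i + 1] - tile_offsets[i] + LEADER_SIZE
    return start, size


def _read_block(buf, pos):
    """Read leader + data + trailer at pos; return (data, end position)."""
    (size,) = struct.unpack_from("<I", buf, pos)  # little-endian uint32
    data_start = pos + LEADER_SIZE
    data_end = data_start + size
    if size < TRAILER_SIZE or data_end + TRAILER_SIZE > len(buf):
        raise ValueError("leader size inconsistent with buffer")
    data = buf[data_start:data_end]
    trailer = buf[data_end:data_end + TRAILER_SIZE]
    if trailer != data[-TRAILER_SIZE:]:
        raise ValueError("trailer does not repeat the last 4 bytes")
    return data, data_end + TRAILER_SIZE


def split_interleaved(buf):
    """Split a fetched buffer into (imagery, mask) compressed payloads.

    Assumes MASK_INTERLEAVED_WITH_IMAGERY=YES. A ValueError means the
    caller should fall back to TileOffset[i] + TileByteCount[i].
    """
    imagery, pos = _read_block(buf, 0)
    mask, _ = _read_block(buf, pos)
    return imagery, mask
```

A caller would issue one range request with the result of fetch_range(), pass the bytes to split_interleaved(), and hand each payload to the decompressor.
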
> 
> One point I hesitated about was how to write those structural metadata. 
> Another possibility would have been to include them in the GDAL_METADATA TIFF 
> tag, which contains user-visible metadata serialized as XML. But I preferred 
> those structural details to remain mostly hidden. These are really low-level 
> details that do not describe the image content. A possibility based on 
> GDAL_METADATA would have been to use a dedicated metadata domain to hold them, 
> but a utility that would for example copy TIFF to TIFF could propagate the 
> content of this tag while not including the needed leader & trailer, which 
> would be unknown to it.
> One could also have introduced one (or several) new TIFF tag(s) to describe 
> them, but that would have required registering them and would have caused 
> warnings to be emitted when reading them with older GDAL versions. The only 
> drawback I can think of with the solution I implemented is that those 
> structural metadata cannot be discovered using only the libtiff API. Let me 
> know if you believe that this could cause issues and what other solution 
> would be preferred.
> 
> One other thought I had is that we could potentially avoid reading 
> TileOffset[i] and TileOffset[i+1] altogether. For uncompressed data, this 
> would be trivial since only TileOffset[0] is needed to deduce the location of 
> other tiles when STRILE_ORDER=ROW_MAJOR. But uncompressed data is probably not 
> a very common use case for COG. For compressed data, we could imagine taking 
> the maximum compressed size of a tile, which would be written as a new 
> metadata item, and adding padding to smaller tiles to get to that size. That 
> would require doing a first pass to compute that maximum tile size, and then 
> the real one to write the file, so basically a x2 slowdown on generation. For 
> JPEG compression, one could imagine avoiding most of that initial pass by 
> examining just a few random samples in the source raster to compute the mean 
> compressed tile size and its standard deviation, and using that to determine a 
> maximum compressed tile size. It could happen that some tiles would still be 
> larger, in which case the writer would need to remove the higher frequencies 
> of the tile data until the compressed data fits in the maximum allowed size. 
> Another drawback is potentially significant lost space in the file if there is 
> a lot of variation in the compression rate among tiles. How critical this is 
> depends on how file size is weighed against the number of requests (which 
> depends on cloud storage fees and use patterns).
> And we could also potentially save the reading of the header (IFD description) 
> of the file if the calling code could provide it to the GTiff reader (cases 
> where it would always read the same COG and so could hardcode its header).
> Anyway, those ideas are perhaps unneeded overcomplications and are not part of 
> my current scheduled tasks, but I wanted to share them in case they are of 
> interest to anyone.
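
For the uncompressed case mentioned above, the arithmetic could look like the following Python sketch. It is an inference from the layout described in this thread rather than anything the PR implements: it assumes STRILE_ORDER=ROW_MAJOR, no interleaved mask, a 4-byte leader and a 4-byte trailer around every tile, and tiles that all have the same uncompressed size. The function name is hypothetical.

```python
def uncompressed_tile_offset(first_offset, tile_bytes, i, leader=4, trailer=4):
    """Guess TileOffset[i] from TileOffset[0] for fixed-size tiles.

    Between the start of one tile's data and the start of the next one
    sit the tile data itself, its trailer, and the next tile's leader.
    """
    stride = tile_bytes + trailer + leader
    return first_offset + i * stride


# Example: 256x256 tiles, 3 bands, 1 byte per sample, pixel-interleaved.
tile_bytes = 256 * 256 * 3
third_tile = uncompressed_tile_offset(1000, tile_bytes, 2)
```
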
> 
> Even
> 
>> 
>> 4) Optimizations specific to JPEG-compressed imagery (YCbCr color space)
>> with a 1-bit transparency channel, to minimize the number of HTTP range
>> requests needed to read them.
>> As JPEG compression cannot include the transparency information, two TIFF
>> IFDs have to be created: one for YCbCr, and another one for alpha. Currently
>> the COPY_SRC_OVERVIEWS=YES creation option of the GeoTIFF driver separates
>> data for all the tiles of the color channels from data for all the tiles of
>> the transparency channel. In practice, readers will generally want to
>> access, for the same location, the data of both the color and transparency
>> channels. I will modify the writer to interleave blocks so that color and
>> transparency information are contiguous. If COLOR_X_Y designates the tile
>> with color information at coordinates X,Y (in tile coordinate space), the
>> layout of data in the file will be: COLOR_0_0, TRANSPARENCY_0_0, COLOR_1_0,
>> TRANSPARENCY_1_0, etc. The GeoTIFF driver will be improved to fetch the
>> color and transparency channels together when such a layout is detected.
>> 
>> A further improvement is to completely avoid reading the
>> TileByteCount array of the color channel, and the TileByteCount & TileOffset
>> arrays of the transparency channel. The trick is to reserve 4 bytes before
>> the start of each COLOR_X_Y tile to indicate its size (those bytes will be
>> 'ghost', that is, not in the range of data pointed to by
>> TileByteCount & TileOffset). An optimized reader wanting to read tile
>> i=Y*nb_tiles_in_width+X will start by reading the offsets of tile i and
>> i+1: TileOffset_color[i] and
>> TileOffset_color[i+1]. It will then seek to TileOffset_color[i] - 4 and read
>> 4 + TileOffset_color[i+1] - TileOffset_color[i] bytes into a buffer. The
>> first 4 bytes of this buffer will indicate the number of bytes of the color
>> tile, and thus it is possible to deduce the offset and size of the mask
>> tile that is located at the end of the buffer. A TIFF metadata item will be
>> written to indicate that such a layout has been used (with an indication of
>> the file size so as to be able to detect if the file has later been
>> altered in a non-optimized way), so that optimized readers can adopt the
>> above described behavior. This will require extending the libtiff interface
>> so that the user can directly provide the input buffer to decompress.
>> As the file will remain fully TIFF/BigTIFF compliant, non-optimized readers
>> (such as newer GDAL builds against an older external libtiff version, or
>> previous GDAL versions) will still be able to read it, loading values from
>> the 4 arrays instead of just one.
>> Note: for other compression types, a simpler version of the above
>> optimization can still be done, by using TileOffset[i] and TileOffset[i+1],
>> and saving the read of TileByteCount[i].
>> To sum up, with the improvements of this task, once the initial loading of
>> metadata has been done, a GDAL ReadBlock(x,y) request will cause only two
>> network range requests: one to read TileOffset[i] and TileOffset[i+1]
>> (potentially already cached if neighboring tiles have been previously
>> accessed in the same process), and another one to read the imagery (+mask)
>> data. Whereas currently, 6 might be needed for JPEG YCbCr+mask.
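
The first of those two range requests can be illustrated with a short Python sketch. The function names are hypothetical, and a few things are assumed rather than taken from the message: the file position of the TileOffsets array values is already known from the IFD, entries are stored as 4-byte LONG (ClassicTIFF) or 8-byte LONG8 (BigTIFF), and the byte order is the one announced in the TIFF header.

```python
import struct


def tile_index(x, y, nb_tiles_in_width):
    """Tile id in row-major order: i = Y * nb_tiles_in_width + X."""
    return y * nb_tiles_in_width + x


def offsets_entry_range(array_offset, i, entry_size=4):
    """(offset, size) of the request covering TileOffset[i] and [i+1]."""
    return array_offset + i * entry_size, 2 * entry_size


def decode_offset_pair(data, entry_size=4, little_endian=True):
    """Decode the two consecutive TileOffset entries just fetched."""
    order = "<" if little_endian else ">"
    code = "2I" if entry_size == 4 else "2Q"
    return struct.unpack(order + code, data)


i = tile_index(3, 2, 10)                 # tile at X=3, Y=2, 10 tiles per row
request = offsets_entry_range(5000, i)   # array values assumed at byte 5000
```
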
>> 
>> 5) Optimizing the layout of the header of a COG file
>> 
>> The current layout of the header part of a COG file is:
>> - TIFF / BigTIFF signature, followed by the offset of the first IFD (Image
>> File Directory)
>> - IFD of full resolution image, that is the list of the tags and their value
>> when it consists of a single numeric value, followed by the offset of the
>> next IFD. Its size is 2 + number_of_tags * 12 + 4 bytes (or
>> 8 + number_of_tags * 20 + 8 bytes for BigTIFF, whose entry count is stored
>> on 8 bytes), so typically 200 bytes maximum
>> - Values of TIFF tags that don't fit inline in the IFD directory, such as
>> TileOffsets and TileByteCounts arrays and GeoTIFF keys
>> - IFD of first overview (typically subsampled by a factor of 2)
>> - Values of its tags that don't fit inline
>> - ...
>> - IFD of last overview
>> - Values of its tags that don't fit inline
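
As a quick check of the IFD size arithmetic in that list, here is a small Python sketch. The function name is hypothetical; the sizes come from the TIFF and BigTIFF layouts: ClassicTIFF uses a 2-byte entry count, 12-byte entries and a 4-byte next-IFD offset, while BigTIFF uses an 8-byte entry count, 20-byte entries and an 8-byte next-IFD offset.

```python
def ifd_size(number_of_tags, bigtiff=False):
    """Size in bytes of one IFD, excluding values stored out of line."""
    if bigtiff:
        return 8 + number_of_tags * 20 + 8
    return 2 + number_of_tags * 12 + 4


classic = ifd_size(16)            # a typical GeoTIFF IFD with 16 tags
big = ifd_size(16, bigtiff=True)
```

With around 16 tags, a ClassicTIFF IFD stays just under the 200 bytes quoted above, which is why many IFDs fit in a single 16 KB request once the large arrays are moved out of the way.
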
>> 
>> When the COG file is not too large, the fact of having the TileOffsets and
>> TileByteCounts between IFD descriptors is not an issue since they are not
>> too large, and most TIFF readers will load their values when opening the
>> IFD. But for an optimized reader such as GDAL with internal libtiff support
>> (or with external libtiff after the optimization of task 4), loading the
>> values of the TileOffsets/TileByteCounts arrays is only needed when
>> accessing imagery.
>> 
>> A more efficient layout for network access is:
>> - TIFF / BigTIFF signature, followed by the offset of the first IFD
>> - IFD of full resolution image, followed by the value of its non-inline
>> tags, except  TileOffsets/TileByteCounts
>> - IFD of first overview followed by the value of its non-inline tags, except
>> TileOffsets/TileByteCounts
>> - ...
>> - IFD of last overview followed by the value of its non-inline tags, except
>> TileOffsets/TileByteCounts
>> - Values of the TileOffsets/TileByteCounts arrays of IFD of full resolution
>> image
>> - Values of the TileOffsets/TileByteCounts arrays of IFD of first overview
>> - ...
>> - Values of the TileOffsets/TileByteCounts arrays of IFD of last overview
>> 
>> With such a structure, the initial reading of 16 KB at the start of the file
>> will be able to load the IFD descriptors of all overviews (and masks, which
>> are actually interleaved in between when present). So, combined together
>> with task 4, a cold read of a tile at any zoom level (i.e. opening the file +
>> tile request) could result in just 3 network range requests: one to get the
>> IFD descriptors at the start of the file, one to read the location of the
>> tile from the TileOffsets array and one to read the tile data.
>> The proposed structure itself is still fully TIFF compliant. The script that
>> validates the COG structure will be adapted to accept that new variant of
>> the header structure.
> 
> -- 
> Spatialys - Geospatial professional services
> http://www.spatialys.com