[gdal-dev] Upcoming Cloud Optimized Geotiff (COG) related enhancements

Fri May 3 01:04:36 PDT 2019

Hi,

I wanted to mention COG related enhancements (*) that I will work on in GDAL 
in the coming weeks, so interested parties are aware of them and can 
potentially react.

1) Creation of a dedicated COG creation-only driver simplifying the creation 
workflow. Currently, creating a COG involves a number of steps, using gdaladdo 
and gdal_translate with the right arguments. For very large COG files, 
invoking gdaladdo efficiently can be tricky (.ovr.ovr trick:
https://github.com/OSGeo/gdal/issues/1442). The driver will take care of
creating the temporary overviews it needs.
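
For reference, the manual two-step workflow mentioned above can be sketched
as follows. This is only an illustration: the helper name, the file names,
the overview levels and the DEFLATE compression are assumptions picked for
the example, and the commands are built as argument lists, not executed.

```python
def legacy_cog_commands(src, dst, levels=(2, 4, 8, 16), compress="DEFLATE"):
    """Return the argv lists of the manual COG workflow (hypothetical helper)."""
    # Step 1: build overviews inside the source GeoTIFF.
    build_overviews = ["gdaladdo", "-r", "average", src]
    build_overviews += [str(level) for level in levels]
    # Step 2: rewrite as a tiled GeoTIFF, copying the overviews so that
    # they end up in the layout expected for a COG.
    translate = [
        "gdal_translate", src, dst,
        "-co", "TILED=YES",
        "-co", f"COMPRESS={compress}",
        "-co", "COPY_SRC_OVERVIEWS=YES",
    ]
    return [build_overviews, translate]


for argv in legacy_cog_commands("in.tif", "cog.tif"):
    print(" ".join(argv))
```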

2) The driver will offer integrated reprojection capabilities, and in 
particular a WebMercator/GoogleMapsCompatible tiling scheme profile (as 
defined in WMTS), so that TIFF tiles exactly match GoogleMapsCompatible ones. 
This will be similar to the corresponding option of GeoPackage, with one
subtlety: due to how GeoTIFF overviews work, it is not possible to have
this alignment on the tiling scheme for all zoom levels. So the user will
define how many zoom levels, starting from the full resolution image, must be
aligned (if N is the number of aligned levels, up to 2^N padding tiles in
the horizontal and vertical dimensions are needed for the full resolution
image, so N should be kept reasonably small).
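
To give a feel for where the 2^N bound comes from, here is a minimal sketch
for one dimension. It assumes that N counts the full resolution level itself,
that each overview halves the resolution, and that alignment means snapping
the tile range to the tile grid of the coarsest aligned level. The function
names are hypothetical and this is not necessarily how the driver will
compute it.

```python
def padded_tile_range(first_tile, last_tile, aligned_levels):
    """Snap an inclusive full-resolution tile range to the coarsest aligned grid."""
    if aligned_levels < 1:
        raise ValueError("aligned_levels must be >= 1")
    # One tile of the coarsest aligned level spans this many full-res tiles.
    step = 2 ** (aligned_levels - 1)
    start = (first_tile // step) * step            # round down
    end_exclusive = -(-(last_tile + 1) // step) * step  # round up
    return start, end_exclusive - 1


def padding_tiles(first_tile, last_tile, aligned_levels):
    """Number of extra tiles added in this dimension (both sides together)."""
    start, end = padded_tile_range(first_tile, last_tile, aligned_levels)
    return (first_tile - start) + (end - last_tile)


# Under these assumptions the padding is at most 2 * (2^(N-1) - 1) < 2^N.
print(padded_tile_range(5, 10, 3), padding_tiles(7, 8, 3))
```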

3) gdalwarp will be enhanced to allow output to drivers that have only 
CreateCopy() capabilities such as the COG driver. It will try to avoid 
materializing the intermediate file when possible by using VRT capabilities, 
otherwise it will have to create a temporary TIFF file before calling
CreateCopy().

4) Optimizations specific to JPEG-compressed imagery (YCbCr color space) with 
a 1-bit transparency channel, to minimize the number of HTTP range requests 
needed to read them.
As JPEG compression cannot include the transparency information, two TIFF
IFDs have to be created: one for YCbCr, and another one for alpha. Currently
the COPY_SRC_OVERVIEWS=YES creation option of the GeoTIFF driver separates
data for all the tiles of the color channels from data for all the tiles of
the transparency channel. In practice, readers will generally want to access
the data of both the color and transparency channels for a given location. I
will modify the writer to interleave blocks so that color and transparency
information is contiguous. If COLOR_X_Y designates the tile with color
information at coordinates X,Y (in tile coordinate space), the layout of data
in the file will be: COLOR_0_0, TRANSPARENCY_0_0, COLOR_1_0, TRANSPARENCY_1_0,
etc. The GeoTIFF driver will be improved to fetch the color and transparency
channels together when such a layout is detected.
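
The interleaved ordering described above can be sketched as a small
generator. The row-major traversal (X varying fastest) is an assumption,
chosen to match the COLOR_0_0, COLOR_1_0 sequence given above; the function
name is hypothetical.

```python
def interleaved_tile_order(tiles_across, tiles_down):
    """Yield tile names in the order they would be written to the file."""
    for y in range(tiles_down):
        for x in range(tiles_across):
            # Each color tile is immediately followed by its mask tile.
            yield f"COLOR_{x}_{y}"
            yield f"TRANSPARENCY_{x}_{y}"


print(", ".join(interleaved_tile_order(2, 2)))
```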

A further improvement is to avoid entirely reading the TileByteCount array of
the color channel, and the TileByteCount & TileOffset arrays of the
transparency channel. The trick is to reserve 4 bytes before the start of each
COLOR_X_Y tile to indicate its size (those bytes will be 'ghost' bytes, that
is, not in the range of data pointed to by TileByteCount & TileOffset).
An optimized reader wanting to read tile i=Y*nb_tiles_in_width+X will start by
reading the offsets of tiles i and i+1: TileOffset_color[i] and
TileOffset_color[i+1]. It will then seek to TileOffset_color[i] - 4 and read
4 + TileOffset_color[i+1] - TileOffset_color[i] bytes into a buffer. The first
4 bytes of this buffer will indicate the number of bytes of the color tile,
and thus it is possible to deduce the offset and size of the mask tile that is
located at the end of the buffer. A TIFF metadata item will be written to
indicate that such a layout has been used (with an indication of the file
size, so as to be able to detect if the file has later been altered in a
non-optimized way), so that optimized readers can adopt the behavior described
above. This will require extending the libtiff interface so that the user
can directly provide the input buffer to decompress.
As the file will remain fully TIFF/BIGTIFF compliant, non-optimized readers
(such as newer GDAL builds against an older external libtiff version, or
previous GDAL versions) will still be able to read it, loading values from
the 4 arrays instead of just one.
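
The size-prefix trick can be illustrated with an in-memory toy file. This is
a sketch under stated assumptions, not the actual on-disk format: the 4 ghost
bytes are taken to be a little-endian unsigned integer, tiles are contiguous,
the 16-byte header is a stand-in, and the handling of the last tile is made
up for the example. Under this reading, the span described above also covers
the 4 ghost bytes of the next color tile, so the sketch strips them.

```python
import struct


def write_interleaved(tiles):
    """tiles: list of (color_bytes, mask_bytes). Returns (data, color_offsets)."""
    out = bytearray(b"\0" * 16)  # stand-in for the TIFF header and IFDs
    color_offsets = []
    for color, mask in tiles:
        out += struct.pack("<I", len(color))  # 4 ghost bytes: color tile size
        color_offsets.append(len(out))        # TileOffset points past them
        out += color
        out += mask
    return bytes(out), color_offsets


def read_color_and_mask(data, color_offsets, i):
    """Fetch color and mask of tile i with one contiguous read."""
    start = color_offsets[i] - 4
    if i + 1 < len(color_offsets):
        length = 4 + color_offsets[i + 1] - color_offsets[i]
        trailing = 4  # the span ends with the next tile's ghost bytes
    else:
        length = len(data) - start  # assumption: last tile runs to end of data
        trailing = 0
    buf = data[start:start + length]  # would be a single range request
    (color_size,) = struct.unpack_from("<I", buf, 0)
    color = buf[4:4 + color_size]
    mask = buf[4 + color_size:len(buf) - trailing]
    return color, mask


data, offsets = write_interleaved([(b"AAAA", b"m1"), (b"BB", b"mm2")])
print(read_color_and_mask(data, offsets, 0))
```

A non-optimized reader would ignore the ghost bytes and use the regular
TileOffset and TileByteCount arrays of both IFDs.
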
Note: for other compression types, a simpler version of the above
optimization can still be done, by using TileOffset[i] and TileOffset[i+1],
and saving the read of TileByteCount[i].
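
A minimal sketch of that simpler variant: tile sizes are derived from
consecutive offsets only. It assumes tiles are stored contiguously in
increasing index order, and that the end of the tile data is known by some
other means (the end_of_tile_data parameter is hypothetical).

```python
def tile_byte_ranges(tile_offsets, end_of_tile_data):
    """Return (offset, size) per tile without consulting TileByteCount."""
    bounds = list(tile_offsets) + [end_of_tile_data]
    # The size of tile i is the distance to the start of tile i+1.
    return [(bounds[i], bounds[i + 1] - bounds[i])
            for i in range(len(tile_offsets))]


print(tile_byte_ranges([100, 350, 500], 620))
```
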
To sum up, with the improvements of this task, once the initial loading of
metadata has been done, a GDAL ReadBlock(x,y) request will cause only two
network range requests: one to read TileOffset[i] and TileOffset[i+1]
(potentially already cached if neighboring tiles have been previously accessed
in the same process), and another one to read the imagery (+mask) data.
Currently, by contrast, 6 might be needed for JPEG YCbCr+mask.

5) Optimizing the layout of the header of a COG file

The current layout of the header part of COG file is:
- TIFF / BigTIFF signature, followed by the offset of the first IFD (Image 
File Directory)
- IFD of full resolution image, that is the list of the tags and their value
when it consists of a single numeric value, followed by the offset of the next
IFD. Its size is 2 + number_of_tags * 12 + 4 bytes (or 8 + number_of_tags * 20
+ 8 bytes for BigTIFF), so typically 200 bytes maximum
- Values of TIFF tags that don't fit inline in the IFD directory, such as 
TileOffsets and TileByteCounts arrays and GeoTIFF keys 
- IFD of first overview (typically subsampled by a factor of 2)
- Values of its tags that don't fit inline 
- ...
- IFD of last overview
- Values of its tags that don't fit inline 
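
As an illustration of the first two items of that list, the sketch below
parses the signature and first IFD offset, and computes the size of an IFD
from its number of tags. It follows the classic TIFF layout (2-byte entry
count, 12-byte entries, 4-byte next offset) and the BigTIFF one (8-byte entry
count, 20-byte entries, 8-byte next offset). The function names are
hypothetical.

```python
import struct


def parse_tiff_header(buf):
    """Return (is_bigtiff, first_ifd_offset) from the first bytes of a file."""
    order = {b"II": "<", b"MM": ">"}.get(bytes(buf[:2]))
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack_from(order + "H", buf, 2)
    if magic == 42:  # classic TIFF: 4-byte offset of the first IFD
        (first_ifd,) = struct.unpack_from(order + "I", buf, 4)
        return False, first_ifd
    if magic == 43:  # BigTIFF: offset size (8), zero, 8-byte offset
        offset_size, zero, first_ifd = struct.unpack_from(order + "HHQ", buf, 4)
        if offset_size != 8 or zero != 0:
            raise ValueError("unexpected BigTIFF header")
        return True, first_ifd
    raise ValueError("unknown TIFF magic number")


def ifd_size(number_of_tags, bigtiff=False):
    """Size in bytes of an IFD: entry count + entries + next IFD offset."""
    if bigtiff:
        return 8 + number_of_tags * 20 + 8
    return 2 + number_of_tags * 12 + 4


print(parse_tiff_header(b"II" + struct.pack("<HI", 42, 8)), ifd_size(16))
```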

When the COG file is not too large, having the TileOffsets and TileByteCounts
arrays between IFD descriptors is not an issue, since those arrays are small
and most TIFF readers will load their values when opening the IFD. But
for an optimized reader such as GDAL with internal libtiff support (or with
external libtiff after the optimization of task 4), loading the values of the
TileOffsets/TileByteCounts arrays is only needed when accessing imagery.

A more efficient layout for network access is:
- TIFF / BigTIFF signature, followed by the offset of the first IFD
- IFD of full resolution image, followed by the value of its non-inline tags,
except TileOffsets/TileByteCounts
- IFD of first overview followed by the value of its non-inline tags, except
TileOffsets/TileByteCounts
- ...
- IFD of last overview followed by the value of its non-inline tags, except
TileOffsets/TileByteCounts
- Values of the TileOffsets/TileByteCounts arrays of IFD of full resolution 
image
- Values of the TileOffsets/TileByteCounts arrays of IFD of first overview
- ...
- Values of the TileOffsets/TileByteCounts arrays of IFD of last overview

With such a structure, the initial reading of 16 KB at the start of the file 
will be able to load the IFD descriptors of all overviews (and masks, which 
are actually interleaved in between when present). So, combined with
task 4, a cold read of a tile at any zoom level (i.e. opening the file + tile
request) could result in just 3 network range requests: one to get the IFD
descriptors at the start of the file, one to read the location of the tile 
from the TileOffsets array and one to read the tile data.
The proposed structure itself is still fully TIFF compliant. The script that 
validates the COG structure will be adapted to accept that new variant of the 
header structure.
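
As a rough check of the 16 KB figure, the leading part of the proposed layout
can be estimated as below. This is a back-of-the-envelope sketch: the tag
counts and sizes of non-inline values in the example are made-up numbers,
alignment padding is ignored, and the function name is hypothetical.

```python
def leading_header_size(tag_counts, non_inline_bytes, bigtiff=False):
    """Bytes taken by the signature plus all IFDs and their non-inline
    values, excluding the TileOffsets/TileByteCounts arrays."""
    size = 16 if bigtiff else 8  # signature + offset of the first IFD
    for n_tags, extra in zip(tag_counts, non_inline_bytes):
        if bigtiff:
            size += 8 + n_tags * 20 + 8
        else:
            size += 2 + n_tags * 12 + 4
        size += extra  # e.g. GeoTIFF keys, but not the tile arrays
    return size


# Hypothetical file: full resolution IFD plus 7 overviews, 16 tags each.
size = leading_header_size([16] * 8, [500] + [100] * 7)
print(size, size <= 16 * 1024)
```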

Even

(*) Funding by Land Information New Zealand / https://www.linz.govt.nz/

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com