[gdal-dev] Upcoming Cloud Optimized Geotiff (COG) related enhancements

Even Rouault even.rouault at spatialys.com
Mon Jun 3 04:21:06 PDT 2019


Hi Thomas,

> As a very general remark, this proposal seems to focus on minimizing the
> bandwidth from the cog storage to the consumer, whereas I'd argue that
> there are a broad range of usages where the consumer is in the same cloud
> region as the storage and in that case the transferred bandwidth becomes
> much less of an issue compared to the number of GET requests sent to the
> underlying file. That said your proposal does not impede on this remark, I
> just wanted to point out that in that case I believe a more efficient setup
> would be to use a larger curl blocksize to include all strile
> offsets/lengths in a single request.

Minimizing the number of GET requests is certainly one of the main objectives
of this work. Regarding getting all strile offsets/lengths in a single
request: due to the /vsicurl/ caching, you'll get consecutive strile offsets
in one request. But as the offset and length arrays are stored separately, for
very large COGs you would otherwise need 2 requests (one per array), hence the
optimizations to avoid reading the length array.
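
To make the request count concrete, here is a rough sketch of the reasoning,
not GDAL code. It assumes a 16384-byte /vsicurl/ chunk, 8-byte (BigTIFF)
array entries, and made-up file positions for the two arrays; all function
and variable names are hypothetical.

```python
# Illustrative sketch only: chunk size and array positions are assumptions.
CHUNK_SIZE = 16384  # assumed /vsicurl/ chunk size, in bytes


def chunks_for_range(start, length, chunk_size=CHUNK_SIZE):
    """Return the indices of the fixed-size chunks covering [start, start+length)."""
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return set(range(first, last + 1))


def chunks_for_strile(index, offsets_pos, counts_pos, entry_size=8):
    """Chunks needed to read one strile's offset entry and byte-count entry.

    offsets_pos / counts_pos are hypothetical file positions of the
    TileOffsets and TileByteCounts arrays; entry_size=8 models BigTIFF.
    """
    offset_chunks = chunks_for_range(offsets_pos + index * entry_size, entry_size)
    count_chunks = chunks_for_range(counts_pos + index * entry_size, entry_size)
    return offset_chunks, count_chunks


# Hypothetical COG with 1,000,000 tiles: each array is 8 MB, and the
# byte-count array is stored right after the offset array.
n_tiles = 1_000_000
offsets_pos = 1024
counts_pos = offsets_pos + n_tiles * 8

offset_chunks, count_chunks = chunks_for_strile(500_000, offsets_pos, counts_pos)

# Reading both arrays touches two distinct chunks, i.e. two range requests.
requests_with_counts = len(offset_chunks | count_chunks)
# If the length can be obtained without the byte-count array, one suffices.
requests_offsets_only = len(offset_chunks)
```

Under these assumptions one chunk holds 2048 consecutive offset entries, so
neighbouring striles come for free, but the matching byte counts sit about
8 MB further on and need a second request.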

> Could this one be renamed to COG_VERSION or COG_FLAVOR, which would allow
> you to have the spec for this metadata evolve over time (e.g. STRILE_ORDER
> could be left out for now as it only has a single valid value) and still be
> set to COG_VERSION=INCOMPATIBLE if needed. COG_VERSION should probably
> become the first member of the metadata string.

I considered that, but I preferred to have orthogonal optimizations. Regarding
BLOCK_ORDER, if in the future we allow other values, I can imagine there could
still be cases where you could prefer BLOCK_ORDER=ROW_MAJOR, so defining a
version number doesn't seem obvious to me.

> I see wasted storage space as important :)

Sure

> Another optimization going down a similar road would be to store the
> uint8/uint (depending on bigtiff or not) offset of the first strile in the
> IFD description, and then just having to read the short/uint TileByteCounts
> knowing that each strile is stored consecutively to its predecessor.

But that implies that you need to load the bytecounts of all predecessors. If 
you want fast random access to a huge COG, that wouldn't work.
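
A small sketch of why: with only the first offset stored, the position of
strile i is a prefix sum over the byte counts of striles 0..i-1. The values
and names below are hypothetical, for illustration only.

```python
from itertools import accumulate


def offset_from_bytecounts(first_offset, byte_counts, index):
    """Offset of strile `index` when striles are stored back to back.

    Requires the byte counts of every predecessor, so the amount of
    metadata to load grows linearly with `index`.
    """
    return first_offset + sum(byte_counts[:index])


# Hypothetical values for illustration.
first_offset = 4096
byte_counts = [1000, 1200, 800, 1500]

# Reference: an explicit offsets array, as in a regular TIFF, where
# locating strile i is a single lookup at position i.
explicit_offsets = [
    first_offset + s for s in accumulate([0] + byte_counts[:-1])
]

derived = [
    offset_from_bytecounts(first_offset, byte_counts, i)
    for i in range(len(byte_counts))
]
```

Both approaches give the same offsets, but the derived one has to read
i byte-count entries to reach strile i, whereas the explicit array needs
only one entry.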

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

