[gdal-dev] Upcoming Cloud Optimized Geotiff (COG) related enhancements
even.rouault at spatialys.com
Mon Jun 3 04:21:06 PDT 2019
> As a very general remark, this proposal seems to focus on minimizing the
> bandwidth from the cog storage to the consumer, whereas I'd argue that
> there are a broad range of usages where the consumer is in the same cloud
> region as the storage and in that case the transferred bandwidth becomes
> much less of an issue compared to the number of GET requests sent to the
> underlying file. That said your proposal does not impede on this remark, I
> just wanted to point out that in that case I believe a more efficient setup
> would be to use a larger curl blocksize to include all strile
> offsets/lengths in a single request.
Minimizing the number of GET requests is certainly one of the main objectives
of this work. Regarding getting all strile offsets/lengths in a single
request, due to the /vsicurl/ caching, you'll get consecutive strile offsets
in one request. But because the offset and length arrays are stored separately,
a very large COG would have needed 2 requests, hence the optimizations to avoid
reading the length array.
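
To make the arithmetic concrete, here is a rough sketch (not GDAL code) of which fixed-size chunks a reader would have to fetch to get both the offset and the byte count of one tile. The 16384-byte chunk size, the 8-byte entries, the array positions and the helper names are all assumptions for illustration; the real layout depends on how the file was written.

```python
# Illustrative only: models a reader that fetches a file in fixed-size chunks.
# CHUNK_SIZE, the entry sizes and the array positions are assumed values.
CHUNK_SIZE = 16384  # assumed chunk size in bytes


def chunk_of(array_start, entry_size, index, chunk_size=CHUNK_SIZE):
    """Return the number of the chunk that holds entry `index` of an array."""
    return (array_start + index * entry_size) // chunk_size


def chunks_for_tile(offsets_start, counts_start, tile_index,
                    entry_size=8, chunk_size=CHUNK_SIZE):
    """Return the set of chunks needed to read one tile's offset and byte count."""
    return {
        chunk_of(offsets_start, entry_size, tile_index, chunk_size),
        chunk_of(counts_start, entry_size, tile_index, chunk_size),
    }


# Small file: 100 tiles, both arrays fit inside the first chunk.
small = chunks_for_tile(offsets_start=1000, counts_start=1000 + 100 * 8,
                        tile_index=50)

# Very large file: 1,000,000 tiles, the two arrays are 8 MB apart.
large = chunks_for_tile(offsets_start=1000, counts_start=1000 + 1_000_000 * 8,
                        tile_index=500_000)

print(len(small), len(large))
```

With these assumed numbers the small file needs one chunk and the large one needs two, which is the second request that skipping the length array avoids.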
> Could this one be renamed to COG_VERSION or COG_FLAVOR, which would allow
> you to have the spec for this metadata evolve over time (e.g. STRILE_ORDER
> could be left out for now as it only has a single valid value) and still be
> set to COG_VERSION=INCOMPATIBLE if needed. COG_VERSION should probably
> become the first member of the metadata string.
I considered that, but I preferred to have orthogonal optimizations. Regarding
BLOCK_ORDER, if in the future we allow other values, I can imagine there could
still be cases where you could prefer BLOCK_ORDER=ROW_MAJOR, so defining a
version number doesn't seem obvious to me.
> I see wasted storage space as important :)
> Another optimization going down a similar road would be to store the
> uint8/uint (depending on bigtiff or not) offset of the first strile in the
> IFD description, and then just having to read the short/uint TileByteCounts
> knowing that each strile is stored consecutively to its predecessor.
But that implies that you need to load the bytecounts of all predecessors. If
you want fast random access to a huge COG, that wouldn't work.
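
A small sketch of the difference, using made-up byte counts and hypothetical helper names: with only the first offset stored, locating tile N means summing the byte counts of every tile before it, whereas an explicit offset array is a single lookup.

```python
# Illustrative only: the byte counts and the first offset are made-up values.

def offset_from_bytecounts(first_offset, byte_counts, index):
    """Locate a tile when only the first offset is stored.

    Requires the byte counts of ALL predecessors, so the cost grows with index.
    """
    return first_offset + sum(byte_counts[:index])


def offset_from_table(offsets, index):
    """Locate a tile from an explicit offset array: one entry is read."""
    return offsets[index]


byte_counts = [100, 250, 80, 300]
first_offset = 1000
offsets = [1000, 1100, 1350, 1430]  # tiles stored back to back

# Both approaches agree, but the first one had to read 3 byte counts
# to find tile 3, and would need N of them to find tile N.
assert offset_from_bytecounts(first_offset, byte_counts, 3) == offset_from_table(offsets, 3)
```

For a remote file, "summing all predecessors" translates into fetching most of the byte count array before the first tile read, which is what defeats fast random access.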
Spatialys - Geospatial professional services