From vincent.sarago at gmail.com  Mon Apr 16 12:14:04 2018
From: vincent.sarago at gmail.com (Vincent Sarago)
Date: Mon, 16 Apr 2018 15:14:04 -0400
Subject: [Landsat-pds] COG Format discussion
Message-ID: <A480395F-AB8B-4924-9449-30E4BE2EC9EE@gmail.com>

Sorry for hijacking this mailing list, but as the Landsat dataset is one of the biggest open COG datasets, discussing the evolution of the format here made sense to us.

A couple of weeks ago we started an internal discussion about the COGEO format. It’s great to see how many people are using and implementing COG right now (e.g. Planet, DigitalGlobe…); that said, we think there is still room for improvement.

Here are some of our ideas:
- Add WebP compression to libtiff. Even if WebP wasn’t a big success in terms of adoption, the format is still a good option in comparison to JPEG and PNG.
- Improve mask storage inside COG. For some technical reason, when you create a COG with internal masking, the mask is appended to the end of the IFD. One improvement could be to append the mask tile data just after the imagery tile data.

We’d love to hear stories or comments from people about the format and how they see it moving in the future.


From warmerdam at pobox.com  Mon Apr 16 14:59:51 2018
From: warmerdam at pobox.com (Frank Warmerdam)
Date: Mon, 16 Apr 2018 14:59:51 -0700
Subject: [Landsat-pds] COG Format discussion
In-Reply-To: <A480395F-AB8B-4924-9449-30E4BE2EC9EE@gmail.com>
References: <A480395F-AB8B-4924-9449-30E4BE2EC9EE@gmail.com>
Message-ID: <CA+YzLBcJ80cHqAOuvM0PsrtVdY=Wrn=mbyQDxOYnmNHLe6160A@mail.gmail.com>

On Mon, Apr 16, 2018 at 12:14 PM, Vincent Sarago <vincent.sarago at gmail.com>
wrote:

> Sorry for hijacking this mailing list, but as the Landsat dataset is one of
> the biggest open COG datasets, discussing the evolution of the format here
> made sense to us.
>
> A couple of weeks ago we started an internal discussion about the COGEO
> format. It’s great to see how many people are using and implementing COG
> right now (e.g. Planet, DigitalGlobe…); that said, we think there is still
> room for improvement.
>
> Here are some of our ideas:
> - Add WebP compression to libtiff. Even if WebP wasn’t a big success in
> terms of adoption, the format is still a good option in comparison to JPEG
> and PNG.
>

Vincent, and other folks,

I'd be interested in whether webp is believed to have good enough
performance that it would be a substantial improvement over the Deflate
support already available in libtiff.  I'm also interested in how it
compares to the new zstd support (
https://github.com/OSGeo/gdal/commit/1c60366a193e67ee90856e1008e3c17cb8524f60#diff-ce45424050585add924746240ffc2761).
I'm willing to support new codecs in libtiff if they add significant value,
but I don't want to just add every compression format known.


> - Improve mask storage inside COG. For some technical reason, when you
> create a COG with internal masking, the mask is appended to the end of the
> IFD. One improvement could be to append the mask tile data just after the
> imagery tile data.
>

That is interesting.  Even, can you comment on what it would take to ensure
extra mask IFDs are located near their corresponding imagery as part of
GDAL and the implications for COG?

I assume you are suggesting a file with the "nodata" handled as a distinct
IFD like this (our internal serving format):

http://download.osgeo.org/gdal/data/gtiff/20160929_023611_0e0f_Browse.tif

...
TIFF Directory at offset 0x2070 (8304)
  Subfile Type: reduced-resolution image (1 = 0x1)
  Image Width: 857 Image Length: 438
  Tile Width: 128 Tile Length: 128
  Bits/Sample: 8
  Sample Format: unsigned integer
  Compression Scheme: JPEG
  Photometric Interpretation: YCbCr
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  Reference Black/White:
     0:     0   255
     1:   128   255
     2:   128   255
  JPEG Tables: (142 bytes)
...
TIFF Directory at offset 0x2a0e (10766)
  Subfile Type: reduced-resolution image/transparency mask (5 = 0x5)
  Image Width: 857 Image Length: 438
  Tile Width: 128 Tile Length: 128
  Bits/Sample: 1
  Sample Format: unsigned integer
  Compression Scheme: AdobeDeflate
  Photometric Interpretation: transparency mask
  Samples/Pixel: 1
  Planar Configuration: single image plane
  Predictor: none 1 (0x1)
...

I consider this format very useful for some purposes (lossy compression for
the imagery, but lossless compression for the nodata masks), and I'd be
interested in methods to optimize it for COG use, even though it is pretty
rare for applications to properly support it.
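
As an illustration of the listing above, here is a minimal sketch of how the
"Subfile Type" values (1 = 0x1, 5 = 0x5) can be decoded and how the IFD chain
can be walked to spot mask directories. It assumes a classic (non-BigTIFF)
file and that NewSubfileType (tag 254) is stored as a LONG; the function names
and the tiny in-memory TIFF are made up for the example and are not part of
libtiff or GDAL.

```python
import struct

# NewSubfileType (tag 254) bit flags as defined by the TIFF 6.0 specification.
NEW_SUBFILE_TYPE = 254
FLAGS = {
    1: "reduced-resolution image",
    2: "page of multi-page document",
    4: "transparency mask",
}


def describe_subfile_type(value):
    """Turn a NewSubfileType value into a list of flag names."""
    names = [name for bit, name in FLAGS.items() if value & bit]
    return names or ["full-resolution image"]


def list_subfile_types(data):
    """Walk the IFD chain of a classic TIFF held in `data` (bytes).

    Returns [(ifd_offset, new_subfile_type), ...]. Assumes the tag, when
    present, is a LONG with count 1 so its value sits in the entry itself.
    """
    order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    result = []
    while offset:
        (count,) = struct.unpack_from(order + "H", data, offset)
        subfile = 0  # default value when the tag is absent
        for i in range(count):
            tag, _type, _n, value = struct.unpack_from(
                order + "HHII", data, offset + 2 + 12 * i
            )
            if tag == NEW_SUBFILE_TYPE:
                subfile = value
        result.append((offset, subfile))
        # The offset of the next IFD follows the 12-byte entries.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return result


def _ifd(subfile, next_offset):
    """Hypothetical helper: an 18-byte IFD holding only NewSubfileType."""
    return struct.pack("<HHHIII", 1, NEW_SUBFILE_TYPE, 4, 1, subfile, next_offset)


# Tiny synthetic file: an overview IFD at offset 8 and its mask IFD at 26.
demo = struct.pack("<2sHI", b"II", 42, 8) + _ifd(1, 26) + _ifd(5, 0)

for ifd_offset, subfile in list_subfile_types(demo):
    print(hex(ifd_offset), describe_subfile_type(subfile))
```

A value of 5 has both bit 0 (reduced resolution) and bit 2 (transparency mask)
set, which matches the second directory in the listing.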

Best regards,
Frank


>
> We’d love to hear stories or comments from people about the format and how
> they see it moving in the future.
>


-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows |
and watch the world go round - Rush    | Geospatial Software Developer

From even.rouault at spatialys.com  Tue Apr 17 00:54:40 2018
From: even.rouault at spatialys.com (Even Rouault)
Date: Tue, 17 Apr 2018 09:54:40 +0200
Subject: [Landsat-pds] COG Format discussion
In-Reply-To: <CA+YzLBcJ80cHqAOuvM0PsrtVdY=Wrn=mbyQDxOYnmNHLe6160A@mail.gmail.com>
References: <A480395F-AB8B-4924-9449-30E4BE2EC9EE@gmail.com>
 <CA+YzLBcJ80cHqAOuvM0PsrtVdY=Wrn=mbyQDxOYnmNHLe6160A@mail.gmail.com>
Message-ID: <2370754.Q33c2g4Dgx@even-i700>

> I'd be interested in whether webp is believed to have good enough
> performance that it would be a substantial improvement over the Deflate
> support already available in libtiff.  I'm also interested in how it
> compares to the new zstd support (
> https://github.com/OSGeo/gdal/commit/1c60366a193e67ee90856e1008e3c17cb8524f60#diff-ce45424050585add924746240ffc2761).
> I'm willing to support new codecs in libtiff if they add significant value,
> but I don't want to just add every compression format known.

You can only compare webp with deflate or zstd for its lossless profile.
The WebP website provides this comparison against PNG:
https://developers.google.com/speed/webp/docs/webp_lossless_alpha_study
which claims a 42% size improvement over PNG.

For lossy support, the gain is less clear if you believe Mozilla (pushing for MozJPEG)
https://research.mozilla.org/2014/07/15/mozilla-advances-jpeg-encoding-with-mozjpeg-2-0/
rather than Google
https://developers.google.com/speed/webp/docs/webp_study
"We consider this study to be inconclusive when it comes to the question of whether
WebP and/or JPEG XR outperform JPEG by any significant margin."

One potential advantage of webp is that lossy webp for imagery with lossless alpha would
be naturally supported (in comparison to the second point below).
One drawback of webp is that it is really RGB[A] only (it would likely be unwise to use
it for 3- or 4-band images with other photometric interpretations).

> 
> > - Improve mask storage inside COG. For some technical reason, when you
> > create a COG with internal masking, the mask is appended to the end of the
> > IFD. One improvement could be to append the mask tile data just after the
> > imagery tile data.

> 
> I assume you are suggesting a file with the "nodata" handled as a distinct
> IFD like this (our internal serving format):

That is what I understood from a previous discussion with Vincent's team.

> That is interesting.  Even, can you comment on what it would take to ensure
> extra mask IFDs are located near their corresponding imagery as part of
> GDAL and the implications for COG?

Such a layout of blocks would apply only for Planar Configuration==single image plane.
That's certainly doable by the GDAL GTiff driver, but would require the copying
of imagery to be done in a custom fashion, to properly interleave imagery blocks with
mask blocks, instead of using GDALDatasetCopyWholeRaster() (for imagery) and
GDALRasterBandCopyWholeRaster() (for masks). Actually, implementation-wise,
that could be a new option of GDALDatasetCopyWholeRaster() (INTERLEAVE_MASK=YES).
For very large images (dimensions > 100,000 pixels), where the TileOffsets and
TileByteCounts tags are big, this constant back and forth between IFDs
could be rather costly, as the tags are reloaded/flushed each time you change the active
IFD in libtiff.
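
To give a rough order of magnitude for those tags, here is a small
back-of-the-envelope sketch. The 256-pixel tile size and the 8 bytes per entry
(a BigTIFF upper bound, 4 bytes for classic TIFF) are assumptions for the
example, not figures taken from libtiff or from this thread.

```python
import math


def tile_index_size(width, height, tile=256, bigtiff=True):
    """Estimate the tile count and the combined size in bytes of the
    TileOffsets and TileByteCounts arrays of one IFD.

    Assumes one entry per tile (pixel-interleaved data) and a fixed entry
    width: 8 bytes for BigTIFF, 4 bytes for classic TIFF.
    """
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    entry_bytes = 8 if bigtiff else 4
    return tiles, 2 * tiles * entry_bytes


tiles, nbytes = tile_index_size(100_000, 100_000)
print(f"{tiles} tiles, about {nbytes / 1e6:.1f} MB of tile index per IFD")
```

Under these assumptions a 100,000 x 100,000 image has 152,881 tiles and a bit
over 2.4 MB of tile index per IFD, and the imagery IFD and the mask IFD each
carry their own copy, which is what would be reloaded/flushed on every switch.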

The COG definition would have to be updated for that use case (probably as an
allowed extra optimization, rather than forcing people to use it), and the validation
script as well.

On the GDAL read side, the GTiff driver would also have to be updated to detect this
layout, and when reading a block of imagery, it should issue a GET range request that
is big enough to fetch the imagery block and its mask block at once (the base logic to
optimize GET requests for a given IRasterIO() request is already in place).
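
The idea can be sketched outside of GDAL as follows. This is only an
illustration of the layout and of the resulting byte range, with hypothetical
function names and made-up block sizes; it is not how the GTiff driver is
implemented.

```python
def interleaved_layout(image_sizes, mask_sizes, start=0):
    """Place each mask block right after its imagery block.

    Returns [(image_offset, image_len, mask_offset, mask_len), ...] for
    compressed block sizes given in bytes.
    """
    layout = []
    offset = start
    for image_len, mask_len in zip(image_sizes, mask_sizes):
        layout.append((offset, image_len, offset + image_len, mask_len))
        offset += image_len + mask_len
    return layout


def range_header(block):
    """Build one HTTP Range header value covering the imagery block and
    its mask block (HTTP byte ranges are inclusive on both ends)."""
    image_offset, image_len, mask_offset, mask_len = block
    first = min(image_offset, mask_offset)
    last = max(image_offset + image_len, mask_offset + mask_len) - 1
    return f"bytes={first}-{last}"


# Two tiles with made-up compressed sizes, data starting at byte 4096.
layout = interleaved_layout([1000, 1200], [40, 50], start=4096)
for block in layout:
    print(block, range_header(block))
```

With the mask blocks stored at the end of the file instead, the same tile
would need two separate range requests (or one very large one), which is the
cost the interleaved layout is meant to avoid.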

Even


-- 
Spatialys - Geospatial professional services
http://www.spatialys.com