[gdal-dev] GTiff: new DISCARD_LSB creation option

Tue Oct 21 09:11:18 PDT 2014

Hi,

Following a recent discussion on using PREDICTOR=2 with COMPRESS=DEFLATE with 
TIFF, I've implemented in trunk a trick suggested by Adobe in the TIFF 
specification to improve the effectiveness of horizontal prediction (which 
uses the difference between consecutive pixels rather than their value).
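
To illustrate what horizontal prediction does, here is a minimal Python sketch 
for one row of 8-bit samples from a single band. The function names are made 
up for illustration and this is not the libtiff code, just the idea: the first 
sample is kept as is, and each following sample is stored as the difference 
with its left neighbour, modulo 256.

```python
def horizontal_diff(row):
    """Replace each 8-bit sample (except the first) by its difference
    with the previous sample, modulo 256."""
    out = bytearray(row)
    # Walk backwards so each difference uses the original neighbour value.
    for i in range(len(out) - 1, 0, -1):
        out[i] = (out[i] - out[i - 1]) & 0xFF
    return bytes(out)


def horizontal_undiff(encoded):
    """Inverse operation: a running sum modulo 256 restores the samples."""
    out = bytearray(encoded)
    for i in range(1, len(out)):
        out[i] = (out[i] + out[i - 1]) & 0xFF
    return bytes(out)


row = bytes([100, 101, 101, 103, 102, 102])
encoded = horizontal_diff(row)
print(list(encoded))  # mostly small values; 255 stands for -1
assert horizontal_undiff(encoded) == row  # the step itself is lossless
```

On smooth data the differences cluster around 0, which is what DEFLATE can 
then take advantage of.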

The DISCARD_LSB=nbit creation option is an initial *lossy* compression step 
that will discard nbit least-significant bits of the pixel values. A different 
value can be specified per band with nbit_band1,nbit_band2,...nbit_bandN.
A more practical view of this is that it decreases the number of colors per 
channel.
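
A rough Python sketch of the idea, on synthetic data. Everything here is an 
assumption made for illustration: the helper names, the choice of writing the 
middle of the range into the discarded bits, and the test row (a slow gradient 
plus small noise). It is not the code in trunk, it only shows why dropping low 
bits helps DEFLATE after horizontal differencing.

```python
import random
import zlib


def discard_lsb(value, nbit):
    """Replace the nbit low bits of an 8-bit value by the middle of the
    range they cover (an assumption, chosen to keep the error small)."""
    if nbit == 0:
        return value
    mask = 0xFF & ~((1 << nbit) - 1)   # keeps only the high bits
    midpoint = 1 << (nbit - 1)         # constant written into the low bits
    return (value & mask) | midpoint


def deflate_size_with_predictor(row):
    """Size in bytes after horizontal differencing (modulo 256) and DEFLATE."""
    diffs = bytes([row[0]] + [(row[i] - row[i - 1]) & 0xFF
                              for i in range(1, len(row))])
    return len(zlib.compress(diffs, 9))


# Synthetic row: slow gradient plus noise in the two low bits.
rng = random.Random(0)
row = [i // 32 + rng.randrange(4) for i in range(4096)]

size_lossless = deflate_size_with_predictor(row)
size_lsb2 = deflate_size_with_predictor([discard_lsb(v, 2) for v in row])
print(size_lossless, size_lsb2)  # the second size should be clearly smaller
```

With the noise in the low bits gone, most differences between neighbours 
become 0, so the compressor sees much longer runs of identical bytes.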

For example :

gdal_translate world.topo.bathy.200406.3x21600x21600.C1.png out_lsb1.tif \
   -co tiled=yes -co compress=deflate -co predictor=2 -co discard_lsb=1

gdal_translate world.topo.bathy.200406.3x21600x21600.C1.png out_lsb213.tif \
   -co tiled=yes -co compress=deflate -co predictor=2 -co discard_lsb=2,1,3

Resulting file sizes on the above mentioned RGB BMNG tile (21600x21600 
pixels):
world.topo.bathy.200406.3x21600x21600.C1.png: 484 696 919 bytes
out_lsb000.tif: 467 791 323 (i.e. lossless compression)
out_lsb111.tif: 352 895 108
out_lsb213.tif: 286 368 793
out_lsb222.tif: 259 788 627
out_lsb324.tif: 210 505 787
out_lsb333.tif: 184 807 316
out_lsb334.tif: 177 060 429

--> discard_lsb=1 has nearly undetectable visual degradation.
--> discard_lsb=2,1,3 : the rationale for that one is that the human eye is 
sensitive mostly to luminance, and in the usual computation of luminance from 
red, green, blue channels, the green channel has a weight of 72%, red 21% and 
blue 7%, so we discard more red bits than green bits, and more blue than red. 
Very good result overall. Some tiny artifacts can be seen in the blue 
gradients in the oceanic areas when looking closely.
--> the more you increase the number of discarded bits, the more artifacts in 
blue gradients. Quality on land areas remains quite good.
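
As a back-of-the-envelope check of that rationale, the sketch below combines 
the luminance weights quoted above with the per-band error bound of 2^(nbit-1) 
given later in this message. It assumes the worst case where the errors of all 
three bands add up, and the function name is made up for illustration.

```python
# Luminance weights quoted above (red 21%, green 72%, blue 7%).
WEIGHTS = (0.21, 0.72, 0.07)


def max_luminance_error(nbits_per_band):
    """Worst-case luminance error, assuming a per-band error of at most
    2^(nbit-1) (0 when no bit is discarded) and errors that all add up."""
    per_band = [(1 << (n - 1)) if n > 0 else 0 for n in nbits_per_band]
    return sum(w * e for w, e in zip(WEIGHTS, per_band))


for setting in [(1, 1, 1), (2, 1, 3), (2, 2, 2), (3, 2, 4), (3, 3, 3)]:
    print(setting, round(max_luminance_error(setting), 2))
```

Under these assumptions 2,1,3 gives a worst-case luminance error of about 1.42 
levels, against 2.0 for 2,2,2, although both discard 6 bits per pixel in 
total.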

To be compared with JPEG compression (quality of 95% and 90%, YCbCr 4:2:0) :
out_jpeg_95_ycbcr.tif: 108 487 980
out_jpeg_90_ycbcr.tif: 72 054 360

So JPEG compression is more efficient, doesn't exhibit the issue with blue 
gradients but has the typical JPEG artifacts with high frequencies.
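
To make the comparison easier to read, this small Python snippet expresses 
each size as a percentage of the lossless DEFLATE file. The byte counts are 
simply copied from the listings above, nothing is measured here.

```python
# File sizes in bytes, copied from the listings above.
sizes = {
    "out_lsb000.tif": 467791323,        # lossless reference
    "out_lsb111.tif": 352895108,
    "out_lsb213.tif": 286368793,
    "out_lsb222.tif": 259788627,
    "out_lsb324.tif": 210505787,
    "out_lsb333.tif": 184807316,
    "out_lsb334.tif": 177060429,
    "out_jpeg_95_ycbcr.tif": 108487980,
    "out_jpeg_90_ycbcr.tif": 72054360,
}

reference = sizes["out_lsb000.tif"]
for name, size in sizes.items():
    print(f"{name}: {100.0 * size / reference:.1f}% of the lossless size")
```

That gives roughly 75% for discard_lsb=1 and 61% for discard_lsb=2,1,3, while 
JPEG at quality 95 is at about 23%.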

The advantage of DISCARD_LSB is that you have a guarantee on the error: it 
cannot exceed 2^(nbit-1) (and the mean error should be half of that for 
evenly distributed values). It can also be used with RGBA images where JPEG 
YCbCr in TIFF cannot be used.
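
The bound can be checked by brute force over all 8-bit values. The sketch 
below assumes that the discarded bits are replaced by the middle of the range 
they cover; that is one way to obtain the 2^(nbit-1) bound and is an 
assumption of this example, as is the function name.

```python
def discard_lsb(value, nbit):
    """Replace the nbit low bits of an 8-bit value by the middle of the
    range they cover (assumption made for this example)."""
    if nbit == 0:
        return value
    mask = 0xFF & ~((1 << nbit) - 1)   # keeps only the high bits
    midpoint = 1 << (nbit - 1)         # constant written into the low bits
    return (value & mask) | midpoint


for nbit in range(1, 5):
    errors = [abs(discard_lsb(v, nbit) - v) for v in range(256)]
    max_error = max(errors)
    mean_error = sum(errors) / len(errors)
    print(nbit, max_error, mean_error)
    assert max_error == 2 ** (nbit - 1)       # guaranteed bound
    assert mean_error == 2 ** (nbit - 1) / 2  # half of it on average
```

Plain truncation (clearing the bits without adding the midpoint) would give a 
larger worst case of 2^nbit - 1, so the rounding choice matters for the 
guarantee.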

Important note: this is only something done on compression side, and doesn't 
change the encoded scheme. So 100% compatibility with any DEFLATE + PREDICTOR 
compatible reader.

Theoretically, that could be enhanced to do adaptive compression per tile 
(or even within a tile), by adjusting the number of discarded bits depending 
on the sensitivity of the eye to the content, whereas JPEG-in-TIFF doesn't 
allow this (quantization tables are common to the whole file).

Happy experimenting!

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com