[gdal-dev] GTiff bit shuffle compression feature request

Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] jesse.r.meyer at nasa.gov
Fri Dec 8 09:06:59 PST 2023


Hi,

When using horizontal differencing to reduce the numerical range of band data, the upper bytes in the produced stream are typically 0, which suits LZ’s byte-based compression model.  But the least significant bytes can still have many of their upper bits equal to 0.  Unless the whole byte repeats, however, LZ compressors can’t do much to exploit that pattern.  For data with temporal and/or spatial coherence, ‘shuffling’ is another effective strategy to losslessly rearrange the data stream into a form that is favorable to LZ-style compressors, and it builds on the gains already provided by the PREDICTOR functionality.

The notion is to arrange the bit stream so that the Nth “shuffled” byte contains the Nth bit from each byte in the sequence.  The sequence length is usually determined by the bit length of the data type.

For example (for brevity, assume bytes are 4 bits long):

Byte 1, Byte 2, Byte 3, Byte 4
0001, 0011, 0111, 0001

They all share a 0 in the top bit and a 1 in the bottom bit.

“Shuffled”
0000, 0010, 0110, 1111
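
To make the example concrete, here is a minimal Python sketch of the transform on the four 4-bit values above. The function name bit_shuffle and the width parameter are my own for illustration and are not part of GDAL or libtiff.

```python
def bit_shuffle(values, width):
    """Bit-transpose a block of `width` values, each `width` bits wide.

    Output word N collects bit N (counting from the most significant
    bit) of every input value, in input order.
    """
    if len(values) != width:
        raise ValueError("this sketch only handles square blocks")
    out = []
    for n in range(width):
        shift = width - 1 - n          # n = 0 selects the top bit
        word = 0
        for v in values:
            word = (word << 1) | ((v >> shift) & 1)
        out.append(word)
    return out


block = [0b0001, 0b0011, 0b0111, 0b0001]
shuffled = bit_shuffle(block, 4)
print([format(w, "04b") for w in shuffled])
# → ['0000', '0010', '0110', '1111']

# For a square block the transpose is its own inverse,
# so applying it twice restores the input.
restored = bit_shuffle(shuffled, 4)
```

Because the block is square (4 values of 4 bits), the same function undoes the shuffle, which is what keeps the step lossless.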

The algorithm is pretty simple to implement, and it can be SIMD accelerated for high performance.
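
As a rough, unoptimized sketch of how the step could sit between horizontal differencing and an LZ-style compressor, the Python below runs 16-bit samples through differencing, a 16x16 bit transpose per block, and zlib, then reverses the chain. Everything here (WIDTH, the function names, the big-endian packing, zlib standing in for the codec) is an assumption made for illustration. It is not how GDAL or libtiff implement PREDICTOR, and it makes no claim about the compression ratio you would get.

```python
import struct
import zlib

WIDTH = 16  # bits per sample; also used as the block length (assumption)


def delta_encode(samples):
    """Horizontal differencing, modulo 2**WIDTH."""
    prev, out = 0, []
    for s in samples:
        out.append((s - prev) & 0xFFFF)
        prev = s
    return out


def delta_decode(deltas):
    """Undo delta_encode by accumulating the differences."""
    prev, out = 0, []
    for d in deltas:
        prev = (prev + d) & 0xFFFF
        out.append(prev)
    return out


def transpose_block(block):
    """Bit-transpose WIDTH values of WIDTH bits each (self-inverse)."""
    out = []
    for n in range(WIDTH):
        shift = WIDTH - 1 - n
        word = 0
        for v in block:
            word = (word << 1) | ((v >> shift) & 1)
        out.append(word)
    return out


def shuffle(values):
    """Apply the transpose to each consecutive block of WIDTH values."""
    if len(values) % WIDTH:
        raise ValueError("sketch requires a multiple of WIDTH samples")
    out = []
    for i in range(0, len(values), WIDTH):
        out.extend(transpose_block(values[i:i + WIDTH]))
    return out


def encode(samples):
    words = shuffle(delta_encode(samples))
    return zlib.compress(struct.pack(">%dH" % len(words), *words))


def decode(blob, count):
    words = struct.unpack(">%dH" % count, zlib.decompress(blob))
    return delta_decode(shuffle(list(words)))  # shuffle undoes itself


# Hypothetical scanline: a slowly varying signal with small noise.
row = [1000 + (i * 7) % 5 for i in range(64)]
assert decode(encode(row), len(row)) == row
```

A real implementation would work on whole strips or tiles and replace the inner loops with SIMD bit-matrix transposes. The point of the sketch is only that the round trip is exact.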

While we are specifically users of the GTiff format, such a strategy could be applied generically to most raster and even vector formats.

Best,
Jesse