<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-ligatures:standardcontextual;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When using horizonal differencing to reduce the numerical range of band data, the upper bytes in the produced stream are typically 0 which leverages LZ’s byte based compression model. But the least significant bytes can still have many
significant bits as 0. Unless the whole byte is replicated, LZ compressors can’t do much to leverage the pattern however. For data with temporal and or spatial coherence, ‘shuffling’ is another effective strategy to losslessly reform the data stream to be
favorable to LZ style compressors. And plays nicely off gains already provided by the PREDICTOR functionality.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The notion is to arrange the bit stream where the Nth “shuffled” byte contains the Nth bit from each byte in the sequence. The sequence length is usually determined by the data type bit length.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For example (for brevity, assume bytes are 4 bits long)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Byte 1, Byte 2, Byte 3, Byte 4<o:p></o:p></p>
<p class="MsoNormal">0001, 0011, 0111, 0001<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">They all share the top 0 bit and the bottom 1 bit,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">“Shuffled”<o:p></o:p></p>
<p class="MsoNormal">0000, 0010, 0110, 1111<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The algorithm is pretty simple to implement, and can be SIMD accelerated for high performance.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">While we specifically are users of the GTIFF format, such a strategy could be employed generically for most raster and even vector formats.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Best,<o:p></o:p></p>
<p class="MsoNormal">Jesse<o:p></o:p></p>
</div>
</body>
</html>