<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-ligatures:standardcontextual;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML-esimuotoiltu Char";
margin:0cm;
font-size:10.0pt;
font-family:"Courier New";}
span.HTML-esimuotoiltuChar
{mso-style-name:"HTML-esimuotoiltu Char";
mso-style-priority:99;
mso-style-link:HTML-esimuotoiltu;
font-family:Consolas;
mso-ligatures:standardcontextual;}
span.Shkpostityyli25
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="FI" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Could Zarr be used inside a SOZip (seek-optimized ZIP) archive,
<a href="https://gdal.org/programs/sozip.html">https://gdal.org/programs/sozip.html</a>?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">-Jukka Rahkonen-<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="mso-ligatures:none">From:</span></b><span style="mso-ligatures:none"> gdal-dev &lt;gdal-dev-bounces@lists.osgeo.org&gt;
<b>On behalf of </b>Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev<br>
<b>Sent:</b> Friday, December 8, 2023 21:44<br>
<b>To:</b> Even Rouault &lt;even.rouault@spatialys.com&gt;; gdallists &lt;gdal-dev@lists.osgeo.org&gt;<br>
<b>Subject:</b> Re: [gdal-dev] [BULK] Re: [EXTERNAL] Re: GTiff bit shuffle compression feature request<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-US">Unfortunately, Zarr has a design choice that won’t work for us: each block is an individual file on the file system. Our datasets are massive, and this would exhaust our inode allocations. While we could archive the folder into
a zip archive, that adds a step for anyone who works with the data. Curiously, this sparse-friendly representation seems entirely baked into the format, with no way to opt out. I’m not quite ready to share compression ratio findings, but initial results
are consistent with my expectations. Bitshuffle is effective for our data and works very well with another Zarr feature, “DELTA_DTYPE”, which <i>can</i> be a form of lossless compression if the maximum delta is known.
<o:p></o:p></span></p>
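<p class="MsoNormal"><span lang="EN-US">To illustrate why storing deltas in a narrower type is only lossless when the maximum delta is known, here is a minimal Python sketch. It is a conceptual model, not the Zarr/numcodecs Delta filter itself: the helpers <code>delta_encode</code> and <code>delta_decode</code> and the <code>delta_bits</code> parameter are hypothetical, and the first sample is kept at full width for simplicity.<o:p></o:p></span></p>
<pre>
```python
# Conceptual sketch (hypothetical helpers, not the numcodecs Delta filter):
# keep the first sample at full width and store successive differences
# in a narrower signed integer type of `delta_bits` bits.


def delta_encode(samples: list[int], delta_bits: int) -> tuple[int, list[int]]:
    """Return (first sample, differences); raise if any difference overflows."""
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    for d in deltas:
        if not (lo <= d <= hi):
            # Casting this difference to the narrow type would lose information.
            raise ValueError(f"delta {d} does not fit in a signed {delta_bits}-bit integer")
    return samples[0], deltas


def delta_decode(first: int, deltas: list[int]) -> list[int]:
    """Rebuild the samples by cumulative summation of the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


first, deltas = delta_encode([1000, 1003, 1001, 1010], delta_bits=8)
assert delta_decode(first, deltas) == [1000, 1003, 1001, 1010]
```
</pre>
<p class="MsoNormal"><span lang="EN-US">If a single difference exceeds the chosen range, the encoder refuses rather than silently truncating, which is exactly the condition under which a narrower delta type stops being lossless.<o:p></o:p></span></p>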
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I understand that, from the GDAL project’s perspective, you don’t want to make third-party TIFF compatibility worse than it already is. That impact would be minimized if the functionality were added to libtiff proper, so that any project that
depends on libtiff could benefit from the enhancement. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:12.0pt;color:black;mso-ligatures:none">From:
</span></b><span lang="EN-US" style="font-size:12.0pt;color:black;mso-ligatures:none">gdal-dev <<a href="mailto:gdal-dev-bounces@lists.osgeo.org">gdal-dev-bounces@lists.osgeo.org</a>> on behalf of "Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS
INC] via gdal-dev" <<a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>><br>
<b>Reply-To: </b>"Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC]" <<a href="mailto:jesse.r.meyer@nasa.gov">jesse.r.meyer@nasa.gov</a>><br>
<b>Date: </b>Friday, December 8, 2023 at 12:40 PM<br>
<b>To: </b>Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>>, gdallists <<a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>><br>
<b>Subject: </b>[BULK] Re: [gdal-dev] [EXTERNAL] Re: GTiff bit shuffle compression feature request<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="mso-ligatures:none"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span lang="EN-US">Thanks for the suggestion, Even. We’ll see how effective Zarr is for our datasets.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jesse<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:12.0pt;color:black">From:
</span></b><span lang="EN-US" style="font-size:12.0pt;color:black">Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>><br>
<b>Date: </b>Friday, December 8, 2023 at 12:20 PM<br>
<b>To: </b>"Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC]" <<a href="mailto:jesse.r.meyer@nasa.gov">jesse.r.meyer@nasa.gov</a>>, gdallists <<a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>><br>
<b>Subject: </b>[EXTERNAL] Re: [gdal-dev] GTiff bit shuffle compression feature request</span><span lang="EN-US" style="font-size:12.0pt;color:black;mso-ligatures:none"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span lang="EN-US"><o:p> </o:p></span></p>
<div>
<p><span lang="EN-US">Jesse,<o:p></o:p></span></p>
<p><span lang="EN-US">This would break interoperability with other TIFF readers. Even adding a new TIFF tag to advertise that bit shuffling is applied would probably not be a sufficient guard, as existing readers wouldn't read it and would just display garbage,
which is worse than not being able to open the file at all. The only way I can think of to do this safely would be to use new values for the Compression tag, which isn't pretty either.<o:p></o:p></span></p>
<p><span lang="EN-US">You should probably try Zarr, which has such a capability through the Blosc codec. See the BLOSC_SHUFFLE option at
<a href="https://gdal.org/drivers/raster/zarr.html">https://gdal.org/drivers/raster/zarr.html</a>.<o:p></o:p></span></p>
<p><span lang="EN-US">I'm curious, however, to know what typical compression gain you get with that.<o:p></o:p></span></p>
<p><span lang="EN-US">Even<o:p></o:p></span></p>
<p><span lang="EN-US"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-US">On 08/12/2023 at 18:06, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev wrote:<o:p></o:p></span></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span lang="EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">When using horizontal differencing to reduce the numerical range of band data, the upper bytes in the produced stream are typically 0, which suits LZ’s byte-based compression model. But the least significant bytes
can still have many of their upper bits equal to 0, and unless the whole byte is repeated, LZ compressors can’t do much to leverage that pattern. For data with temporal and/or spatial coherence, ‘shuffling’ is another effective strategy to losslessly reshape the
data stream so that it is favorable to LZ-style compressors, and it complements the gains already provided by the PREDICTOR functionality.<o:p></o:p></span></p>
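<p class="MsoNormal"><span lang="EN-US">The premise above can be sketched in Python for a single row of unsigned 16-bit samples. The helper <code>horizontal_difference</code> is hypothetical: it models the idea behind TIFF Predictor=2 (each sample replaced by its difference from its left neighbour, modulo 2<sup>16</sup>) and is not libtiff code.<o:p></o:p></span></p>
<pre>
```python
# Hypothetical helper modelling horizontal differencing (the idea behind
# TIFF Predictor=2) on one row of one band of unsigned 16-bit samples.


def horizontal_difference(row: list[int], bits: int = 16) -> list[int]:
    """Keep the first sample; replace each later one by its difference
    from the previous sample, wrapped modulo 2**bits."""
    mask = (1 << bits) - 1
    return [row[0]] + [(b - a) & mask for a, b in zip(row, row[1:])]


row = [5000, 5002, 5001, 5005]
diffs = horizontal_difference(row)   # [5000, 2, 65535, 4]
high = [v >> 8 for v in diffs]       # upper bytes: [19, 0, 255, 0]
low = [v & 0xFF for v in diffs]      # lower bytes: [136, 2, 255, 4]
```
</pre>
<p class="MsoNormal"><span lang="EN-US">Small positive differences leave the upper byte at 0, while small negative ones wrap around (here -1 becomes 65535), giving an upper byte of 0xFF; either way the upper bytes are highly repetitive, whereas the lower bytes still vary in their low-order bits.<o:p></o:p></span></p>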
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The notion is to rearrange the bit stream so that the Nth “shuffled” byte contains the Nth bit of each byte in the sequence. The sequence length is usually determined by the bit length of the data type.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">For example (for brevity, assume bytes are 4 bits long):<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Byte 1, Byte 2, Byte 3, Byte 4<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">0001, 0011, 0111, 0001<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">They all share a top bit of 0 and a bottom bit of 1.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">“Shuffled”:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">0000, 0010, 0110, 1111<o:p></o:p></span></p>
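<p class="MsoNormal"><span lang="EN-US">A minimal Python sketch of the transposition just described, written for a group of <i>width</i> values that are each <i>width</i> bits wide so that it can reproduce the 4-bit example. The function name <code>bit_shuffle</code> is hypothetical; bit positions are taken most significant first, matching the example.<o:p></o:p></span></p>
<pre>
```python
def bit_shuffle(group: list[int], width: int = 8) -> list[int]:
    """Transpose the bits of `width` values, each `width` bits wide.

    Output value i collects bit i (counted from the most significant bit)
    of every input value, with the first input in the most significant
    position of the output.
    """
    if len(group) != width:
        raise ValueError("group length must equal the bit width")
    out = []
    for i in range(width):
        shift = width - 1 - i  # bit i counted from the MSB
        packed = 0
        for value in group:
            packed = (packed << 1) | ((value >> shift) & 1)
        out.append(packed)
    return out


shuffled = bit_shuffle([0b0001, 0b0011, 0b0111, 0b0001], width=4)
print([format(v, "04b") for v in shuffled])  # ['0000', '0010', '0110', '1111']
```
</pre>
<p class="MsoNormal"><span lang="EN-US">Because this is a square bit-matrix transpose, applying <code>bit_shuffle</code> twice returns the original group, so the same function also serves as the unshuffle step.<o:p></o:p></span></p>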
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The algorithm is simple to implement and can be SIMD-accelerated for high performance.<o:p></o:p></span></p>
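<p class="MsoNormal"><span lang="EN-US">To show where such a step could sit in front of an LZ-style compressor, here is a sketch over a whole byte buffer, using 8-byte groups (the bit length of a byte) and zlib’s DEFLATE (LZ77-based) as the backend. The names <code>bit_shuffle_bytes</code>, <code>compress_shuffled</code> and <code>decompress_shuffled</code> are hypothetical, copying leftover bytes that do not fill a group unchanged is an assumption of this sketch, and it is a plain scalar loop rather than a SIMD implementation.<o:p></o:p></span></p>
<pre>
```python
import zlib


def bit_shuffle_bytes(data: bytes) -> bytes:
    """Bit-transpose every complete 8-byte group; copy any trailing bytes as-is.

    Within a group, output byte k holds bit 7-k (MSB first) of each of the
    8 input bytes. An 8x8 bit transpose is its own inverse, so this function
    also performs the unshuffle.
    """
    full = len(data) - len(data) % 8
    out = bytearray()
    for start in range(0, full, 8):
        group = data[start:start + 8]
        for bit in range(7, -1, -1):
            packed = 0
            for byte in group:
                packed = (packed << 1) | ((byte >> bit) & 1)
            out.append(packed)
    out += data[full:]  # assumption: leftover bytes are not shuffled
    return bytes(out)


def compress_shuffled(data: bytes, level: int = 6) -> bytes:
    """Shuffle, then DEFLATE-compress."""
    return zlib.compress(bit_shuffle_bytes(data), level)


def decompress_shuffled(blob: bytes) -> bytes:
    """Inverse of compress_shuffled: decompress, then unshuffle."""
    return bit_shuffle_bytes(zlib.decompress(blob))
```
</pre>
<p class="MsoNormal"><span lang="EN-US">Whether shuffling helps is data dependent; comparing <code>len(zlib.compress(raw))</code> with <code>len(compress_shuffled(raw))</code> on real band data is a quick way to check.<o:p></o:p></span></p>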
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">While we specifically use the GTiff format, such a strategy could be applied generically to most raster and even vector formats.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jesse<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span lang="EN-US" style="mso-ligatures:none"><o:p> </o:p></span></p>
<pre><span lang="EN-US">_______________________________________________<o:p></o:p></span></pre>
<pre><span lang="EN-US">gdal-dev mailing list<o:p></o:p></span></pre>
<pre><span lang="EN-US"><a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a><o:p></o:p></span></pre>
<pre><span lang="EN-US"><a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><o:p></o:p></span></pre>
</blockquote>
<pre><span lang="EN-US">-- <o:p></o:p></span></pre>
<pre><span lang="EN-US"><a href="http://www.spatialys.com/">http://www.spatialys.com</a><o:p></o:p></span></pre>
<pre><span lang="EN-US">My software is free, but my time generally not.<o:p></o:p></span></pre>
</div>
</div>
</body>
</html>