<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Segoe UI Emoji";
panose-1:2 11 5 2 4 2 4 2 2 3;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hi Jeremy,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve definitely identified that it’s the mask generation that takes more time and not the JPEG compression. If I force the mask in an LZW COG, the time goes from ~2.5 minutes to a couple of hours, and if I just generate a 3-band JPEG with no
mask, it similarly only takes about 3 minutes and exits cleanly and quickly. So any format with an alpha layer or three bands works great, but any format with a mask seems to choke, at least at the sizes I’m working with.<o:p></o:p></p>
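<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A minimal sketch of the with/without-mask comparison described above, in Python. The helper names (build_translate_args, time_command) and any file names are hypothetical; the gdal_translate flags are the ones already used in this thread. It only assembles and times the command lines, so it says nothing about where inside GDAL the time goes.<o:p></o:p></p>
<pre>
```python
import subprocess
import time


def build_translate_args(src, dst, compress, use_mask):
    """Return a gdal_translate argument list for a 3-band COG.

    use_mask=True maps source band 4 to an internal mask (-mask 4);
    use_mask=False writes only the three colour bands.
    """
    args = ["gdal_translate", src, dst, "-b", "1", "-b", "2", "-b", "3"]
    if use_mask:
        args += ["-mask", "4"]
    args += ["-of", "COG", "-co", "COMPRESS=" + compress,
             "-co", "NUM_THREADS=ALL_CPUS"]
    return args


def time_command(args, runner=subprocess.run):
    """Run one command and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    runner(args, check=True)  # raises if the command exits with an error
    return time.perf_counter() - start
```
</pre>
<p class="MsoNormal">Calling time_command(build_translate_args("in.tif", "out_mask.tif", "JPEG", True)) and then the same with use_mask set to False (and a different output name) would give the two wall-clock figures to compare.<o:p></o:p></p>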
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for sharing your pipeline. I like it! You only use the default quality though? I’ve found that I can generally perceive artifacts at around 85% quality, and even at around 90% if I look hard or it’s the right kind of imagery. We try to save as
much detail as is reasonable since we’re generating imagery that feeds into classification and mapping processes, and we’re working on machine learning workflows.<o:p></o:p></p>
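<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A hedged sketch of how a non-default quality could be passed, assuming the COG driver's QUALITY creation option applies to the chosen compression (worth checking against the driver documentation for your GDAL version). The helper cog_creation_options and the file names are hypothetical, not part of GDAL.<o:p></o:p></p>
<pre>
```python
def cog_creation_options(compress, quality=None):
    """Return -co arguments for a COG; QUALITY is added only when given."""
    options = ["-co", "COMPRESS=" + compress]
    if quality is not None:
        if quality not in range(1, 101):
            raise ValueError("quality must be an integer from 1 to 100")
        options += ["-co", "QUALITY=" + str(quality)]
    return options


# Hypothetical file names; the options slot into an ordinary gdal_translate call.
command = ["gdal_translate", "in.vrt", "out.cog.tif", "-of", "COG"]
command += cog_creation_options("JPEG", quality=90)
print(" ".join(command))
# prints: gdal_translate in.vrt out.cog.tif -of COG -co COMPRESS=JPEG -co QUALITY=90
```
</pre>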
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">My LZW COGs are around 1-2GB, and the JPEG COGs are about 200-300MB. But I am producing data for areas easily 10x this size, so I worry what that means if we stay with the JPEG pipeline. I generated some WebP images a few years ago but
hadn’t tried WebP with COGs yet because (1) it’s incompatible with ArcMap/GlobalMapper (used by our org) and (2) we get resistance to any file format that’s not old enough to vote. But the alpha layer support of WebP and the internal-mask-taking-just-shy-of-forever
issue with JPEG might be enough to convince them. I’ll raise the issue, but I’m guessing it won’t be an option in the near term. There’s a lot of momentum building for cloud-based services though, so I could be wrong.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I just modified my command to make WebP at the same quality setting. It looks great in QGIS and shrinks my test COG from 287MB to 195MB, but ArcMap hates it and so does GlobalMapper. Unfortunately, as far as I can tell, the only format that
all three of them like is LZW COGs, but those are huge. I’m working with GlobalMapper on COGs right now, and I’ll see if I can get the ear of our people who talk with ESRI.<o:p></o:p></p>
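<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A quick back-of-the-envelope check of the sizes quoted above, assuming file size scales roughly linearly with area (a rough assumption, since compression ratios vary with content). The helper names are hypothetical.<o:p></o:p></p>
<pre>
```python
def scaled_range(low, high, area_factor):
    """Scale a size range linearly with area (a rough assumption)."""
    return low * area_factor, high * area_factor


def percent_smaller(before, after):
    """Percentage reduction going from size `before` to size `after`."""
    return 100.0 * (before - after) / before


lzw_gb = scaled_range(1, 2, 10)       # LZW COGs: around 1-2 GB today
jpeg_mb = scaled_range(200, 300, 10)  # JPEG COGs: about 200-300 MB today
webp_saving = percent_smaller(287, 195)  # test COG, JPEG vs WebP

print("LZW at 10x the area:", lzw_gb, "GB")
print("JPEG at 10x the area:", jpeg_mb, "MB")
print("WebP vs JPEG test COG:", round(webp_saving, 1), "percent smaller")
```
</pre>
<p class="MsoNormal">Under that assumption the LZW COGs land at roughly 10-20GB and the JPEG COGs at roughly 2-3GB, and the WebP test file is about a third smaller than the JPEG one.<o:p></o:p></p>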
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b>From:</b> Jeremy Palmer &lt;palmerjnz@gmail.com&gt; <br>
<b>Sent:</b> Wednesday, April 22, 2020 12:22 AM<br>
<b>To:</b> Ritchie, Andrew C &lt;aritchie@usgs.gov&gt;<br>
<b>Cc:</b> Even Rouault &lt;even.rouault@spatialys.com&gt;; gdal-dev@lists.osgeo.org<br>
<b>Subject:</b> [EXTERNAL] Re: [gdal-dev] gdal_translate (3.1.0dev) "never" finishes on large jpeg cogs... REALLLLLY long time to unload.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Hi Andy,<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, Apr 22, 2020 at 8:33 AM Ritchie, Andrew C &lt;<a href="mailto:aritchie@usgs.gov">aritchie@usgs.gov</a>&gt; wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Sorry I should’ve run more tests to clarify the situation re BIGTIFFs. It looks like gdal_translate honors -co BIGTIFF=NO for the raster but not the mask.<o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">What's the output size of your COG when it successfully completes?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Incidentally, when I kill the process with ctrl-C (on a windoze machine) GDAL fails to exit gracefully (2 of 2 times this run) with the following as the final debug message<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GDAL: Flushing dirty blocks: 0GTIFF: Waiting for worker job to finish handling block 0<o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">In my experience, the progress reporting in GDAL is not very good, and GDAL can spend a lot of time flushing dirty blocks. It might be that you can't interrupt GDAL at this point. I would wait a little longer. Even will be able
to comment further on this.<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">My cmd:<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">gdal_translate &lt;infile.tif&gt; &lt;outfile.tif&gt; -b 1 -b 2 -b 3 -mask 4 -of COG -co COMPRESS=LZW -co PREDICTOR=2 -co NUM_THREADS=ALL_CPUS -co RESAMPLING=AVERAGE -co BIGTIFF=NO --config
GDAL_TIFF_OVR_BLOCKSIZE 128 --debug ON<o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Seems ok to me. For our processing of aerial RGB photo COGs, when we are interested in web mapping use and a good balance between storage size and quality, we go for something like:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">gdalbuildvrt \<br>
-addalpha -hidenodata \<br>
$PWD/$TIF_FOLDER.vrt \<br>
$PWD/$TIF_FOLDER/*.tif<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">gdal_translate \<br>
-of COG \<br>
-co COMPRESS=WebP \<br>
-co NUM_THREADS=ALL_CPUS \<br>
-co BIGTIFF=YES \<br>
-co TILING_SCHEME=GoogleMapsCompatible \<br>
--config BIGTIFF_OVERVIEW YES \<br>
-co ALIGNED_LEVELS=3 \<br>
-co ADD_ALPHA=YES \<br>
-co BLOCKSIZE=512 \<br>
-co RESAMPLING=CUBIC \<br>
$PWD/$TIF_FOLDER.vrt $PWD/$TIF_FOLDER.webp.google.aligned.cog.tif<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Jeremy – to clarify, I have confirmed that if I wait long enough, the COG will finish, so generating in the background is feasible if slow. I was just surprised that including a
transparency mask increases the processing time so much. It’s necessary to reduce the file size using jpeg or webp compression and still provide transparency I guess, but it’s a huge performance penalty to pay. I don’t have enough programming experience (or
time) to do profiling and figure out what the bottleneck is, and don’t get me wrong – I
<span style="font-family:"Segoe UI Emoji",sans-serif">❤</span> gdal x 10^10, but I thought this was worth mentioning because of the increase in time (which is so long I initially thought it was actually a hang). <o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><br>
First, I would consider using WebP if you think your users can handle that. It's way better than JPEG+Mask. I'm surprised, though, that adding the mask to the TIFF adds heaps of additional time. Can you generate your dataset with and without the mask to see
the time difference? As mentioned before, most of the processing time is taken up in the overview generation (especially when compared to the data compression stage, which can use all of your CPU cores). Hopefully some upcoming GDAL changes will improve
this situation.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">As far as the steps to generate a COG – I output tiled TIFFs, then create a VRT, then create an RGBA LZW COG, preview, and generate a JPEG COG. I only added the RGBA LZW COG because
of the issues I was having generating the JPEG COG – it’s actually a good point to delete the tiles in my workflow because I can go back to the LZW COG again and again if I need to since it’s lossless.<o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">What was the issue you were having with JPEG compression? Just time to process? I would try the above command to see if that gives a good result (remove the warping to the Google Maps projection, i.e. the TILING_SCHEME option, if you don't need it, as it adds a lot to processing
times).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Cheers,<br>
Jeremy<o:p></o:p></p>
</div>
</div>
</div>
</div>
</body>
</html>