[gdal-dev] [EXTERNAL] Re: gdal_translate (3.1.0dev) "never" finishes on large jpeg cogs... REALLLLLY long time to unload.
Ritchie, Andrew C
aritchie at usgs.gov
Wed Apr 22 01:24:43 PDT 2020
Hi Jeremy,
I’ve definitely identified that it’s the mask generation that takes more time and not the jpeg compression. If I force the mask in a .lzw COG the time goes from ~2.5 minutes to a couple hours, and if I just generate a 3-band jpeg with no mask, it similarly only takes about 3 minutes and exits cleanly and quickly. So any format with an alpha layer or three bands works great, but any format with a mask seems to choke, at least at the size that I’m working.
Thanks for sharing your pipeline. I like it! You only use the default quality though? I’ve found that I can generally perceive artifacts at around 85% and more like 90% if I look hard or it’s the right kind of imagery. We try to save as much detail as is reasonable since we’re generating imagery that fits into classification and mapping processes and working on machine learning workflows.
My LZW COGs are around 1-2GB, and the JPEG COGs are about 200-300MB. But I am producing data for areas easily 10x this size, so I worry what that means if we stay with the JPEG pipeline. I generated some WebP images a few years ago but hadn’t tried with COGs yet because (1) it’s incompatible with ArcMap/GlobalMapper (used by our org.) and (2) we get resistance with any file format that’s not old enough to vote. But the alpha layer support of WebP and the internal-mask-taking-just-shy-of-forever issue with JPEG might be enough to convince them. I’ll raise the issue but I’m guessing it won’t be an option in the near term. There’s a lot of momentum building for cloud-based service though, so I could be wrong.
I just modified my command to make webp at the same quality setting and it looks great in QGIS, and shrinks my test COG from 287MB to 195MB, but ArcMap hates it and so does GlobalMapper. Unfortunately as far as I can tell the only one that all three of them like is the LZW COGs but those are huge. I’m working with GlobalMapper on COGs right now, and I’ll see if I can get the ear of our people who talk with ESRI.
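A minimal sketch of the JPEG-versus-WebP switch described above, assuming band 4 of the input is the transparency band. The function name lossy_cog_cmd, the file names, and the quality value are illustrative placeholders, not the settings actually used. It only prints the gdal_translate command so the two variants can be compared before running them:

```shell
# Sketch: print a gdal_translate command for a lossy COG with a chosen codec.
# Usage: lossy_cog_cmd CODEC QUALITY INPUT OUTPUT
# Assumption: band 4 of INPUT holds transparency.
lossy_cog_cmd() {
  codec=$1; quality=$2; src=$3; dst=$4
  case "$codec" in
    JPEG)
      # JPEG has no alpha channel, so band 4 goes into an internal mask.
      bands="-b 1 -b 2 -b 3 -mask 4" ;;
    WEBP)
      # WebP can store alpha directly, so band 4 stays a regular band.
      bands="-b 1 -b 2 -b 3 -b 4" ;;
    *)
      echo "unsupported codec: $codec" >&2
      return 1 ;;
  esac
  echo "gdal_translate $src $dst $bands -of COG -co COMPRESS=$codec -co QUALITY=$quality -co NUM_THREADS=ALL_CPUS"
}
```

For example, `lossy_cog_cmd WEBP 90 in.tif out.webp.cog.tif` prints the WebP variant, and piping that line to `sh` would run it.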
From: Jeremy Palmer <palmerjnz at gmail.com>
Sent: Wednesday, April 22, 2020 12:22 AM
To: Ritchie, Andrew C <aritchie at usgs.gov>
Cc: Even Rouault <even.rouault at spatialys.com>; gdal-dev at lists.osgeo.org
Subject: [EXTERNAL] Re: [gdal-dev] gdal_translate (3.1.0dev) "never" finishes on large jpeg cogs... REALLLLLY long time to unload.
Hi Andy,
On Wed, Apr 22, 2020 at 8:33 AM Ritchie, Andrew C <aritchie at usgs.gov> wrote:
Sorry I should’ve run more tests to clarify the situation re BIGTIFFs. It looks like gdal_translate honors -co BIGTIFF=NO for the raster but not the mask.
What's the output size of your COG when it successfully completes?
Incidentally, when I kill the process with ctrl-C (on a windoze machine) GDAL fails to exit gracefully (2 of 2 times this run) with the following as the final debug message
GDAL: Flushing dirty blocks: 0
GTIFF: Waiting for worker job to finish handling block 0
In my experience, the progress reporting in GDAL is not very good and can spend a lot of time in the flushing dirty blocks process. It might be that you can't interrupt GDAL at this point. I would wait a little longer. Even will be able to comment further on this.
My cmd:
gdal_translate <infile.tif> <outfile.tif> -b 1 -b 2 -b 3 -mask 4 -of COG -co COMPRESS=LZW -co PREDICTOR=2 -co NUM_THREADS=ALL_CPUS -co RESAMPLING=AVERAGE -co BIGTIFF=NO --config GDAL_TIFF_OVR_BLOCKSIZE 128 --debug ON
Seems ok to me. For our processing of aerial RGB photos COGs, when we are interested in web mapping use and a good balance between storage size and quality, we go for something like:
gdalbuildvrt \
-addalpha -hidenodata \
$PWD/$TIF_FOLDER.vrt \
$PWD/$TIF_FOLDER/*.tif
gdal_translate \
-of COG \
-co COMPRESS=WebP \
-co NUM_THREADS=ALL_CPUS \
-co BIGTIFF=YES \
-co TILING_SCHEME=GoogleMapsCompatible \
--config BIGTIFF_OVERVIEW YES \
-co ALIGNED_LEVELS=3 \
-co ADD_ALPHA=YES \
-co BLOCKSIZE=512 \
-co RESAMPLING=CUBIC \
$PWD/$TIF_FOLDER.vrt $PWD/$TIF_FOLDER.webp.google.aligned.cog.tif
Jeremy – to clarify, I have confirmed that if I wait long enough, the COG will finish, so generating in the background is feasible if slow. I was just surprised that including a transparency mask increases the processing time so much. It’s necessary to reduce the file size using jpeg or webp compression and still provide transparency I guess, but it’s a huge performance penalty to pay. I don’t have enough programming experience (or time) to do profiling and figure out what the bottleneck is, and don’t get me wrong – I ❤ gdal x 10^10, but I thought this was worth mentioning because of the increase in time (which is so long I initially thought it was actually a hang).
First, I would consider using WebP if you think your users can handle that. It's way better than JPEG+Mask. Note I'm surprised that adding the mask to the tiff is adding heaps of additional time. Can you generate your dataset with and without the mask to see the time difference? As mentioned before, most of the processing time is taken up in the overview generation (especially when compared to the data compression stage, which can use all of your CPU cores). Hopefully, some upcoming GDAL improvements can improve this situation.
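One way to make the with/without-mask comparison concrete is sketched below. It assumes a 4-band input where band 4 is the transparency band. The helper names (run_timed, cog_with_mask, cog_without_mask) and the DRY_RUN switch are made up for illustration. It times the conversion in whole seconds with `date`, which is coarse but enough to separate minutes from hours:

```shell
# Sketch: run (or just print) the same JPEG COG conversion with and without
# an internal mask so the two wall-clock times can be compared.
# Set DRY_RUN=1 to print the command instead of executing it.
run_timed() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
    return 0
  fi
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start)) s" >&2
  return "$status"
}

# Usage: cog_with_mask INPUT OUTPUT   (band 4 becomes the internal mask)
cog_with_mask() {
  run_timed gdal_translate "$1" "$2" -b 1 -b 2 -b 3 -mask 4 \
    -of COG -co COMPRESS=JPEG -co NUM_THREADS=ALL_CPUS
}

# Usage: cog_without_mask INPUT OUTPUT   (three bands only, no mask)
cog_without_mask() {
  run_timed gdal_translate "$1" "$2" -b 1 -b 2 -b 3 \
    -of COG -co COMPRESS=JPEG -co NUM_THREADS=ALL_CPUS
}
```

Running `cog_with_mask in.tif masked.tif` and then `cog_without_mask in.tif plain.tif` on the same input gives two elapsed times to compare. With `DRY_RUN=1` exported, the commands are only printed.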
As far as the steps to generate a COG – I output tiled tiffs, then create a VRT, then create a RGBA LZW cog, preview, and generate a JPEG COG. I only added the RGBA LZW cog because of the issues I was having generating the JPG cog – it’s actually a good point to delete the tiles in my workflow because I can go back to the LZW cog again and again if I need to since it’s lossless.
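The steps above (tiles, then VRT, then lossless RGBA COG, then JPEG COG) can be sketched roughly as follows. This is an assumption-laden outline rather than the actual scripts: the function name cog_pipeline, the output naming scheme, and the default quality of 90 are invented for illustration, and the tiles are assumed to be 4-band RGBA already. The function prints the three commands instead of running them:

```shell
# Sketch: print the three commands of a tiles -> VRT -> LZW COG -> JPEG COG
# workflow. Usage: cog_pipeline TILE_DIR [JPEG_QUALITY]
# Assumption: the tiles in TILE_DIR are 4-band RGBA GeoTIFFs.
cog_pipeline() {
  name=${1%/}   # strip one trailing slash from the folder name
  # 1. Mosaic the tiles into a virtual raster.
  printf '%s\n' "gdalbuildvrt $name.vrt $name/*.tif"
  # 2. Lossless RGBA COG, kept as the master copy once the tiles are deleted.
  printf '%s\n' "gdal_translate $name.vrt $name.lzw.cog.tif -of COG -co COMPRESS=LZW -co PREDICTOR=2 -co NUM_THREADS=ALL_CPUS"
  # 3. JPEG COG derived from the lossless copy, alpha moved to an internal mask.
  printf '%s\n' "gdal_translate $name.lzw.cog.tif $name.jpeg.cog.tif -b 1 -b 2 -b 3 -mask 4 -of COG -co COMPRESS=JPEG -co QUALITY=${2:-90} -co NUM_THREADS=ALL_CPUS"
}
```

Because step 3 reads from the LZW COG rather than the tiles, the lossy output can be regenerated at a different quality without redoing steps 1 and 2.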
What was the issue you were having with JPEG compression? Just time to process? I would try the above command to see if that gives a good result (remove the warping to the GoogleMapsCompatible tiling scheme if you don't need it, as that adds a lot to processing times).
Cheers,
Jeremy