[gdal-dev] gdal_translate (3.1.0dev) "never" finishes on large jpeg cogs... REALLLLLY long time to unload.

Ritchie, Andrew C aritchie at usgs.gov
Tue Apr 21 15:33:23 PDT 2020


Hi Jeremy and Even,

Sorry I should’ve run more tests to clarify the situation re BIGTIFFs. It looks like gdal_translate honors -co BIGTIFF=NO for the raster but not the mask.

If I omit the mask, I don’t see any BigTIFF messages with -co BIGTIFF=NO. If I include the mask band (changing -b 4 to -mask 4, with LZW compression) and turn --debug on, then within the first few seconds of the run I see the following lines:

COG: Generating overviews of the mask
GTIFF: File being created as a BIGTIFF.
GTIFF: Using 24 threads for compression
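
For anyone wanting to double-check the finished output independently of the debug messages, here is a minimal sketch that reads the TIFF header: the 16-bit version field after the byte-order mark is 42 for classic TIFF and 43 for BigTIFF. The function name tiff_flavor is hypothetical, and this only inspects the file you point it at; it says nothing about any intermediate files written during the run.

```python
import struct


def tiff_flavor(path):
    """Return 'classic', 'bigtiff', or 'not-tiff' based on the first 4 bytes."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return "not-tiff"

    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    byte_order = header[:2]
    if byte_order == b"II":
        fmt = "<H"
    elif byte_order == b"MM":
        fmt = ">H"
    else:
        return "not-tiff"

    # Bytes 2-3 hold the version: 42 = classic TIFF, 43 = BigTIFF.
    (version,) = struct.unpack(fmt, header[2:4])
    return {42: "classic", 43: "bigtiff"}.get(version, "not-tiff")
```

Calling tiff_flavor("outfile.tif") on the finished COG (file name assumed) would show whether -co BIGTIFF=NO was honored for the final file.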

Incidentally, when I kill the process with ctrl-C (on a windoze machine), GDAL fails to exit gracefully (2 of 2 times this run), with the following as the final debug message:

GDAL: Flushing dirty blocks: 0GTIFF: Waiting for worker job to finish handling block 0

My cmd:
gdal_translate <infile.tif> <outfile.tif> -b 1 -b 2 -b 3 -mask 4 -of cog -co COMPRESS=LZW -co PREDICTOR=2 -co NUM_THREADS=ALL_CPUs -co RESAMPLING=AVERAGE -co BIGTIFF=NO --config GDAL_TIF_OVR_BLOCKSIZE 128 --debug ON

Jeremy, to clarify: I have confirmed that if I wait long enough, the COG will finish, so generating in the background is feasible if slow. I was just surprised that including a transparency mask increases the processing time so much. I guess the mask is necessary if you want to reduce the file size with JPEG or WebP compression and still provide transparency, but it’s a huge performance penalty to pay. I don’t have enough programming experience (or time) to do profiling and figure out what the bottleneck is, and don’t get me wrong, I ❤ gdal x 10^10, but I thought this was worth mentioning because the increase in time is so large that I initially thought it was an actual hang.

As for the steps to generate a COG: I output tiled TIFFs, then create a VRT, then create an RGBA LZW COG, preview it, and generate a JPEG COG. I only added the RGBA LZW COG because of the issues I was having generating the JPEG COG, but it’s actually a good point in my workflow to delete the tiles, because the LZW COG is lossless and I can go back to it again and again if I need to.
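
The steps above can be sketched roughly as follows. This is only an illustration, not my exact script: all file names are placeholders, the creation options are a reduced subset of the command quoted earlier, and it assumes the tiles already carry four bands (RGB plus alpha) so that band 4 can serve as the mask. By default it only prints the commands.

```python
import shlex
import subprocess


def cog_workflow_commands(tiles, vrt="mosaic.vrt",
                          lzw_cog="mosaic_rgba_lzw.tif",
                          jpeg_cog="mosaic_jpeg.tif"):
    """Build the argv lists for the VRT -> LZW COG -> JPEG COG steps.

    All file names are hypothetical placeholders.
    """
    # Step 1: mosaic the tiled TIFFs into a single VRT.
    build_vrt = ["gdalbuildvrt", vrt, *tiles]

    # Step 2: lossless RGBA COG, kept as the re-usable master copy.
    rgba_lzw = ["gdal_translate", vrt, lzw_cog,
                "-of", "COG",
                "-co", "COMPRESS=LZW",
                "-co", "NUM_THREADS=ALL_CPUS"]

    # Step 3: JPEG COG with band 4 turned into a mask band.
    jpeg_masked = ["gdal_translate", lzw_cog, jpeg_cog,
                   "-b", "1", "-b", "2", "-b", "3", "-mask", "4",
                   "-of", "COG",
                   "-co", "COMPRESS=JPEG",
                   "-co", "NUM_THREADS=ALL_CPUS"]

    return [build_vrt, rgba_lzw, jpeg_masked]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Something like run_all(cog_workflow_commands(["tile_1.tif", "tile_2.tif"])) prints the three commands for review; passing dry_run=False would actually run them, at which point the tiles could be deleted once the LZW COG has been checked.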

Andy

From: Jeremy Palmer <palmerjnz at gmail.com>
Sent: Tuesday, April 21, 2020 2:57 PM
To: Even Rouault <even.rouault at spatialys.com>
Cc: Ritchie, Andrew C <aritchie at usgs.gov>; gdal-dev at lists.osgeo.org
Subject: [EXTERNAL] Re: [gdal-dev] gdal_translate (3.1.0dev) "never" finishes on large jpeg cogs... REALLLLLY long time to unload.

Hi Andrew,

On Wed, Apr 22, 2020 at 6:11 AM Even Rouault <even.rouault at spatialys.com> wrote:

Andrew,

> When I create a mask band in a large lzw-compressed or jpeg-compressed tif
> using the COG driver it dramatically increases processing time over writing
> RGBA (hours instead of minutes), so the issue is not jpeg compression, it's
> the creation of the mask band. Steps to reproduce:

There are clearly some complications & overhead internally to be able to deal with non-alpha mask bands. I wouldn't have expected it to be as large as you observed though. It would require deeper investigation to understand why that causes such a performance difference.

I would also be interested to see what steps you have taken to process the COG. We have used the driver to create both WEBP and JPEG (including the 1-bit transparency mask) COGs up to 200GB in size. It did take a couple of days on a mid-sized AWS EC2 instance but completed OK. Note that most of the time is taken up by overview generation on large COGs. This will hopefully improve soon, with multi-threaded overview computation likely coming to GDAL.

Warm Regards,
Jeremy