[gdal-dev] gdal_translate (3.1.0dev) "never" finishes on large jpeg cogs... REALLLLLY long time to unload.

Ritchie, Andrew C aritchie at usgs.gov
Tue Apr 14 20:37:15 PDT 2020


I've been working in the gdal-dev-env (version 3.1.0, installed around mid-December) on OSGeo4W, mostly because making COGs this way is faster than using the GTiff driver. The inputs are large (e.g. 102600x91100) orthophoto rasters, from which I generate VRTs, TIFFs and COGs.

While I can do LZW, DEFLATE, and uncompressed just fine (2 minutes with all cores to make an LZW COG from a VRT), I'm struggling to make JPEG COGs. If I run a loop, I can't make it through more than one image without gdal_translate hanging at the finish, sometimes for tens of hours. If I kill the process (CTRL-C doesn't always work, but Task Manager does), the resulting COG is fine (same size as if I wait n hours for the process to finish). Over the last few years I've had this issue (gdal_translate hanging at "100 - done.") on many large rasters, even when building as TIFF. Also maybe worth noting: even on smaller rasters I often see GDAL hang for minutes to tens of minutes at the end of a raster build. In the past I was only building single rasters, so it wasn't that big of a deal; I could just kill the process. Not any more. I frequently build several at a time and hope to scale up.

I'm running on a Threadripper 3960X with 256GB RAM that I built. All processing is on an NVMe drive. The LZW-compressed TIFFs (COGs) are around 1.5 - 3GB (8-bit RGB with mask band). If I build with CPL_DEBUG=ON, depending on the cachemax size, I see "potential thrashing on band one of ." at around 10-20% (even with GDAL_CACHEMAX at 80%), and if it is not set high enough I'm stuck at 20% for hours and hours. Then GDAL hangs at "100 - done." for anywhere from 2 to 12+ hours unless I kill it. If I kill the process, the final raster builds out and appears to work fine, and is the same as if I wait X hours for it to exit. For a test with debug on that I just finished, after 2.5h hung at "done" I got this line:

GDAL: GDALClose(<outfile.tif.ovr.tmp, this=000001FDC5531C50)

Another 45 minutes later, the input and output TIFFs closed and the shared library unloaded, after RAM usage had slowly drained from ~30GB over that time.

My overall command at the moment is:

gdal_translate .\<infile.tif> <outfile.tif> -of COG -co COMPRESS=JPEG -co QUALITY=90 -config GDAL_CACHEMAX "80%" -config GDAL_SWATH_SIZE "80%" -config GDAL_FORCE_CACHING YES -config GDAL_MAX_DATASET_POOL_SIZE 2048

And with lower values (and possibly if I get rid of the GDAL_FORCE_CACHING YES variable - I just added that) I have the same "hang" at 100% lasting for even longer. Again, the same COG builds in 2 minutes with LZW, but with JPEG and all the cachemax settings ramped up, it takes maybe 6 hours.
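
For the batch case (several rasters in a loop), here is a minimal Python sketch of how the command above could be assembled and run once per input. It is only an illustration, not something I have verified against this problem: the function names, the `_cog.tif` output suffix, and the dry-run switch are hypothetical, and it uses the documented `--config KEY VALUE` spelling of GDAL's general option. It does nothing about the hang itself; it just keeps the options in one place.

```python
import shlex
import subprocess
from pathlib import Path

# Config options copied from the command above, passed through as plain strings.
CONFIG_OPTIONS = {
    "GDAL_CACHEMAX": "80%",
    "GDAL_SWATH_SIZE": "80%",
    "GDAL_FORCE_CACHING": "YES",
    "GDAL_MAX_DATASET_POOL_SIZE": "2048",
}


def build_cog_command(infile, outfile, quality=90, config=None):
    """Return the gdal_translate argument list for one JPEG COG (hypothetical helper)."""
    cmd = [
        "gdal_translate", str(infile), str(outfile),
        "-of", "COG",
        "-co", "COMPRESS=JPEG",
        "-co", f"QUALITY={quality}",
    ]
    options = CONFIG_OPTIONS if config is None else config
    for key, value in options.items():
        cmd += ["--config", key, value]
    return cmd


def translate_all(infiles, out_dir, dry_run=True):
    """Build one command per input and, unless dry_run is set, run them in sequence."""
    commands = []
    for infile in infiles:
        # Output naming is an assumption made for this sketch only.
        outfile = Path(out_dir) / (Path(infile).stem + "_cog.tif")
        cmd = build_cog_command(infile, outfile)
        commands.append(cmd)
        if dry_run:
            # Print the command for inspection instead of executing it.
            print(" ".join(shlex.quote(arg) for arg in cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `translate_all(["ortho1.tif", "ortho2.tif"], "cogs")` with the default `dry_run=True` only prints the commands, so the options can be checked before committing to a multi-hour run.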
