[gdal-dev] Python Bindings and Closed Datasets

Patrick Young patrick.mckendree.young at gmail.com
Wed Oct 30 11:05:01 PDT 2019


Hi all,

I've been experiencing some behavior using the GDAL python bindings where I
am occasionally seeing what appears to be random blocks of the tiff being
unwritten in geotiffs I've pushed to S3.  a small block(s) in one of the
bands will be all zeros while everywhere else is good.

My setup is a thread pool crunching through some gdal.Warp calls.  The main
thread polls for completed jobs and then uploads each file to S3.  My theory
is that Python's garbage collector hasn't destroyed the dataset I've set to
None by the time I start uploading.  Is this plausible?  Calling FlushCache
didn't solve the problem for me, and I'm not aware of another way via the
Python bindings to ensure the dataset is closed.  I'm using Ubuntu 19.10
(which comes with GDAL 2.4.2).  Any thoughts and ideas to try are greatly
appreciated; as one can imagine, this is hard to reproduce.

The code looks something like this:

import boto3
from concurrent.futures import ThreadPoolExecutor, as_completed
from osgeo import gdal


def warp_tile(f_in, f_out, warp_opts):
    gdal_warp_opts = gdal.WarpOptions(
        **warp_opts, creationOptions=["TILED=YES", "COMPRESS=DEFLATE"])
    try:
        warp_ds = gdal.Warp(f_out, f_in, options=gdal_warp_opts)
        warp_ds.FlushCache()
    finally:
        # Drop the reference so the dataset is (hopefully) closed.
        warp_ds = None


with ThreadPoolExecutor(max_workers=max_workers) as executor:

    job_d = {}
    for job in jobs:
        job_d[executor.submit(warp_tile, job.in_f, job.out_f,
                              job.warp_opts)] = job.out_f

    for future in as_completed(job_d):
        out_f = job_d[future]
        try:
            future.result()
        except Exception as e:
            ...
        else:
            # Upload from the main thread once the warp job reports done.
            boto3.resource('s3').Bucket(bucket_name).upload_file(
                Filename=out_f, Key=key)
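
One thing I'm considering, though I haven't verified it helps, is dropping the
reference, forcing a collection, and re-opening the output read-only as a
sanity check before the main thread uploads it.  The gc.collect() call and the
re-open check are my own additions, so treat this as a sketch rather than
something known to fix the problem:

import gc

from osgeo import gdal


def warp_tile_checked(f_in, f_out, warp_opts):
    gdal_warp_opts = gdal.WarpOptions(
        **warp_opts, creationOptions=["TILED=YES", "COMPRESS=DEFLATE"])
    try:
        warp_ds = gdal.Warp(f_out, f_in, options=gdal_warp_opts)
        warp_ds.FlushCache()
    finally:
        warp_ds = None   # drop the only reference to the dataset
        gc.collect()     # belt and braces: make sure the wrapper is really gone

    # Re-open read-only before returning; this should at least fail loudly if
    # the file can't be read back yet.
    check_ds = gdal.Open(f_out)
    if check_ds is None:
        raise RuntimeError("Could not re-open %s after warping" % f_out)
    check_ds = None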

Thanks,
Patrick