[gdal-dev] Occasional ERROR 4 on gdal.Open() with vrt files hosted on AWS S3

Jon Seymour jon at upowr.com.au
Sun Feb 23 20:46:03 PST 2020


G'day,

I am trying to get to the bottom of some errors I have been experiencing
with gdal 2.4.2 + Python 3 running in a long-running debian container on
AWS ECS. The files I am trying to load are VRT (actually named .tif) that
reference actual .tif files hosted in the same S3 bucket.

Most of the time the access works. The symptom is that, once the container
has been running for a long time, gdal.Open() returns null and logs an
ERROR 4 to stderr. For example:

ERROR 4: `/vsis3/acme-foo-bar/baz/quux.tif' not recognized as a supported
file format

If the container is restarted, then the access works as expected. Indeed,
if I call gdal.VSICurlClearCache(), then the access works as expected, so
the issue is not the AWS credentials. Verbose vsicurl logging also
indicates that there are no 4xx or 5xx errors on any request (almost all
the requests return a 206 Partial Content response code).

I was initially using gdal 2.4.0 with IAM role-based credentials and
discovered this issue (https://github.com/OSGeo/gdal/issues/1593) that
looked very similar to the issue I was having.

However, I have since upgraded the library to gdal-2.4.2 and an issue with
identical symptoms persists - occasionally. I have even stopped using IAM
role based credentials and switched to using explicitly managed
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables but the
issue persists - occasionally. By occasionally, I mean that I can't
reproduce the problem at will, even with files that triggered it
initially. The fact that I do not see any 4xx or 5xx errors indicates to me
that this is not an AWS credentials issue.

I have enabled vsicurl verbose logging and have observed that I get a 206
response of the correct length (1985, in my case), indicating that the
request to AWS S3 has not actually failed and probably has returned the
expected response.
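
One way to check what GDAL actually hands to its drivers for the failing
path would be to read the first bytes through the VSI layer and classify
them. The sketch below covers only the classification step. sniff_header is
a hypothetical helper, not part of GDAL; it relies on the standard TIFF and
BigTIFF signatures and on the <VRTDataset root element that VRT files
contain.

```python
def sniff_header(header: bytes) -> str:
    """Classify the leading bytes of a file as VRT, TIFF, BigTIFF or other.

    Hypothetical debugging helper, not part of GDAL.
    """
    if not header:
        return "empty"
    # VRT files are XML documents whose root element is <VRTDataset>.
    if b"<VRTDataset" in header:
        return "vrt"
    # Classic TIFF: byte-order mark ("II" or "MM") followed by the number 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    # BigTIFF: same byte-order mark followed by the number 43.
    if header[:4] in (b"II+\x00", b"MM\x00+"):
        return "bigtiff"
    return "unknown"
```

Assuming the GDAL Python bindings, the header could be obtained with
gdal.VSIFOpenL(path, "rb"), gdal.VSIFReadL(1, 1985, fp) and
gdal.VSIFCloseL(fp), and the result compared before and after the cache is
cleared. A result other than "vrt" for a file that is known to be a VRT
would suggest that stale or wrong bytes are being served from the cache.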

It does seem suspicious to me that a call to gdal.VSICurlClearCache()
appears to resolve the issue immediately, but it isn't clear why that works
or what the root cause of the underlying issue is.
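
As a stopgap while the root cause is unknown, the observation above could
be turned into a retry: if the open returns nothing, clear the cache and
try again. This is a minimal sketch, not a fix. open_with_cache_reset is a
hypothetical helper, and the open and clear-cache functions are passed in
as arguments so the logic does not depend on GDAL being installed.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def open_with_cache_reset(
    path: str,
    open_fn: Callable[[str], Optional[T]],
    clear_cache_fn: Callable[[], None],
    max_resets: int = 1,
) -> Optional[T]:
    """Call open_fn(path); on a None result, clear the cache and retry.

    Hypothetical helper, not part of GDAL. Returns the first non-None
    result, or None if every attempt fails.
    """
    dataset = open_fn(path)
    resets = 0
    while dataset is None and resets < max_resets:
        clear_cache_fn()  # drop cached state before the next attempt
        resets += 1
        dataset = open_fn(path)
    return dataset
```

With the GDAL Python bindings this would presumably be called as
open_with_cache_reset(path, gdal.Open, gdal.VSICurlClearCache), assuming
exceptions are not enabled so that gdal.Open() returns None on failure.
Logging each reset would also give a record of how often, and for which
files, the problem occurs.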

Any suggestions about how I could debug this would be gratefully accepted.

jon.