[gdal-dev] Prevent GDAL reads for .aux or any other s3 objects that do not exist

Darren Weber darren.weber at jupiterintel.com
Wed Mar 3 15:56:15 PST 2021


In https://github.com/mapbox/rasterio/issues/2119#issuecomment-790024225
I've noted how s3-COG reads can use retries to handle s3 rate throttling
(503) by setting `GDAL_HTTP_MAX_RETRY`.  Our dataset on s3 contains only
*.tif files, and we need to prevent GDAL from issuing any GET/HEAD
requests for any other files.  Is there an env-var for this?

`GDAL_DISABLE_READDIR_ON_OPEN` is set to `False` so that GDAL can list the
objects under the s3 prefix in one GET request.  From that listing, it
should already know that there are only *.tif files and no *.aux* files
available.  When it gets a 503 rate-throttle response, however, it seems to
automatically try to read some supplementary files.  For example, one log
contains messages indicating that GDAL is trying to read an .aux file that
does not exist.  We want to prevent GDAL from issuing any GET or HEAD
requests for anything other than *.tif files (objects).
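
As a minimal sketch of the setup described above (not the exact code we
run), the two options can be passed to GDAL as environment variables, which
GDAL consults for its config options.  The retry count and delay are
assumptions chosen to be consistent with the log below (three retries, the
first after 0.5 secs); `apply_gdal_config` is a hypothetical helper name.

```python
import os

# Settings described in this message. The retry count and delay are
# illustrative assumptions; the post itself does not state the values used.
GDAL_CONFIG = {
    "GDAL_HTTP_MAX_RETRY": "3",               # assumed value
    "GDAL_HTTP_RETRY_DELAY": "0.5",           # assumed value, in seconds
    "GDAL_DISABLE_READDIR_ON_OPEN": "FALSE",  # as stated above
}


def apply_gdal_config(config, environ=os.environ):
    """Copy GDAL config options into an environment mapping as strings."""
    for key, value in config.items():
        environ[key] = str(value)
    return environ


# Must run before GDAL (or rasterio) opens the dataset.
apply_gdal_config(GDAL_CONFIG)
```

The same key/value pairs could presumably be passed as keyword arguments to
`rasterio.Env(...)` instead of going through the process environment.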

[WARNING] 2021-03-03T19:37:40.312Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 0.5 secs
[WARNING] 2021-03-03T19:37:40.795Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 1.0 secs
[WARNING] 2021-03-03T19:37:41.796Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 2.2 secs
[WARNING] 2021-03-03T19:37:44.290Z CPLE_AppDefined in HTTP response code on https://a-bucket.s3.amazonaws.com/geotiff.aux: 503
2021-03-03 19:37:44,155 | INFO | extract_raster_points:1036 | s3://a-bucket/geotiff.tif extract cells (1)...
2021-03-03 19:37:44,347 | INFO | extract_raster_points:1050 | s3://a-bucket/geotiff.tif extract cells (1) done

That log suggests a large performance hit: about 4 seconds (19:37:40.312 to
19:37:44.290) are spent retrying reads of an .aux file that does not exist.
GDAL should not issue any kind of request for it at all, and that is what
we want to prevent.
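
To put a number on that cost, the warnings above can be parsed for their
timestamps and retry delays.  This is only a sketch: the log lines are
copied from above (unwrapped onto single lines), and the regular
expressions assume exactly this message format.

```python
import re
from datetime import datetime

# The four .aux warnings from the log above, one entry per line.
LOG = """\
[WARNING] 2021-03-03T19:37:40.312Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 0.5 secs
[WARNING] 2021-03-03T19:37:40.795Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 1.0 secs
[WARNING] 2021-03-03T19:37:41.796Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 2.2 secs
[WARNING] 2021-03-03T19:37:44.290Z CPLE_AppDefined in HTTP response code on https://a-bucket.s3.amazonaws.com/geotiff.aux: 503
"""

TS_RE = re.compile(r"^\[WARNING\] (\S+)Z CPLE_AppDefined in HTTP .*geotiff\.aux")
DELAY_RE = re.compile(r"Retrying again in ([\d.]+) secs")

timestamps = []
delays = []
for line in LOG.splitlines():
    ts_match = TS_RE.match(line)
    if ts_match:
        timestamps.append(
            datetime.strptime(ts_match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        )
    delay_match = DELAY_RE.search(line)
    if delay_match:
        delays.append(float(delay_match.group(1)))

# Wall-clock time between the first and last warning for the .aux object.
elapsed = (timestamps[-1] - timestamps[0]).total_seconds()

print(f"warnings for .aux: {len(timestamps)}")
print(f"elapsed: {elapsed:.3f} s, of which announced delays: {sum(delays):.1f} s")
```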

When running some unit tests with debug curl details available (but no
ability to create 503 responses), it appears that GDAL requests only the
required file and reads it in parts.  For example, this is a sample of the
GET requests issued to read one .tif s3-COG:

$ grep 'GET' tmp.log
> GET /?delimiter=%2F&prefix=unit_tests%2Fgis%2F HTTP/1.1
> GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> GET /unit_tests/gis/cea_blocks.tif HTTP/1.1

TIA, Darren


-- 
Darren Weber, PhD
Senior Software Engineer
Jupiter Intelligence


*"Predicting risk in a changing climate"*