[gdal-dev] Prevent GDAL reads for .aux or any other s3 objects that do not exist

Vincent Sarago vincent.sarago at gmail.com
Wed Mar 3 16:19:58 PST 2021


Hi Darren

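You can do this by setting `GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR`, or by restricting the allowed extensions with `CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif`. With `EMPTY_DIR`, GDAL assumes there are no sibling files next to the dataset, so it does not probe for sidecar files such as .aux.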
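
A minimal sketch of setting both options from Python, assuming they are exported as plain environment variables before GDAL (or rasterio) opens the first dataset. The `apply_gdal_options` helper and the `GDAL_S3_COG_OPTIONS` name are hypothetical, used only for illustration:

```python
import os

# The two GDAL config options suggested above.
# The dict and helper names are hypothetical, for illustration only.
GDAL_S3_COG_OPTIONS = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": ".tif",
}


def apply_gdal_options(options=GDAL_S3_COG_OPTIONS, environ=os.environ):
    """Copy the options into the environment so GDAL can pick them up."""
    for key, value in options.items():
        environ[key] = value
    return environ


# Call this before opening any dataset.
apply_gdal_options()
```

If you are using rasterio, the same two options can instead be passed as keyword arguments to `rasterio.Env(...)` around the code that opens the file.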

Kyle is working on some docs that explain those environment variables in TiTiler:
- https://github.com/developmentseed/titiler/blob/0a5288de30845865256a0124ed233f918812aa78/docs/concepts/performance_tuning.md#gdal_disable_readdir_on_open
- https://github.com/developmentseed/titiler/blob/0a5288de30845865256a0124ed233f918812aa78/docs/concepts/performance_tuning.md#cpl_vsil_curl_allowed_extensions
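
To illustrate the effect of the extension filter, here is a rough sketch of the suffix check it implies. This is not GDAL's actual code: the `is_allowed` function is hypothetical, and the comma-separated parsing and case-insensitive comparison are assumptions about how the option behaves.

```python
def is_allowed(path, allowed_extensions=".tif"):
    """Return True if the path ends with one of the allowed extensions.

    Hypothetical approximation of CPL_VSIL_CURL_ALLOWED_EXTENSIONS:
    the value is treated as a comma-separated list of suffixes and
    compared case-insensitively (both are assumptions).
    """
    extensions = [
        ext.strip().lower()
        for ext in allowed_extensions.split(",")
        if ext.strip()
    ]
    return any(path.lower().endswith(ext) for ext in extensions)


# Paths GDAL might probe when opening the dataset from the log above.
candidates = [
    "/vsis3/a-bucket/geotiff.tif",
    "/vsis3/a-bucket/geotiff.aux",
    "/vsis3/a-bucket/geotiff.tif.aux.xml",
]

# Only the .tif survives the filter; the others would be treated as
# nonexistent without any request being issued.
allowed = [path for path in candidates if is_allowed(path)]
```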

Vincent

> On Mar 3, 2021, at 6:56 PM, Darren Weber <darren.weber at jupiterintel.com> wrote:
> 
> In https://github.com/mapbox/rasterio/issues/2119#issuecomment-790024225 I've noted how s3-COG reads can use retries to handle s3 rate throttling (503) by setting `GDAL_HTTP_MAX_RETRY`.  Our dataset on s3 contains only *.tif files, and we need to prevent GDAL from issuing any GET/HEAD requests for any other files.  Is there an env-var for this?
> 
> `GDAL_DISABLE_READDIR_ON_OPEN` is set to `False` so that GDAL can list the objects under the s3 prefix in one GET request.  From that listing, it should already have determined that only *.tif files are available and that there are no *.aux* files.  When it gets a 503 rate-throttle response, however, it seems to automatically try to read some supplementary files.  For example, one log contains messages indicating that GDAL is trying to read an .aux file that does not exist.  We want to prevent GDAL from issuing any GET or HEAD requests for anything other than *.tif files (objects).
> 
> [WARNING] 2021-03-03T19:37:40.312Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 0.5 secs
> [WARNING] 2021-03-03T19:37:40.795Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 1.0 secs
> [WARNING] 2021-03-03T19:37:41.796Z CPLE_AppDefined in HTTP error code: 503 - https://a-bucket.s3.amazonaws.com/geotiff.aux. Retrying again in 2.2 secs
> [WARNING] 2021-03-03T19:37:44.290Z CPLE_AppDefined in HTTP response code on https://a-bucket.s3.amazonaws.com/geotiff.aux: 503
> 2021-03-03 19:37:44,155 | INFO | extract_raster_points:1036 | s3://a-bucket/geotiff.tif extract cells (1)...
> 2021-03-03 19:37:44,347 | INFO | extract_raster_points:1050 | s3://a-bucket/geotiff.tif extract cells (1) done
> 
> That log suggests a huge performance hit from retrying reads of an .aux file that does not exist.  GDAL should not issue any kind of request for it, and that is what we want to prevent.
> 
> When running some unit tests with debug curl details available (but no ability to create 503 responses), it appears that GDAL finds just the required file and performs only partial reads of it.  For example, this is a sample of the GET requests issued to read one .tif s3-COG:
> 
> $ grep 'GET' tmp.log
> > GET /?delimiter=%2F&prefix=unit_tests%2Fgis%2F HTTP/1.1
> > GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> > GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> > GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> > GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> > GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> > GET /unit_tests/gis/cea_blocks.tif HTTP/1.1
> 
> TIA, Darren
> 
> 
> -- 
> Darren Weber, PhD
> Senior Software Engineer
> Jupiter Intelligence
> 
> 
> "Predicting risk in a changing climate"
