[gdal-dev] Issue with GDAL VSICURL Passing Authorization Header to Redirected Pre-signed S3 URLs
Pradeep kumar
parthivpradeep at gmail.com
Sat Sep 21 22:37:07 PDT 2024
Dear GDAL Developers,
I hope this message finds you well.
I am experiencing an issue with GDAL’s VSICURL virtual file system: when
GDAL reuses a pre-signed S3 URL that it obtained from a redirect, it also
sends the Authorization header, and AWS S3 rejects the request.
*Background:*
In my use case, I have a proxy URL that, when called with an Authorization
header, validates the token, verifies authorization, and then generates a
pre-signed URL to an S3 object. The proxy then redirects the client to this
pre-signed URL. When I use GDAL with this reverse proxy and set the
configuration option --config CPL_VSIL_CURL_USE_S3_REDIRECT NO, everything
works correctly. However, GDAL then goes through the proxy on every
request. This adds round trips and makes the proxy generate a new
pre-signed URL each time. Ideally, GDAL would send the initial request to
the proxy and then reuse the pre-signed URL for subsequent requests.
This desired behavior is achieved with --config
CPL_VSIL_CURL_USE_S3_REDIRECT YES. However, GDAL then also sends the
Authorization header on subsequent requests to the pre-signed URL. I set
the header with --config GDAL_HTTP_HEADERS "Authorization: Bearer xxxx",
and GDAL adds it to every request it issues itself, including the ones it
sends directly to the cached pre-signed URL.
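The proxy side of this setup can be sketched as a pure request handler. This is a minimal, hypothetical illustration, not the actual proxy. `VALID_TOKENS`, `presign()` and `handle_proxy_request()` are invented names, and `presign()` is a stub standing in for real S3 pre-signing (for example boto3's `generate_presigned_url`).

```python
from http import HTTPStatus
from urllib.parse import quote

# Hypothetical set of accepted bearer tokens; the real proxy validates
# tokens and checks authorization in its own way.
VALID_TOKENS = {"xxxx"}


def presign(s3_path: str) -> str:
    """Hypothetical stub: map s3://bucket/key to a pre-signed HTTPS URL.

    A real proxy would call an S3 SDK here and get a freshly signed URL
    on every call; this stub only mimics the shape of the result.
    """
    bucket, _, key = s3_path.removeprefix("s3://").partition("/")
    return (f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
            "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Signature=stub")


def handle_proxy_request(headers: dict[str, str],
                         query: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Validate the bearer token, then redirect to a pre-signed URL."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or token not in VALID_TOKENS:
        return HTTPStatus.UNAUTHORIZED, {}
    # 301, as in the logs; the Location URL itself carries the credentials.
    return HTTPStatus.MOVED_PERMANENTLY, {"Location": presign(query["path"])}
```

Because the handler calls `presign()` on every request, a client that always goes back through the proxy (the `CPL_VSIL_CURL_USE_S3_REDIRECT NO` case) gets a newly signed URL each time, which is the overhead described above.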
When the Authorization header is sent to AWS S3 along with the pre-signed
URL, AWS returns the following error:
```
Only one auth mechanism allowed; only the X-Amz-Algorithm query
parameter, Signature query string parameter or the Authorization header
should be specified.
```
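The rule AWS enforces here can be expressed as a small check: if the URL already carries query-string credentials, the Authorization header has to be dropped. This is a minimal sketch, not GDAL code. It assumes that the presence of `Signature` (Signature Version 2, as in the pre-signed URLs in the logs) or `X-Amz-Algorithm` / `X-Amz-Signature` (Signature Version 4) is a sufficient signal. `is_presigned()` and `headers_for()` are hypothetical names.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that authenticate an S3 pre-signed URL (assumed list):
# SigV2 uses AWSAccessKeyId/Signature/Expires, SigV4 uses X-Amz-* parameters.
QUERY_AUTH_PARAMS = {"Signature", "X-Amz-Algorithm", "X-Amz-Signature"}


def is_presigned(url: str) -> bool:
    """True if the URL already authenticates itself via its query string."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in params for name in QUERY_AUTH_PARAMS)


def headers_for(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return the headers to send to `url`.

    Authorization is removed (header names compared case-insensitively)
    when the URL is pre-signed, because S3 answers 400 Bad Request when
    both mechanisms are present.
    """
    if is_presigned(url):
        return {k: v for k, v in headers.items() if k.lower() != "authorization"}
    return dict(headers)
```

Applied to the second request in the logs, `headers_for()` would remove the Authorization header and leave `Range` and the other headers untouched. Requests to the proxy URL keep it.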
*Observed Behavior:*
• *First Request (301 redirect, then 200 OK):* GDAL sends a request to the
proxy URL with the Authorization header. The proxy validates the token and
redirects to the pre-signed S3 URL. libcurl follows the redirect itself and
does not forward the Authorization header to the different host, so AWS S3
receives only the pre-signed URL and responds with *200 OK*. This is the
expected behavior.
• *Subsequent Requests (400 Bad Request):* GDAL reuses the pre-signed URL
but includes the Authorization header in the request. AWS S3, seeing both
the pre-signed URL’s query parameters and the Authorization header, returns
a *400 Bad Request* error, stating that only one authentication mechanism
is allowed.
*Expected Behavior:*
I would expect GDAL’s VSICURL to send the initial request, with the
Authorization header, to the proxy URL and, after receiving the redirect to
the pre-signed URL, to omit the Authorization header from subsequent
requests to AWS S3. AWS S3 would then accept the pre-signed URL without
conflict.
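The expected behavior amounts to the client logic below. This is a hypothetical sketch, not GDAL's implementation. `RedirectCachingClient`, the injected `transport` callable, and the fixed one-hour `validity_s` are assumptions that let the logic be shown, and exercised, without network access.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical transport: (method, url, headers) -> (status, response headers).
Transport = Callable[[str, str, dict], tuple]

REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class RedirectCachingClient:
    """Sketch of the expected behavior, not GDAL's implementation."""
    proxy_url: str
    token: str
    transport: Transport
    validity_s: float = 3600.0  # assumed lifetime of a pre-signed URL
    proxy_calls: int = field(default=0, init=False)
    _target: Optional[str] = field(default=None, init=False)
    _expiry: float = field(default=0.0, init=False)

    def _resolve(self) -> str:
        """Return the cached pre-signed URL, asking the proxy only when needed."""
        if self._target is not None and time.monotonic() < self._expiry:
            return self._target
        self.proxy_calls += 1
        # The bearer token is sent to the proxy, and only to the proxy.
        status, headers = self.transport(
            "HEAD", self.proxy_url, {"Authorization": f"Bearer {self.token}"})
        if status not in REDIRECT_CODES:
            raise RuntimeError(f"expected a redirect from the proxy, got {status}")
        self._target = headers["Location"]
        self._expiry = time.monotonic() + self.validity_s
        return self._target

    def read_range(self, start: int, end: int) -> int:
        """Fetch bytes start..end (inclusive) from the pre-signed URL."""
        url = self._resolve()
        # No Authorization header: the query string is the credential.
        status, _ = self.transport("GET", url, {"Range": f"bytes={start}-{end}"})
        return status
```

The design keeps the credential and the redirect target separate. The token is tied to the proxy URL, and the cached target is always requested with only the headers needed for the range read.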
*Questions:*
• Is there a way to configure GDAL so that it does not pass the
Authorization header to the redirected pre-signed URLs while retaining it
for the initial request?
• If this feature is not currently available, would it be feasible to
implement such functionality?
• Are there any plans to address the handling of HTTP redirect response
codes 301 and 307 in future GDAL releases to better support this use case?
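Until there is a built-in option, one possible workaround is to resolve the redirect once outside GDAL and then open the pre-signed URL directly, so that `GDAL_HTTP_HEADERS` is not needed at all. This is a sketch under assumptions and has not been tested against this proxy. It assumes the proxy answers a `HEAD` request carrying the bearer token with a 3xx status and a `Location` header, as it does in the logs. `resolve_presigned_url()` is a hypothetical name.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib hand back 3xx responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None stops urllib from following the redirect


def resolve_presigned_url(proxy_url: str, token: str) -> str:
    """Ask the proxy (once) for the pre-signed URL it redirects to."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(
        proxy_url, method="HEAD",
        headers={"Authorization": f"Bearer {token}"})
    try:
        with opener.open(req) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # An unfollowed 3xx surfaces as HTTPError; its headers hold Location.
        location = err.headers.get("Location")
        if 300 <= err.code < 400 and location:
            return location
        raise
    raise RuntimeError(f"proxy did not redirect (HTTP {status})")
```

The returned URL could then be opened with `gdal.Open("/vsicurl/" + url)` from the `osgeo.gdal` bindings, without any Authorization header configured. The trade-off is that the URL stops working once its signed lifetime (about one hour here) has passed, after which it has to be resolved again.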
*GDAL CLI Command:*
```
gdalinfo --debug on \
  --config CPL_CURL_VERBOSE YES \
  --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR \
  --config CPL_VSIL_CURL_USE_S3_REDIRECT YES \
  --config GDAL_HTTP_HEADERS="Authorization: Bearer xxxx" \
  '/vsicurl/https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif'
```
*Example GDAL Logs:*
Below are the GDAL logs illustrating the issue (sensitive information has
been redacted):
*First Request (301 redirect, then 200 OK):*
```
HTTP: libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.61.0
HTTP: GDAL was built against curl 8.4.0, but is running against 8.7.1.
CURL_INFO_TEXT: [HTTP/2] [1] OPENED stream for https://example.com/stac/collections/items/assets?path=s3://your-bucket/red.tif
CURL_INFO_TEXT: [HTTP/2] [1] [:method: HEAD]
CURL_INFO_TEXT: [HTTP/2] [1] [:scheme: https]
CURL_INFO_TEXT: [HTTP/2] [1] [:authority: example.com]
CURL_INFO_TEXT: [HTTP/2] [1] [:path: /stac/collections/items/assets/foo?path=s3://your-bucket/red.tif]
CURL_INFO_TEXT: [HTTP/2] [1] [user-agent: GDAL/3.9.2]
CURL_INFO_TEXT: [HTTP/2] [1] [accept: */*]
CURL_INFO_TEXT: [HTTP/2] [1] [authorization: Bearer [REDACTED]]
CURL_INFO_HEADER_OUT: HEAD /stac/collections/items/assets/foo?path=s3://your-bucket/red.tif HTTP/2
Host: example.com
User-Agent: GDAL/3.9.2
Accept: */*
Authorization: Bearer [REDACTED]
CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/2 301
CURL_INFO_HEADER_IN: date: Sun, 22 Sep 2024 04:44:28 GMT
CURL_INFO_HEADER_IN: content-type: text/plain; charset=utf-8
CURL_INFO_HEADER_IN: content-length: 43
CURL_INFO_HEADER_IN: location: https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
CURL_INFO_HEADER_IN: x-content-length: 176995703
CURL_INFO_HEADER_IN: apigw-requestid: [REDACTED]
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Ignoring the response-body
CURL_INFO_TEXT: Connection #0 to host example.com left intact
CURL_INFO_TEXT: Issue another request to this URL: 'https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868'
CURL_INFO_TEXT: Couldn't find host s3-bucket-placeholder in the .netrc file; using defaults
CURL_INFO_TEXT: Host s3-bucket-placeholder:443 was resolved.
CURL_INFO_TEXT: Trying [REDACTED]...
CURL_INFO_TEXT: Connected to s3-bucket-placeholder ([REDACTED]) port 443
CURL_INFO_TEXT: SSL connection using TLSv1.3 / [REDACTED]
CURL_INFO_TEXT: Server certificate:
CURL_INFO_TEXT: subject: CN=*.s3.amazonaws.com
CURL_INFO_TEXT: start date: Apr 22 00:00:00 2024 GMT
CURL_INFO_TEXT: expire date: Apr 7 23:59:59 2025 GMT
CURL_INFO_TEXT: issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
CURL_INFO_TEXT: SSL certificate verify ok.
CURL_INFO_HEADER_OUT: HEAD /red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868 HTTP/1.1
Host: s3-bucket-placeholder
User-Agent: GDAL/3.9.2
Accept: */*
CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/1.1 200 OK
CURL_INFO_HEADER_IN: x-amz-id-2: [REDACTED]
CURL_INFO_HEADER_IN: x-amz-request-id: [REDACTED]
CURL_INFO_HEADER_IN: Date: Sun, 22 Sep 2024 04:44:29 GMT
CURL_INFO_HEADER_IN: Last-Modified: Tue, 17 Sep 2024 17:44:40 GMT
CURL_INFO_HEADER_IN: ETag: "[REDACTED]"
CURL_INFO_HEADER_IN: x-amz-server-side-encryption: AES256
CURL_INFO_HEADER_IN: Accept-Ranges: bytes
CURL_INFO_HEADER_IN: Content-Type: image/tiff
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Content-Length: 176995703
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Connection #1 to host s3-bucket-placeholder left intact
VSICURL: Effective URL: https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868
VSICURL: Will use redirect URL for the next 3599 seconds
VSICURL: GetFileSize(https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif)=176995703 response_code=200
```
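The `Will use redirect URL for the next 3599 seconds` line is consistent with the `Expires` query parameter of this Signature Version 2 URL, which is an absolute Unix timestamp. The arithmetic below reproduces the figure from the `Date` header of the S3 response. It is only an illustration: GDAL's own computation may use the local clock and can differ. `seconds_left()` is a hypothetical helper, and SigV4 URLs, which express their lifetime as `X-Amz-Date` plus `X-Amz-Expires`, are not handled.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qs, urlsplit


def seconds_left(presigned_url: str, http_date: str) -> int:
    """Seconds between an HTTP Date header and a SigV2 URL's Expires time.

    Assumes a Signature Version 2 URL, whose Expires parameter is an
    absolute Unix timestamp.
    """
    expires = int(parse_qs(urlsplit(presigned_url).query)["Expires"][0])
    now = parsedate_to_datetime(http_date).timestamp()  # GMT -> aware UTC
    return int(expires - now)


# Same URL shape as in the log, with placeholder credentials.
url = ("https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff"
       "&AWSAccessKeyId=KEY&Signature=SIG&x-amz-security-token=TOKEN"
       "&Expires=1726983868")
print(seconds_left(url, "Sun, 22 Sep 2024 04:44:29 GMT"))  # S3 response Date -> 3599
print(seconds_left(url, "Sun, 22 Sep 2024 04:44:28 GMT"))  # proxy response Date -> 3600
```

Measured from the proxy's `Date` header (04:44:28 GMT), the result is exactly 3600, which suggests the proxy signs URLs for one hour.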
*Subsequent Request (400 Bad Request):*
```
VSICURL: Using redirect URL as it looks to be still valid (3599 seconds left)
VSICURL: Downloading 0-16383 (https://s3-bucket-placeholder/red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868)...
CURL_INFO_TEXT: Couldn't find host s3-bucket-placeholder in the .netrc file; using defaults
CURL_INFO_TEXT: Found bundle for host: 0x600001aa8270 [serially]
CURL_INFO_TEXT: Can not multiplex, even if we wanted to
CURL_INFO_TEXT: Re-using existing connection with host s3-bucket-placeholder
CURL_INFO_HEADER_OUT: GET /red.tif?response-content-type=image%2Ftiff&AWSAccessKeyId=[REDACTED]&Signature=[REDACTED]&x-amz-security-token=[REDACTED]&Expires=1726983868 HTTP/1.1
Host: s3-bucket-placeholder
User-Agent: GDAL/3.9.2
Accept: */*
Authorization: Bearer [REDACTED]
Range: bytes=0-16383
CURL_INFO_TEXT: Request completely sent off
CURL_INFO_HEADER_IN: HTTP/1.1 400 Bad Request
CURL_INFO_HEADER_IN: x-amz-request-id: [REDACTED]
CURL_INFO_HEADER_IN: x-amz-id-2: [REDACTED]
CURL_INFO_HEADER_IN: Content-Type: application/xml
CURL_INFO_HEADER_IN: Transfer-Encoding: chunked
CURL_INFO_HEADER_IN: Date: Sun, 22 Sep 2024 04:44:28 GMT
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Connection: close
CURL_INFO_HEADER_IN:
CURL_INFO_TEXT: Closing connection
VSICURL: Got response_code=400
ERROR 4: `/vsicurl/https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif' not recognized as being in a supported file format.
gdalinfo failed - unable to open '/vsicurl/https://example.com/stac/collections/items/assets/foo?path=s3://your-bucket/red.tif'.
```
*Additional Information:*
If you require any further details or clarification, please let me know. I
would be happy to provide more information. Additionally, if necessary, I
can create an issue on the GDAL GitHub repository to track this problem.
Thank you very much for your time and assistance. I appreciate any guidance
or suggestions you can provide to help resolve this issue.
Best Regards,
Pradeep Gulla