[gdal-dev] Problem accessing NASA Cloud Optimized GeoTIFF data

Aaron Friesz amfriesz at gmail.com
Thu Sep 30 14:20:40 PDT 2021


Hi,

I was recently updating the HLS tutorial to access version 2.0 and
discovered that the tutorial is no longer able to access the HLS data in
the cloud from a local Python environment. I've been trying to troubleshoot
for a couple days now with no positive results. The failure can be boiled
down to just accessing an HLS COG file using rasterio in Python (note, a
.netrc file with Earthdata login credentials is needed to access data).

```
import rasterio as rio
import os

s30_url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TEK.2021261T175011.v2.0/HLS.S30.T13TEK.2021261T175011.v2.0.B04.tif'

rio_env = rio.Env(GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                  #CPL_VSIL_CURL_ALLOWED_EXTENSIONS='tif',
                  #CPL_VSIL_CURL_USE_HEAD='FALSE',
                  CPL_DEBUG='ON',
                  CPL_CURL_VERBOSE='ON',
                  GDAL_HTTP_UNSAFESSL='YES',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/.edl_cookies'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/.edl_cookies'))

rio_env.__enter__()

ds = rio.open(s30_url)

```
When I run this in a local Python env (via conda), the result of rio.open()
is a '206 status' error and the call fails. If I add
CPL_VSIL_CURL_USE_HEAD='FALSE' to the rio environment, I instead get a 'file
format not supported' error, which seems to be a red herring. I'm curious
whether you've run into this type of thing before with COGs in the cloud.
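
For comparison, the same options can be applied with rio.Env used as a context manager, so the GDAL environment is torn down when the block exits instead of staying open after a bare __enter__() call. This is only a sketch: hls_gdal_options and read_profile are hypothetical helper names, not part of rasterio, and GDAL_HTTP_UNSAFESSL is left out on the assumption that certificate checking is not the problem here. It is not expected to fix the 206 failure by itself.

```python
import os


def hls_gdal_options(cookie_file="~/.edl_cookies", debug=False):
    """Build the GDAL config options used above (hypothetical helper)."""
    # Expand '~' in Python so GDAL/curl receive an absolute path.
    cookie_path = os.path.expanduser(cookie_file)
    options = {
        "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
        "GDAL_HTTP_COOKIEFILE": cookie_path,
        "GDAL_HTTP_COOKIEJAR": cookie_path,
    }
    if debug:
        # Verbose logging, as in the original snippet.
        options["CPL_DEBUG"] = "ON"
        options["CPL_CURL_VERBOSE"] = "ON"
    return options


def read_profile(url, **kwargs):
    """Open url inside a scoped rasterio.Env and return the dataset profile."""
    import rasterio  # imported lazily so the helper above works without it

    with rasterio.Env(**hls_gdal_options(**kwargs)):
        with rasterio.open(url) as ds:
            return ds.profile


# Usage (needs rasterio, network access and Earthdata credentials in .netrc):
# profile = read_profile(s30_url, debug=True)
```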

The same behavior can be observed when just using the GDAL command-line
utilities.
```
# without --config CPL_VSIL_CURL_USE_HEAD FALSE
> gdalinfo /vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10TEK.2021192T184511.v2.0/HLS.L30.T10TEK.2021192T184511.v2.0.B04.tif --config GDAL_HTTP_COOKIEFILE ~/cookies.txt --config GDAL_HTTP_COOKIEJAR ~/cookies.txt --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR --config CPL_CURL_VERBOSE ON

...
#output
* Mark bundle as not supporting multiuse
< HTTP/1.1 206 Partial Content
< Content-Type: image/tiff
< Content-Length: 1
< Connection: keep-alive
< Date: Thu, 30 Sep 2021 20:52:14 GMT
< Last-Modified: Tue, 20 Jul 2021 20:27:33 GMT
< ETag: "85d0b4b089083135d26188a51e187788-1"
< x-amz-server-side-encryption: AES256
< Accept-Ranges: bytes
< Content-Range: bytes 0-0/14178318
< Server: AmazonS3
< X-Edge-Origin-Shield-Skipped: 0
< X-Cache: Miss from cloudfront
< Via: 1.1 130ce7c752c5865952ded89032560b33.cloudfront.net (CloudFront)
< X-Amz-Cf-Pop: MIA3-C3
< X-Amz-Cf-Id: qEq_2x3pF25dN5YMclkgRwhyTozXDexPdaQwOyJaI2VvIKhNnajozQ==
<
* Connection #1 to host d1nklfio7vscoe.cloudfront.net left intact
ERROR 11: HTTP response code: 206
gdalinfo failed - unable to open '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10TEK.2021192T184511.v2.0/HLS.L30.T10TEK.2021192T184511.v2.0.B04.tif'.
```
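
For reference, the headers in that response are consistent with a successful one-byte range request: Content-Range gives the byte span returned and the total object size. The sketch below parses that header using only the standard library; parse_content_range is a hypothetical helper written for illustration, not a GDAL or curl function.

```python
import re


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server reports an unknown size ('*').
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


# Header value copied from the verbose output above.
first, last, total = parse_content_range("bytes 0-0/14178318")
print(last - first + 1, total)  # prints: 1 14178318
```

The one byte returned matches the Content-Length of 1 in that response, so the server side of the exchange looks well-formed; the 206 is reported as an error on the client side.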

```
# with --config CPL_VSIL_CURL_USE_HEAD FALSE
> gdalinfo /vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10TEK.2021192T184511.v2.0/HLS.L30.T10TEK.2021192T184511.v2.0.B04.tif --config GDAL_HTTP_COOKIEFILE ~/cookies.txt --config GDAL_HTTP_COOKIEJAR ~/cookies.txt --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR --config CPL_VSIL_CURL_USE_HEAD FALSE --config CPL_CURL_VERBOSE ON

...
#output
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: image/tiff
< Content-Length: 14178318
< Connection: keep-alive
< Date: Thu, 30 Sep 2021 21:12:52 GMT
< Last-Modified: Tue, 20 Jul 2021 20:27:33 GMT
< ETag: "85d0b4b089083135d26188a51e187788-1"
< x-amz-server-side-encryption: AES256
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Edge-Origin-Shield-Skipped: 0
< X-Cache: Miss from cloudfront
< Via: 1.1 8a771ca27e5a3c9e06b12b7af5d25aa4.cloudfront.net (CloudFront)
< X-Amz-Cf-Pop: MIA3-C3
< X-Amz-Cf-Id: Xzv3KWP6glYk68odo5d4KAPnzxsNUFjHRHe90EFXOKNGeYL3ea7jiA==
* Failed writing header
* Closing connection 2
* schannel: shutting down SSL/TLS connection with d1nklfio7vscoe.cloudfront.net port 443
* WARNING: failed to save cookies in ~/cookies.txt: Failed writing received data to disk/application
ERROR 4: `/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10TEK.2021192T184511.v2.0/HLS.L30.T10TEK.2021192T184511.v2.0.B04.tif' not recognized as a supported file format.
gdalinfo failed - unable to open '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10TEK.2021192T184511.v2.0/HLS.L30.T10TEK.2021192T184511.v2.0.B04.tif'.
```
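
One detail in that log is that the cookie warning shows the path reaching curl as a literal ~/cookies.txt, and the schannel line indicates a Windows build of curl, where the shell may not expand the tilde. Whether this is related to the failure is not established; as a sketch under that assumption, the command can be assembled in Python with the path expanded first. gdalinfo_command is a hypothetical helper name, and only --config options already used above are passed.

```python
import os


def gdalinfo_command(url, cookie_file="~/cookies.txt", use_head=True):
    """Return the gdalinfo argument list with an expanded cookie path."""
    # Expand '~' here rather than relying on the shell to do it.
    cookies = os.path.expanduser(cookie_file)
    cmd = [
        "gdalinfo", "/vsicurl/" + url,
        "--config", "GDAL_HTTP_COOKIEFILE", cookies,
        "--config", "GDAL_HTTP_COOKIEJAR", cookies,
        "--config", "GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR",
        "--config", "CPL_CURL_VERBOSE", "ON",
    ]
    if not use_head:
        cmd += ["--config", "CPL_VSIL_CURL_USE_HEAD", "FALSE"]
    return cmd


# Usage (needs GDAL on PATH and Earthdata credentials in .netrc):
# import subprocess
# subprocess.run(gdalinfo_command(l30_url), check=False)
```

Passing the arguments as a list also avoids any shell quoting of the long URL.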

Strangely, I'm able to use the original rio/gdal configuration options if
I'm running the code in a cloud environment, that is:

> gdalinfo /vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10TEK.2021192T184511.v2.0/HLS.L30.T10TEK.2021192T184511.v2.0.B04.tif --config GDAL_HTTP_COOKIEFILE ~/cookies.txt --config GDAL_HTTP_COOKIEJAR ~/cookies.txt --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR --config CPL_CURL_VERBOSE ON

...works fine in the USGS Pangeo and Openscapes 2i2c instance in us-west2.
I get the failures on both Windows and macOS, but my colleagues seem to be
able to run the code just fine from Linux within a Docker container.

I've tried the gdalinfo commands using GDAL versions 3.1.4 through
3.2.2. Oddly, v3.2.0 succeeds in printing out the gdalinfo content, but
I've been unable to create a conda environment with that version of GDAL.

I'm talking to our NASA cloud team as well to see if there was anything on
the backend that changed recently just to cover that angle. I'm hoping,
though, that I'm missing something obvious in the local configurations.

Any guidance would be appreciated.

-Aaron