[gdal-dev] /vsis3/ on NetCDF from Earthdata

Even Rouault even.rouault at spatialys.com
Fri Jul 21 06:43:42 PDT 2023


On 21/07/2023 at 11:46, b.coerver at mailbox.org wrote:
>
> One more follow-up question:
>
> The datasets that I’m interested in contain subdatasets. I can get 
> the info of a subdataset like this:
>
>     sub_ds_path = 'HDF5:"/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2021064.2342.002.2021128145323.nc"://observation_data/I04'
>
>     info = gdal.Info(sub_ds_path)
>
> This works fine and finishes in a few seconds. However, when I do the 
> same thing for a different dataset (which contains the geolocation of 
> the dataset above):
>
>     sub_ds_path = 'HDF5:"/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc"://geolocation_data/latitude'
>
>     info = gdal.Info(sub_ds_path)
>
> This takes about 2.5 minutes and I can see on my network that Python 
> is downloading data at about 1 MB/s the whole time. The info from this 
> subdataset contains a lot of ground control points, so I tried setting 
> “showGCPs=False”, but that doesn’t solve it. I’m not sure if it’s 
> really the GCPs that are causing this (when I save the info as JSON, 
> it is about 750 kB in size).
>
The second product has georeferencing information: when one of its 
subdatasets is opened, GDAL samples the latitude and longitude arrays 
to expose ground control points, so it has to read those arrays.
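
One way to check this is to open each subdataset and ask how many GCPs
GDAL exposes for it. The following is only a minimal sketch: the helper
names `hdf5_subdataset` and `gcp_count` are made up for this example, and
it assumes the AWS credentials and other config options have already been
set as in the quoted messages. Since the sampling happens when the
subdataset is opened, the `gdal.Open()` call itself is expected to be the
slow step for the VNP03IMG file.

```python
def hdf5_subdataset(vsi_path, variable):
    """Build an HDF5 subdataset name in the form used in this thread."""
    # Hypothetical helper: just string formatting, no GDAL call involved.
    return f'HDF5:"{vsi_path}"://{variable.lstrip("/")}'


def gcp_count(subdataset_name):
    """Open a subdataset and return the number of GCPs GDAL exposes for it."""
    # Imported here so that hdf5_subdataset() can be used without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()  # raise a Python exception if the open fails
    ds = gdal.Open(subdataset_name)
    return ds.GetGCPCount()


if __name__ == "__main__":
    # File name and variable path copied from the message above.
    name = hdf5_subdataset(
        "/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc",
        "geolocation_data/latitude",
    )
    print(name)
    print("GCP count:", gcp_count(name))
```

A non-zero count for the VNP03IMG subdataset and zero for the VNP02IMG one
would be consistent with the explanation above.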

> Any ideas what else can cause this difference in execution time?
>
> Regards,
>
> Bert
>
> *From: *gdal-dev <gdal-dev-bounces at lists.osgeo.org> on behalf of 
> b.coerver--- via gdal-dev <gdal-dev at lists.osgeo.org>
> *Date: *Thursday, 20 July 2023 at 11:51
> *To: *Even Rouault <even.rouault at spatialys.com>, 
> gdal-dev at lists.osgeo.org <gdal-dev at lists.osgeo.org>
> *Subject: *Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata
>
> That does it, thank you so much!
>
> *From: *Even Rouault <even.rouault at spatialys.com>
> *Date: *Thursday, 20 July 2023 at 11:44
> *To: *bcoerver at mailbox.org <b.coerver at mailbox.org>, 
> gdal-dev at lists.osgeo.org <gdal-dev at lists.osgeo.org>
> *Subject: *Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata
>
> Bert,
>
> Also set the GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR config option;
> otherwise the generic open mechanism of GDAL tries to list the content
> of the VNP02IMG/ directory, and it seems there are tons of files there.
>
> When doing that, I get a result within a few seconds.
>
> Even
>
> On 20/07/2023 at 09:59, b.coerver--- via gdal-dev wrote:
>
>     Hello,
>
>     I'm trying to access data from NASA's Earthdata S3 buckets, but I
>     get a `"<filename> does not exist in the file system, and is not
>     recognized as a supported dataset name."` error after waiting a
>     long time (± 50 minutes, the process is downloading some data the
>     whole time) doing the following:
>
>     from osgeo import gdal
>
>     gdal_config_options = {
>         "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
>         "AWS_SESSION_TOKEN": creds["sessionToken"],
>         "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
>         "AWS_REGION": "us-west-2",
>     }
>
>     url = "/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc"
>
>     for k, v in gdal_config_options.items():
>         gdal.SetConfigOption(k, v)
>
>     out = gdal.Info(url)
>
>     The `creds` variable is a dictionary with temporary credential
>     information that I get from
>     [here](https://data.laadsdaac.earthdatacloud.nasa.gov/s3credentials),
>     you need a free account to get them.
>
>     When I introduce an error in one of the keys/tokens (e.g.
>     `"AWS_ACCESS_KEY_ID": creds["accessKeyId"] + "x"`), I do get a
>     message immediately saying my credentials are unknown. So I do
>     think they are being ingested correctly. I’m using GDAL version 3.7.1.
>
>     I also managed to download the entire file using `boto3`, by doing
>     the following:
>
>         import boto3
>
>         client = boto3.client(
>             's3',
>             aws_access_key_id=creds["accessKeyId"],
>             aws_secret_access_key=creds["secretAccessKey"],
>             aws_session_token=creds["sessionToken"],
>         )
>
>         client.download_file(
>             'prod-lads',
>             'VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc',
>             'test.nc',
>         )
>
>     Any ideas what I'm doing wrong or how to make this work? In the
>     end I'm interested in accessing the file's metadata without
>     downloading the entire file.
>
>     Regards,
>
>     Bert
>
>     _______________________________________________
>     gdal-dev mailing list
>     gdal-dev at lists.osgeo.org
>     https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> -- 
> http://www.spatialys.com
> My software is free, but my time generally not.
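
For reference, the setup discussed in the quoted messages (temporary
credentials plus GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR) could be put
together roughly as follows. This is a sketch rather than a tested recipe:
the function names `vsis3_config` and `info_with_config` are invented for
this example, `creds` is assumed to be the dictionary returned by the
s3credentials endpoint, and the region default is the one used in the
original message.

```python
def vsis3_config(creds, region="us-west-2"):
    """Map the temporary credentials to GDAL config options for /vsis3/."""
    return {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
        "AWS_REGION": region,
        # Do not list the (very large) directory next to the file on open.
        "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    }


def info_with_config(path, options):
    """Apply the config options, then return gdal.Info() output as a dict."""
    # Imported here so that vsis3_config() can be used without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()  # raise a Python exception instead of returning None
    for key, value in options.items():
        gdal.SetConfigOption(key, value)
    return gdal.Info(path, format="json")
```

With real credentials, this would be called as
`info_with_config("/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc", vsis3_config(creds))`.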

-- 
http://www.spatialys.com
My software is free, but my time generally not.