[gdal-dev] GRIB file being scanned despite .idx being present

Daniel Evans daniel.fred.evans at gmail.com
Fri Oct 10 03:47:02 PDT 2025


Hmm, yes: with a locally compiled GDAL 3.11.4 and your code, I see it jump
straight to the relevant band, but with rasterio built on top of that same
GDAL 3.11.4, it still pages through the file from the start. It seems that
something in how my Python environment or code configures things when using
rasterio, or something that rasterio itself configures, is changing the
behaviour.

Thoughts on where to look are welcome, but it doesn't appear to be a
GDAL-level problem.

Cheers,
Daniel
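One difference worth ruling out between the two runs is the path string
itself: Dan's snippet passes a `/vsis3/...` path to `gdal.Open`, while the
rasterio code quoted below passes an `s3://` URL, which rasterio (as far as I
know) rewrites into a GDAL `/vsis3/` path before opening. The helper below,
`s3_url_to_vsis3`, is hypothetical (not part of GDAL or rasterio); it is a
minimal sketch that produces the equivalent `/vsis3/` path so the exact same
string can be tried with `gdal.Open` under the same configuration options as
the rasterio run.

```python
from urllib.parse import urlparse


def s3_url_to_vsis3(url):
    """Convert s3://bucket/key into GDAL's /vsis3/bucket/key form.

    Hypothetical helper for comparison only; rasterio performs its own
    translation internally.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url}")
    # netloc is the bucket name; path starts with "/" and holds the key.
    return f"/vsis3/{parsed.netloc}{parsed.path}"


url = "s3://noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012"
print(s3_url_to_vsis3(url))
# /vsis3/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012
```

If the translated path is identical to the one in Dan's snippet, the
difference is more likely in the configuration options than in the path.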

On Thu, 9 Oct 2025 at 16:36, Daniel Baston <dbaston at gmail.com> wrote:

> FWIW, the following snippet is working with gdal master:
>
> from osgeo import gdal
>
> with gdal.config_options({"AWS_NO_SIGN_REQUEST": "True",
>                           "CPL_DEBUG": "True",
>                           "CPL_CURL_VERBOSE": "True"}):
>     ds = gdal.Open(
>         "/vsis3/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012")
>     band = ds.GetRasterBand(636)
>     x = band.ReadAsArray()
>     print(x.mean())
>
> Dan
>
> On Thu, Oct 9, 2025 at 10:27 AM Daniel Evans via gdal-dev <gdal-dev at lists.osgeo.org> wrote:
>
>> Hi all,
>>
>> I am attempting to read a single band from a NOAA GRIB2 file on S3 that
>> has an associated .idx file. The GRIB2 driver documentation states that
>> the presence of such an .idx file allows the file to be opened without
>> reading all bands.
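For context, the NOAA .idx sidecar is, to my understanding, a wgrib2-style
inventory: one colon-separated line per GRIB message, with the message number
in the first field and its starting byte offset in the second. Since each
message ends where the next one begins, a message's byte range can be worked
out from the inventory alone, without scanning the GRIB file. The sketch below
assumes that line format and plain integer message numbers; `parse_idx` and
`message_byte_range` are hypothetical helpers, not GDAL or rasterio APIs, and
in the two sample lines only the offsets come from this thread (the other
fields are placeholders).

```python
def parse_idx(text):
    """Parse wgrib2-style inventory text into (message_number, offset) pairs.

    Assumes plain integer message numbers (no sub-message numbering such as
    "12.1") and the starting byte offset in the second colon-separated field.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        entries.append((int(fields[0]), int(fields[1])))
    return entries


def message_byte_range(entries, message, file_size):
    """Return the inclusive (start, end) byte range of one GRIB message.

    A message ends one byte before the next offset in the inventory; the
    last listed message is assumed to run to the end of the file.
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    for i, (number, start) in enumerate(ordered):
        if number == message:
            if i + 1 < len(ordered):
                end = ordered[i + 1][1] - 1
            else:
                end = file_size - 1
            return start, end
    raise KeyError(f"message {message} not in inventory")


# Two illustrative inventory lines: the offsets are the ones quoted in this
# thread; the remaining fields are placeholders, not real GFS metadata.
INVENTORY = """\
636:443333308:d=YYYYMMDDHH:VAR:LEVEL:FCST:
637:444174665:d=YYYYMMDDHH:VAR:LEVEL:FCST:
"""

entries = parse_idx(INVENTORY)
start, end = message_byte_range(entries, 636, file_size=545533166)
print(start, end, end - start + 1)  # 443333308 444174664 841357
```

With the full .idx, the same lookup yields the single byte range one would
expect a reader to fetch for band 636 when the sidecar is honoured.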
>>
>> However, looking at the CPL_CURL_VERBOSE=True logs, it appears that GDAL
>> is still paging through the file from the start until reaching the
>> requested band.
>>
>> GDAL identifies the existence of the .idx file:
>>
>> DEBUG:CPLE_None in GRIB: Reading inventories from sidecar file /vsis3/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012.idx
>> DEBUG:CPLE_None in S3: Downloading 0-41215 (https://s3.amazonaws.com/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012.idx)...
>>
>> But it then appears to scan the file from the start until it has passed
>> the requested band:
>>
>> DEBUG:CPLE_None in S3: Downloading 16384-999423 (https://s3.amazonaws.com/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012)...
>> DEBUG:CPLE_None in S3: Downloading 999424-2965503 (https://s3.amazonaws.com/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012)...
>> DEBUG:CPLE_None in S3: Downloading 2965504-6897663 (https://s3.amazonaws.com/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012)...
>> [...]
>> DEBUG:S3: Downloading 449626112-450461695 (https://s3.amazonaws.com/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012)...
>>
>> Band 636 is listed in the .idx at offset 443333308 and band 637 at
>> offset 444174665. The total file size is 545533166 bytes.
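To put numbers on the scan, the `Downloading START-END` ranges can be pulled
out of the debug log and compared with the .idx offsets. A rough sketch: the
regular expression assumes the log format quoted above (one range per line,
URL in parentheses), `summarise_downloads` is a hypothetical helper name, and
because the lines elided as `[...]` are missing, the byte total is only a
lower bound on what was actually fetched.

```python
import re

# Matches lines such as:
#   DEBUG:CPLE_None in S3: Downloading 16384-999423 (https://...f012)...
RANGE_RE = re.compile(r"Downloading (\d+)-(\d+) \((\S+?)\)")


def summarise_downloads(log_text, url_suffix):
    """Return the inclusive (start, end) ranges fetched for URLs ending in url_suffix."""
    ranges = []
    for match in RANGE_RE.finditer(log_text):
        start, end, url = int(match.group(1)), int(match.group(2)), match.group(3)
        # Filtering on ".f012" skips the ".f012.idx" sidecar request.
        if url.endswith(url_suffix):
            ranges.append((start, end))
    return ranges


BASE = "https://s3.amazonaws.com/noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012"
LOG = f"""\
DEBUG:CPLE_None in S3: Downloading 0-41215 ({BASE}.idx)...
DEBUG:CPLE_None in S3: Downloading 16384-999423 ({BASE})...
DEBUG:CPLE_None in S3: Downloading 999424-2965503 ({BASE})...
DEBUG:CPLE_None in S3: Downloading 2965504-6897663 ({BASE})...
DEBUG:S3: Downloading 449626112-450461695 ({BASE})...
"""

BAND_636_START, BAND_637_START = 443333308, 444174665

ranges = summarise_downloads(LOG, ".f012")
fetched = sum(end - start + 1 for start, end in ranges)  # lower bound
needed = BAND_637_START - BAND_636_START                  # size of message 636
print(len(ranges), fetched, needed)  # 4 7716864 841357
# A range starting at or beyond band 637's offset means the reader ran past band 636.
print(any(start >= BAND_637_START for start, _ in ranges))  # True
```

Even without the elided lines, the GRIB requests cover roughly nine times the
841357 bytes that message 636 occupies, and the growing request sizes are
consistent with a sequential read from the start of the file.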
>>
>>
>> Do I need to do something extra to trigger GDAL to read only the
>> requested band based on the .idx? Are some GRIB/.idx files not able to be
>> loaded in this way?
>>
>> I am running via rasterio v1.4.3 which is using GDAL v3.9.3. My code is
>> below, the file is in a public NOAA-hosted bucket.
>>
>> Cheers,
>> Daniel
>>
>> ###
>>
>> import logging
>> import rasterio
>>
>> logging.basicConfig(format="%(levelname)s:%(message)s",
>>                     level=logging.DEBUG)
>>
>> with rasterio.Env(USE_IDX=True, AWS_VIRTUAL_HOSTING=False,
>>                   CPL_DEBUG=True, CPL_CURL_VERBOSE=True):
>>     with rasterio.open(
>>             "s3://noaa-gfs-bdp-pds/gfs.20250918/00/atmos/gfs.t00z.pgrb2.0p25.f012"
>>     ) as ds:
>>         band = ds.read(636)
>>
>> ###
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>

