[gdal-dev] Ignore content-length in vsicurl?

Daniel Evans daniel.fred.evans at gmail.com
Tue Sep 10 08:34:20 PDT 2024


To partially appease the crowd, the data provider has since acknowledged
the issue on their end and is working on a fix - thankfully not one of
those providers that take a month to respond with a shrug.

Cheers,
Daniel



On Tue, 10 Sept 2024 at 16:11, thomas bonfort <thomas.bonfort at gmail.com>
wrote:

> I'm not sure that providing a fix to work around this very broken behavior
> is the best course of action to make them fix their server...
>
> On Tue, Sep 10, 2024 at 5:07 PM Even Rouault via gdal-dev <
> gdal-dev at lists.osgeo.org> wrote:
>
>>
>> On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:
>>
>> Hi,
>>
>>
>>
>> Have you tried with configuration option
>> “CPL_VSIL_CURL_USE_HEAD=[YES/NO]: Defaults to YES. Controls whether to use
>> a HEAD request when opening a remote URL.”
>>
>> I was just going to suggest that too. It "works", but not really. It just
>> postpones the core issue: the server doesn't support GET Range requests, so
>> it can't be used with /vsicurl/.
>>
>> Since the file has a COG layout with overview data first, you can use
>> /vsicurl_streaming/ instead if you want to read the smallest overview(s),
>> but that won't be efficient for reading the bottom-right-most tile of the
>> full-resolution level, which would require reading the whole file...
>>
>> Nothing GDAL can do about that.
>>
>> Actually... digging further... it somehow supports Range requests, but in
>> what I believe is a non-compliant way. It does return the expected content,
>> but returns HTTP 200 and not HTTP 206 (Partial content). And it never
>> returns the Content-Length header.
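
A minimal sketch of the check described above, for anyone wanting to test another server for the same behaviour. It assumes a single-range request for fewer bytes than the whole file; the function name and the returned labels are hypothetical and are not GDAL's implementation.

```python
from typing import Mapping


def classify_range_response(status: int, headers: Mapping[str, str],
                            body_len: int, requested_len: int) -> str:
    """Classify how a server answered a single-range GET (hypothetical labels)."""
    # HTTP header names are case-insensitive, so normalise them first.
    names = {name.lower() for name in headers}
    if status == 206 and "content-range" in names:
        # Expected answer: 206 Partial Content with a Content-Range header.
        return "compliant"
    if status == 200 and body_len == requested_len:
        # Right bytes, wrong status code: the behaviour reported in this thread.
        return "partial-content-with-200"
    if status == 200:
        # The server ignored the Range header and sent something else.
        return "range-ignored"
    return "other"
```

The status, headers and body length would come from whatever HTTP client is used; the function itself performs no network access.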
>>
>> Well, I've implemented a workaround in
>> https://github.com/OSGeo/gdal/pull/10760 that might be useful in other
>> similar cases too.
>>
>> With that, the following works:
>>
>> gdal_translate "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif" --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR out.tif -srcwin 5000 5000 50 50
>>
>> file_size=unlimited works here since the GTiff driver doesn't really need
>> to know the right file size; it only checks at some points that we don't
>> try to read beyond the end of the file, so unlimited is OK. In other
>> situations/drivers, the exact value could be needed.
>>
>> But they should really fix their servers.
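
As a rough illustration, the command above could be assembled from Python as follows. The helper names `vsicurl_path` and `translate_args` are hypothetical. The `file_size` option is the one added by the pull request mentioned above, and passing anything other than `unlimited` is an assumption based on the remark that an exact value may be needed. The URL is left unencoded, mirroring the command in this message.

```python
from typing import List, Sequence


def vsicurl_path(url: str, file_size: str = "unlimited") -> str:
    """Build a /vsicurl? path carrying the file_size option (hypothetical helper)."""
    # Same layout as the command above: options first, url= last.
    return f"/vsicurl?file_size={file_size}&url={url}"


def translate_args(url: str, out_path: str, srcwin: Sequence[int]) -> List[str]:
    """Return the gdal_translate argument list without running it."""
    xoff, yoff, xsize, ysize = srcwin
    return [
        "gdal_translate",
        vsicurl_path(url),
        "--config", "GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR",
        out_path,
        "-srcwin", str(xoff), str(yoff), str(xsize), str(ysize),
    ]
```

The resulting list can be handed to `subprocess.run(args, check=True)` on a machine with a GDAL build that includes the workaround.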
>>
>> Even
>>
>>
>>
>> -Jukka Rahkonen-
>>
>>
>>
>> *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org> *On Behalf Of* Daniel Evans via gdal-dev
>> *Sent:* Tuesday, 10 September 2024 16:57
>> *To:* 'gdal-dev at lists.osgeo.org' <gdal-dev at lists.osgeo.org>
>> *Subject:* [gdal-dev] Ignore content-length in vsicurl?
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I am attempting to read a dataset via /vsicurl/ where I believe the
>> server is incorrectly returning `content-length: 0` in response to HEAD
>> requests. This causes GDAL to believe it's a zero-length file, and it
>> therefore can't be read.
>>
>>
>>
>> If I download the file via HTTP GET, it's valid, and GDAL can read it
>> locally. I've also confirmed I can use /vsicurl/ on some test datasets in
>> the GDAL repo.
>>
>>
>>
>> Is it possible to force GDAL to work around the faulty content-length
>> header, or is it too fundamental a problem to ignore?
>>
>>
>>
>> I've separately got in touch with the data provider to see if they are
>> able to fix the issue at their end.
>>
>>
>>
>> Cheers,
>>
>> Daniel
>>
>>
>>
>>
>>
>> URL of the troublesome dataset:
>>
>>
>> https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif
>>
>>
>>
>>
>>
>> Example HTTP header responses I'm seeing:
>>
>>
>>
>> GET
>>
>>
>>
>> HTTP/2 200
>> date: Tue, 10 Sep 2024 13:47:54 GMT
>> content-type: binary/octet-stream
>> content-length: 278198294
>> vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
>> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
>> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
>> x-version: 0.0.16
>> access-control-allow-credentials: true
>>
>>
>>
>> HEAD
>>
>>
>>
>> HTTP/2 200
>> date: Tue, 10 Sep 2024 13:48:08 GMT
>> content-type: binary/octet-stream
>> content-length: 0
>> x-version: 0.0.16
>> access-control-allow-credentials: true
>> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
>> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
>> vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
>>
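
The mismatch between the two responses above can be detected mechanically. The sketch below is only an illustration with hypothetical function names; it compares header dictionaries that some HTTP client has already collected.

```python
from typing import Mapping, Optional


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if the header is absent."""
    for name, value in headers.items():
        # Header names are case-insensitive (HTTP/2 sends them in lower case).
        if name.lower() == "content-length":
            return int(value)
    return None


def head_size_is_suspect(head_headers: Mapping[str, str],
                         get_headers: Mapping[str, str]) -> bool:
    """True when HEAD and GET both report a size and the sizes disagree."""
    head_len = content_length(head_headers)
    get_len = content_length(get_headers)
    return head_len is not None and get_len is not None and head_len != get_len
```

With the headers shown above (HEAD reporting 0, GET reporting 278198294), `head_size_is_suspect` returns `True`.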
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>> -- http://www.spatialys.com
>> My software is free, but my time generally not.
>