[gdal-dev] Ignore content-length in vsicurl?

Even Rouault even.rouault at spatialys.com
Wed Sep 11 13:47:55 PDT 2024


The server seems to have been fixed, but on reflection, I won't merge 
the workaround PR as this is (thankfully) quite an unusual case. At 
least the PR is there in case someone badly needs to patch their build 
with such a workaround.
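
For anyone who runs into a similar server, the symptom described further
down in this thread (the requested bytes come back, but with HTTP 200
instead of HTTP 206) can be told apart from a server that ignores Range
altogether. The following is only a rough Python sketch, not anything taken
from GDAL: the function name and the returned labels are made up for
illustration, and you would feed it the status, headers and body length
that you observed with whatever HTTP client you use.

```python
from typing import Mapping


def classify_range_response(status: int, headers: Mapping[str, str],
                            requested_len: int, body_len: int) -> str:
    """Label how a server answered a single-range GET request.

    Hypothetical helper for illustration only. The labels are not GDAL
    or HTTP terminology.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    names = {name.lower() for name in headers}

    if status == 206:
        # A single-range 206 reply is expected to carry Content-Range.
        if "content-range" in names:
            return "compliant"
        return "206-without-content-range"

    if status == 200:
        # Heuristic: a body of exactly the requested length suggests the
        # range was honoured but reported with the wrong status code.
        # (Ambiguous if the whole file happens to have that length.)
        if body_len == requested_len:
            return "partial-body-with-200"
        return "range-ignored"

    return "error"
```

For example, a 16-byte reply to a 16-byte range request that arrives with
status 200 and no Content-Range header would be labelled
"partial-body-with-200", which is the case the workaround PR targets.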

On 10/09/2024 at 17:11, thomas bonfort wrote:
> I'm not sure that providing a fix to work around this very broken 
> behavior is the best course of action to make them fix their server...
>
> On Tue, Sep 10, 2024 at 5:07 PM Even Rouault via gdal-dev 
> <gdal-dev at lists.osgeo.org> wrote:
>
>
>     On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:
>>
>>     Hi,
>>
>>     Have you tried with configuration option
>>     “CPL_VSIL_CURL_USE_HEAD=[YES/NO]: Defaults to YES. Controls
>>     whether to use a HEAD request when opening a remote URL.”
>>
>     I was just going to suggest that too. It "works", but not really.
>     It just postpones the core issue: the server doesn't support GET
>     Range requests, so can't be used with /vsicurl/
>
>     As it has a COG organization with overview data first in the file,
>     if you want to read the smallest overview(s), you can use
>     /vsicurl_streaming/ instead, but that won't be efficient for reading
>     the bottom-right-most tile of the full resolution layer, which will
>     require reading the whole file...
>
>     Nothing GDAL can do about that.
>
>     Actually... digging further... it somehow supports Range requests,
>     but in what I believe is a non-compliant way. It does return the
>     expected content, but returns HTTP 200 and not HTTP 206 (Partial
>     Content). And it never returns the Content-Length header.
>
>     Well, I've implemented a workaround in
>     https://github.com/OSGeo/gdal/pull/10760 that might be useful in
>     other similar cases too.
>
>     With that, the following works:
>
>     gdal_translate \
>     "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif" \
>     --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR out.tif -srcwin \
>     5000 5000 50 50
>
>     file_size=unlimited works here since the GTiff driver doesn't
>     really need to have the right file size: it only checks at some
>     points that we don't try to read beyond the end of the file, so
>     unlimited is OK. In other situations/drivers, the exact value
>     could be needed.
>
>     But they should really fix their servers.
>
>     Even
>
>>     -Jukka Rahkonen-
>>
>>     *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org>
>>     *On behalf of* Daniel Evans via gdal-dev
>>     *Sent:* Tuesday 10 September 2024 16:57
>>     *To:* 'gdal-dev at lists.osgeo.org'
>>     (gdal-dev at lists.osgeo.org) <gdal-dev at lists.osgeo.org>
>>     *Subject:* [gdal-dev] Ignore content-length in vsicurl?
>>
>>     Hi all,
>>
>>     I am attempting to read a dataset via /vsicurl/ where I believe
>>     the server is incorrectly returning `content-length: 0` in
>>     response to HEAD requests. This causes GDAL to believe it's a
>>     zero-length file, and it therefore can't be read.
>>
>>     If I download the file via HTTP GET, it's valid, and GDAL can
>>     read it locally. I've also confirmed I can use /vsicurl/ on some
>>     test datasets in the GDAL repo.
>>
>>     Is it possible to force GDAL to work around the faulty
>>     content-length header, or is it too fundamental a problem to ignore?
>>
>>     I've separately got in touch with the data provider to see if
>>     they are able to fix the issue at their end.
>>
>>     Cheers,
>>
>>     Daniel
>>
>>     URL of the troublesome dataset:
>>
>>     https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif
>>
>>     Example HTTP header responses I'm seeing:
>>
>>     GET
>>
>>     HTTP/2 200
>>     date: Tue, 10 Sep 2024 13:47:54 GMT
>>     content-type: binary/octet-stream
>>     content-length: 278198294
>>     vary: Origin, Access-Control-Request-Method,
>>     Access-Control-Request-Headers
>>     etag: "a79f3f685281d6681e4d362536c5b3eb-34"
>>     last-modified: Thu, 25 Jul 2024 13:16:08 GMT
>>     x-version: 0.0.16
>>     access-control-allow-credentials: true
>>
>>     HEAD
>>
>>     HTTP/2 200
>>     date: Tue, 10 Sep 2024 13:48:08 GMT
>>     content-type: binary/octet-stream
>>     content-length: 0
>>     x-version: 0.0.16
>>     access-control-allow-credentials: true
>>     etag: "a79f3f685281d6681e4d362536c5b3eb-34"
>>     last-modified: Thu, 25 Jul 2024 13:16:08 GMT
>>     vary: Origin, Access-Control-Request-Method,
>>     Access-Control-Request-Headers
>>
>>
>>     _______________________________________________
>>     gdal-dev mailing list
>>     gdal-dev at lists.osgeo.org
>>     https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
>     -- 
>     http://www.spatialys.com
>     My software is free, but my time generally not.
>
-- 
http://www.spatialys.com
My software is free, but my time generally not.