[gdal-dev] Ignore content-length in vsicurl?
Even Rouault
even.rouault at spatialys.com
Wed Sep 11 13:47:55 PDT 2024
The server seems to have been fixed, but on reflection, I won't merge
the workaround PR as this is (thankfully) quite an unusual case. At
least the PR is there if someone badly needs to patch their build with
such a workaround.
On 10/09/2024 at 17:11, thomas bonfort wrote:
> I'm not sure that providing a fix to work around this very broken
> behavior is the best course of action to make them fix their server...
>
> On Tue, Sep 10, 2024 at 5:07 PM Even Rouault via gdal-dev
> <gdal-dev at lists.osgeo.org> wrote:
>
>
> On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:
>>
>> Hi,
>>
>> Have you tried with configuration option
>> “CPL_VSIL_CURL_USE_HEAD=[YES/NO]: Defaults to YES. Controls
>> whether to use a HEAD request when opening a remote URL.”
>>
> I was just going to suggest that too. It "works", but not really.
> It just postpones the core issue: the server doesn't support GET
> Range requests, so can't be used with /vsicurl/
>
> As it has a COG organization with overview data first in the file,
> if you want to read the smallest overview(s), you can use
> /vsicurl_streaming/ instead, but that won't be efficient for reading
> the bottom-right-most tile of the full resolution level, which will
> require reading the whole file...
>
> Nothing GDAL can do about that.
>
> Actually... digging further... it somehow supports Range requests,
> but in what I believe is a non-compliant way. It does return the
> expected content, but returns HTTP 200 and not HTTP 206 (Partial
> Content). And it never returns the Content-Length header.
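
A minimal sketch of the check described above, in Python with the standard library only. The function name `classify_range_response` and its return labels are hypothetical, made up for this illustration; this is not how GDAL itself decides. It assumes you already sent a GET with a `Range` header and know how many bytes came back.

```python
def classify_range_response(status, headers, body_len, requested_len):
    """Classify a server's reply to a GET that carried a Range header.

    status: HTTP status code of the reply.
    headers: mapping of header names to values (any letter case).
    body_len: number of bytes actually received in the body.
    requested_len: number of bytes asked for in the Range header.
    """
    lower = {name.lower(): value for name, value in headers.items()}
    has_length = "content-length" in lower

    if status == 206:
        # 206 Partial Content is the expected reply to a Range request.
        return "compliant"
    if status == 200 and body_len == requested_len:
        # Only the requested bytes came back, but labelled as a full reply:
        # the behaviour reported for this server.
        return "partial-as-200" if has_length else "partial-as-200-no-length"
    if status == 200:
        # The Range header was ignored and more than the range was sent.
        return "range-ignored"
    return "error"
```

Note the heuristic is ambiguous when the whole file happens to be exactly as long as the requested range.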
>
> Well, I've implemented a workaround in
> https://github.com/OSGeo/gdal/pull/10760 that might be useful in
> other similar cases too.
>
> With that, the following works:
>
> gdal_translate
> "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif"
> --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR out.tif -srcwin
> 5000 5000 50 50
>
> file_size=unlimited works here since the GTiff driver doesn't
> really need to have the right file size; it just checks at some
> points that we don't try to read beyond it, so unlimited is OK. In
> other situations/drivers, the exact value could be needed.
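
For anyone scripting this, here is a small Python sketch that assembles a path of the same `/vsicurl?option=value&url=...` form as the command above. The helper name `vsicurl_path` is hypothetical. The `file_size` option comes from the workaround PR, so a build without that patch should not be expected to recognize it. Leaving `:` and `/` unescaped in the URL is an assumption made only to reproduce the path as written above.

```python
from urllib.parse import quote


def vsicurl_path(url, **options):
    """Build a /vsicurl? path with options first and url= last."""
    # Percent-encode option names and values.
    parts = [
        f"{quote(name, safe='')}={quote(str(value), safe='')}"
        for name, value in options.items()
    ]
    # Assumption: keep ':' and '/' readable, as in the command in this thread.
    parts.append("url=" + quote(url, safe=":/"))
    return "/vsicurl?" + "&".join(parts)


path = vsicurl_path(
    "https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/"
    "20NMH_2024-04-01_2024-08-01/B08.tif",
    file_size="unlimited",  # only meaningful on a build patched with the PR
)
print(path)
```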
>
> But they should really fix their servers.
>
> Even
>
>> -Jukka Rahkonen-
>>
>> *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org> *On Behalf Of*
>> Daniel Evans via gdal-dev
>> *Sent:* Tuesday, 10 September 2024 16:57
>> *To:* 'gdal-dev at lists.osgeo.org' <gdal-dev at lists.osgeo.org>
>> *Subject:* [gdal-dev] Ignore content-length in vsicurl?
>>
>> Hi all,
>>
>> I am attempting to read a dataset via /vsicurl/ where I believe
>> the server is incorrectly returning `content-length: 0` in
>> response to HEAD requests. This causes GDAL to believe it's a
>> zero-length file, and it therefore can't be read.
>>
>> If I download the file via HTTP GET, it's valid, and GDAL can
>> read it locally. I've also confirmed I can use /vsicurl/ on some
>> test datasets in the GDAL repo.
>>
>> Is it possible to force GDAL to work around the faulty
>> content-length header, or is it too fundamental a problem to ignore?
>>
>> I've separately got in touch with the data provider to see if
>> they are able to fix the issue at their end.
>>
>> Cheers,
>>
>> Daniel
>>
>> URL of the troublesome dataset:
>>
>> https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif
>>
>> Example HTTP header responses I'm seeing:
>>
>> GET
>>
>> HTTP/2 200
>> date: Tue, 10 Sep 2024 13:47:54 GMT
>> content-type: binary/octet-stream
>> content-length: 278198294
>> vary: Origin, Access-Control-Request-Method,
>> Access-Control-Request-Headers
>> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
>> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
>> x-version: 0.0.16
>> access-control-allow-credentials: true
>>
>> HEAD
>>
>> HTTP/2 200
>> date: Tue, 10 Sep 2024 13:48:08 GMT
>> content-type: binary/octet-stream
>> content-length: 0
>> x-version: 0.0.16
>> access-control-allow-credentials: true
>> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
>> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
>> vary: Origin, Access-Control-Request-Method,
>> Access-Control-Request-Headers
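
The mismatch between the two header dumps above can be spotted mechanically. This is a rough Python sketch, standard library only, that works on raw header text such as the output of `curl -I`. The function names `content_length` and `head_matches_get` are hypothetical, chosen for this illustration.

```python
def content_length(raw_headers):
    """Return Content-Length from a raw header dump, or None if absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # The status line (e.g. "HTTP/2 200") has no colon and is skipped.
        if sep and name.strip().lower() == "content-length":
            return int(value.strip())
    return None


def head_matches_get(head_raw, get_raw):
    """True only if HEAD reports a length and it equals the GET length."""
    head_len = content_length(head_raw)
    get_len = content_length(get_raw)
    return head_len is not None and head_len == get_len


get_raw = "HTTP/2 200\ncontent-type: binary/octet-stream\ncontent-length: 278198294\n"
head_raw = "HTTP/2 200\ncontent-type: binary/octet-stream\ncontent-length: 0\n"
print(head_matches_get(head_raw, get_raw))
```

With the values reported in this thread, the HEAD length (0) and the GET length (278198294) disagree, which is what makes GDAL see a zero-length file.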
>>
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>
--
http://www.spatialys.com
My software is free, but my time generally not.