[gdal-dev] Ignore content-length in vsicurl?

Even Rouault even.rouault at spatialys.com
Tue Sep 10 08:05:11 PDT 2024


On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:
>
> Hi,
>
> Have you tried with configuration option 
> “CPL_VSIL_CURL_USE_HEAD=[YES/NO]: Defaults to YES. Controls whether to 
> use a HEAD request when opening a remote URL.”
>
I was just going to suggest that too. It "works", but not really. It 
just postpones the core issue: the server doesn't support GET Range 
requests, so it can't be used with /vsicurl/.
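For reference, a minimal sketch of how the option Jukka mentions could be passed on the command line. It only builds the argument list and does not run GDAL; the helper name gdalinfo_without_head is hypothetical, and the choice of gdalinfo as the utility is an assumption made for illustration.

```python
import shlex


def gdalinfo_without_head(url):
    """Build a gdalinfo argv that opens `url` through /vsicurl/ with the
    initial HEAD request disabled (hypothetical helper, for illustration)."""
    return [
        "gdalinfo",
        # Generic GDAL command-line switch: --config KEY VALUE
        "--config", "CPL_VSIL_CURL_USE_HEAD", "NO",
        "/vsicurl/" + url,
    ]


if __name__ == "__main__":
    # Placeholder URL, not the dataset discussed in this thread.
    argv = gdalinfo_without_head("https://example.com/data/image.tif")
    # Print a copy-pasteable shell command.
    print(shlex.join(argv))
```

As noted above, this only changes how the file is opened; the later reads still rely on the server answering Range requests.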

As the file has a COG layout with the overview data first, you can use 
/vsicurl_streaming/ instead if you only want to read the smallest 
overview(s). But that won't be efficient for reading, say, the 
bottom-right-most tile at full resolution, which would require 
reading the whole file...

Nothing GDAL can do about that.

Actually... digging further... it does somehow support Range requests, 
but in what I believe is a non-compliant way. It returns the expected 
content, but with HTTP 200 instead of HTTP 206 (Partial Content). And 
it never returns the Content-Length header.
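The behaviour described above can be checked roughly as follows. This is a sketch under stated assumptions, not what GDAL does internally: the function names and the three labels are hypothetical, and the probe reads at most one byte more than requested so that a server ignoring the Range header does not trigger a full download.

```python
import urllib.request


def classify_range_response(status, headers, requested_len, body_len):
    """Classify how a server answered a request for `requested_len` bytes.

    `headers` is a mapping of response header names to values, and
    `body_len` is the number of body bytes actually received.
    """
    names = {name.lower() for name in headers}
    if status == 206 and "content-range" in names:
        return "compliant"          # proper partial-content answer
    if status == 200 and body_len == requested_len:
        # Right bytes, wrong status: the case described in this message.
        # (A file exactly `requested_len` bytes long would look the same.)
        return "partial-with-200"
    if status == 200:
        return "range-ignored"      # server sent more than was asked for
    return "unexpected"


def probe(url, requested_len=16):
    """Send a small Range request to `url` and classify the answer."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes=0-%d" % (requested_len - 1)}
    )
    with urllib.request.urlopen(req) as resp:
        # Bounded read: never pull more than requested_len + 1 bytes.
        body = resp.read(requested_len + 1)
        return classify_range_response(
            resp.status, dict(resp.headers.items()), requested_len, len(body)
        )
```

Calling probe() on a URL needs network access; classify_range_response() can be exercised on its own with values captured from any HTTP client.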

Well, I've implemented a workaround in 
https://github.com/OSGeo/gdal/pull/10760 that might be useful in other 
similar cases too.

With that, the following works:

    gdal_translate \
      "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif" \
      --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR \
      out.tif -srcwin 5000 5000 50 50

file_size=unlimited works here because the GTiff driver doesn't really 
need the exact file size; it only checks that we don't try to read 
beyond the end of the file, so unlimited is OK. In other 
situations/drivers, the exact value could be needed.
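A small sketch of how such a filename could be assembled from a script. The helper name vsicurl_path is hypothetical. The file_size option is the one from the pull request mentioned above, so it is only available in builds that include it. Percent-encoding the option values is an assumption based on my reading of the /vsicurl/ documentation; ":" and "/" are left as-is so the result matches the command shown in this message.

```python
from urllib.parse import quote


def vsicurl_path(url, **options):
    """Build a '/vsicurl?opt=val&...&url=...' filename (hypothetical helper).

    Option names and values are percent-encoded so that characters such as
    '&' or '=' inside the URL cannot be mistaken for option separators.
    """
    parts = [
        "%s=%s" % (quote(str(key), safe=""), quote(str(value), safe=""))
        for key, value in options.items()
    ]
    # Keep ':' and '/' readable; everything else unsafe gets escaped.
    parts.append("url=" + quote(url, safe=":/"))
    return "/vsicurl?" + "&".join(parts)


if __name__ == "__main__":
    url = ("https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/"
           "20NMH_2024-04-01_2024-08-01/B08.tif")
    print(vsicurl_path(url, file_size="unlimited"))
```

The resulting string can be given to gdal_translate as the source dataset name, as in the command above.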

But they should really fix their servers.

Even

> -Jukka Rahkonen-
>
> *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org> *On behalf of* 
> Daniel Evans via gdal-dev
> *Sent:* Tuesday, 10 September 2024 16:57
> *To:* 'gdal-dev at lists.osgeo.org' (gdal-dev at lists.osgeo.org) 
> <gdal-dev at lists.osgeo.org>
> *Subject:* [gdal-dev] Ignore content-length in vsicurl?
>
> Hi all,
>
> I am attempting to read a dataset via /vsicurl/ where I believe the 
> server is incorrectly returning `content-length: 0` in response to 
> HEAD requests. This causes GDAL to believe it's a zero-length file, 
> and it therefore can't be read.
>
> If I download the file via HTTP GET, it's valid, and GDAL can read it 
> locally. I've also confirmed I can use /vsicurl/ on some test datasets 
> in the GDAL repo.
>
> Is it possible to force GDAL to work around the faulty content-length 
> header, or is it too fundamental a problem to ignore?
>
> I've separately got in touch with the data provider to see if they are 
> able to fix the issue at their end.
>
> Cheers,
>
> Daniel
>
> URL of the troublesome dataset:
>
> https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif
>
> Example HTTP header responses I'm seeing:
>
> GET
>
> HTTP/2 200
> date: Tue, 10 Sep 2024 13:47:54 GMT
> content-type: binary/octet-stream
> content-length: 278198294
> vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
> x-version: 0.0.16
> access-control-allow-credentials: true
>
> HEAD
>
> HTTP/2 200
> date: Tue, 10 Sep 2024 13:48:08 GMT
> content-type: binary/octet-stream
> content-length: 0
> x-version: 0.0.16
> access-control-allow-credentials: true
> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
> vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
>

-- 
http://www.spatialys.com
My software is free, but my time generally not.