[gdal-dev] Ignore content-length in vsicurl?

thomas bonfort thomas.bonfort at gmail.com
Tue Sep 10 08:11:21 PDT 2024


I'm not sure that providing a fix to work around this very broken behavior
is the best course of action to make them fix their server...

On Tue, Sep 10, 2024 at 5:07 PM Even Rouault via gdal-dev <
gdal-dev at lists.osgeo.org> wrote:

>
> On 10/09/2024 at 16:10, Rahkonen Jukka via gdal-dev wrote:
>
> Hi,
>
>
>
> Have you tried with configuration option “CPL_VSIL_CURL_USE_HEAD=[YES/NO]:
> Defaults to YES. Controls whether to use a HEAD request when opening a
> remote URL.”
>
> I was just going to suggest that too. It "works", but not really. It just
> postpones the core issue: the server doesn't support GET Range requests, so
> it can't be used with /vsicurl/
>
> As it has a COG organization with overview data first in the file, if you
> want to read the smallest overview(s), you can use /vsicurl_streaming/
> instead, but that won't be efficient for reading the bottom-right-most tile
> of the full-resolution layer, which will require reading the whole file...
>
> Nothing GDAL can do about that.
>
> Actually... digging further... it does somehow support Range requests, but in
> what I believe is a non-compliant way. It does return the expected content,
> but with HTTP 200 rather than HTTP 206 (Partial Content). And it never
> returns the Content-Length header.
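
To make the behavior described above concrete, here is a minimal sketch of how a client could tell the cases apart once it has the response to a ranged GET. The function name, its arguments, and the returned labels are hypothetical and are not how GDAL itself implements the check.

```python
from typing import Mapping


def classify_range_response(status: int, headers: Mapping[str, str],
                            body_len: int, requested_len: int) -> str:
    """Classify a server's answer to a GET that carried a Range header.

    Hypothetical helper for illustration only, not GDAL's logic.
    body_len is the number of bytes actually received,
    requested_len the number of bytes asked for in the Range header.
    """
    header_names = {name.lower() for name in headers}
    if status == 206:
        # Expected answer to a Range request: Partial Content.
        return "partial-content"
    if status == 200 and body_len == requested_len:
        # Only the requested bytes came back, but flagged as a full response.
        if "content-length" not in header_names:
            return "partial-as-200-no-length"
        return "partial-as-200"
    if status == 200:
        # The Range header was ignored and the whole resource was sent.
        return "range-ignored"
    return "error"
```

The case seen in this thread would be the "partial-as-200-no-length" one: the right bytes, the wrong status code, and no Content-Length to go with them.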
>
> Well, I've implemented a workaround in
> https://github.com/OSGeo/gdal/pull/10760 that might be useful in other
> similar cases too.
>
> With that, the following works:
>
> gdal_translate "/vsicurl?file_size=unlimited&url=https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif" --config GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR out.tif -srcwin 5000 5000 50 50
>
> file_size=unlimited works here since the GTiff driver doesn't really need
> to know the exact file size; it only checks at some points that we don't try
> to read beyond the end of the file, so unlimited is OK. In other
> situations/drivers, the exact value could be needed.
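
For scripting, the path used in the command above can be assembled with a small helper. This is only a sketch: `vsicurl_path` is a hypothetical name, it simply mirrors the layout of the command in this thread (options first, `url=` last), and it does no percent-encoding, so it assumes the URL contains no `&` or other characters that would need escaping.

```python
def vsicurl_path(url: str, **options: str) -> str:
    """Build a '/vsicurl?opt=val&...&url=...' path.

    Hypothetical helper: options come first and url= comes last,
    as in the gdal_translate command quoted in this thread.
    No percent-encoding is applied (assumption: the URL is plain).
    """
    parts = [f"{name}={value}" for name, value in options.items()]
    parts.append(f"url={url}")
    return "/vsicurl?" + "&".join(parts)


path = vsicurl_path(
    "https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/"
    "20NMH_2024-04-01_2024-08-01/B08.tif",
    file_size="unlimited",
)
print(path)
```

The resulting string is what would be passed as the source dataset name, on a GDAL build that includes the change from the pull request mentioned above.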
>
> But they should really fix their servers
>
> Even
>
>
>
> -Jukka Rahkonen-
>
>
>
> *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org> *On behalf of* Daniel Evans via gdal-dev
> *Sent:* Tuesday, 10 September 2024 16:57
> *To:* 'gdal-dev at lists.osgeo.org' (gdal-dev at lists.osgeo.org) <gdal-dev at lists.osgeo.org>
> *Subject:* [gdal-dev] Ignore content-length in vsicurl?
>
>
>
> Hi all,
>
>
>
> I am attempting to read a dataset via /vsicurl/ where I believe the server
> is incorrectly returning `content-length: 0` in response to HEAD requests.
> This causes GDAL to believe it's a zero-length file, and it therefore can't
> be read.
>
>
>
> If I download the file via HTTP GET, it's valid, and GDAL can read it
> locally. I've also confirmed I can use /vsicurl/ on some test datasets in
> the GDAL repo.
>
>
>
> Is it possible to force GDAL to work around the faulty content-length
> header, or is it too fundamental a problem to ignore?
>
>
>
> I've separately got in touch with the data provider to see if they are
> able to fix the issue at their end.
>
>
>
> Cheers,
>
> Daniel
>
>
>
>
>
> URL of the troublesome dataset:
>
>
> https://data.source.coop/earthgenome/sentinel2-temporal-mosaics/20NMH_2024-04-01_2024-08-01/B08.tif
>
>
>
>
>
> Example HTTP header responses I'm seeing:
>
>
>
> GET
>
>
>
> HTTP/2 200
> date: Tue, 10 Sep 2024 13:47:54 GMT
> content-type: binary/octet-stream
> content-length: 278198294
> vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
> x-version: 0.0.16
> access-control-allow-credentials: true
>
>
>
> HEAD
>
>
>
> HTTP/2 200
> date: Tue, 10 Sep 2024 13:48:08 GMT
> content-type: binary/octet-stream
> content-length: 0
> x-version: 0.0.16
> access-control-allow-credentials: true
> etag: "a79f3f685281d6681e4d362536c5b3eb-34"
> last-modified: Thu, 25 Jul 2024 13:16:08 GMT
> vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
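
The mismatch between the two header dumps above can be checked mechanically. The following is a rough sketch with hypothetical function names, using shortened copies of the headers quoted in this message; it only compares the advertised lengths and says nothing about how GDAL handles them.

```python
from typing import Optional


def content_length(raw_headers: str) -> Optional[int]:
    """Return the Content-Length value from a raw header dump, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            return int(value.strip())
    return None


def head_length_is_suspect(head_raw: str, get_raw: str) -> bool:
    """True when GET advertises a length that HEAD does not agree with."""
    get_len = content_length(get_raw)
    return get_len is not None and content_length(head_raw) != get_len


# Shortened copies of the responses quoted in this message.
GET_HEADERS = """HTTP/2 200
content-type: binary/octet-stream
content-length: 278198294
"""
HEAD_HEADERS = """HTTP/2 200
content-type: binary/octet-stream
content-length: 0
"""

print(head_length_is_suspect(HEAD_HEADERS, GET_HEADERS))
```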
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>