[gdal-dev] Parallel s3 requests with vsis3?
Christian Roth
code at christianroth.dev
Thu Nov 10 22:28:39 PST 2022
Hi,
I am trying to understand whether it is possible to read with parallel requests when using the /vsis3/ virtual file system. I am currently experimenting with a regular GTiff file and have not yet found a way to read faster than the bandwidth of a single S3 request.
I am no expert in HTTP communication, so I would be very happy if you could point out any misconceptions. My thought process is as follows:
As S3 does not support multiple ranges per request (see https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html), I assume that GDAL_HTTP_MULTIPLEX would not be an option.
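To make the single-range versus multi-range distinction concrete, here is a minimal Python sketch (standard library only) of how the two kinds of `Range` header look in HTTP byte-range syntax. The helper `range_header` is hypothetical and only formats strings; it does not send anything.

```python
def range_header(ranges):
    """Build a Range header value from (offset, length) pairs.

    HTTP byte ranges use inclusive end offsets, hence offset + length - 1.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    specs = []
    for offset, length in ranges:
        if offset < 0 or length <= 0:
            raise ValueError(f"invalid range: offset={offset}, length={length}")
        specs.append(f"{offset}-{offset + length - 1}")
    return "bytes=" + ",".join(specs)


ranges = [(0, 100), (1000, 50)]

# One header listing several ranges: valid HTTP, but not supported by S3.
multi = range_header(ranges)                      # "bytes=0-99,1000-1049"

# The S3-compatible alternative: one single-range request per range.
per_request = [range_header([r]) for r in ranges]  # ["bytes=0-99", "bytes=1000-1049"]
```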
The old documentation (https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_HTTP_MULTIRANGE) made me hopeful that I could convince GDAL to generate multi-range requests, which it could then send in parallel as single-range requests over several HTTP connections.
> (GDAL >= 2.3) Can be set to SINGLE_GET, SERIAL or YES. Defaults to YES. Controls how ReadMultiRange() requests emitted by the GeoTIFF driver are satisfied. SINGLE_GET means that several ranges will be expressed in the Range header of a single GET requests, which is not supported by a majority of servers (including AWS S3 or Google GCS). SERIAL means that each range will be requested sequentially. YES means that each range will be requested in parallel, using HTTP/2 multiplexing or several HTTP connections.
(The option still seems to exist, but now takes PARALLEL and SERIAL; see https://github.com/OSGeo/gdal/blob/8633aa6f0c57d38b768059ac5dc137531421b9d7/port/cpl_vsil_curl.cpp#L3405)
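The SERIAL and parallel strategies described in the quoted documentation can be sketched as follows. This is a simplified illustration, not GDAL's implementation: `REMOTE_OBJECT` and `fetch_range` are hypothetical stand-ins for an S3 object and a single-range GET, so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote object (16 KiB of dummy bytes).
REMOTE_OBJECT = bytes(range(256)) * 64


def fetch_range(offset, length):
    """Pretend to issue one GET with 'Range: bytes=offset-(offset+length-1)'."""
    return REMOTE_OBJECT[offset:offset + length]


def read_serial(ranges):
    # SERIAL: one single-range request after another, so latencies add up.
    return [fetch_range(off, length) for off, length in ranges]


def read_parallel(ranges, max_workers=8):
    # Parallel: every range is still its own single-range request, but the
    # requests are issued concurrently. With a real HTTP client each worker
    # would use its own connection (or its own HTTP/2 stream).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: fetch_range(*r), ranges))


ranges = [(0, 512), (4096, 1024), (10000, 300)]
chunks = read_parallel(ranges)
print([len(c) for c in chunks])  # [512, 1024, 300]
```

`ThreadPoolExecutor.map` returns results in input order, so both variants yield the same chunks; any speedup from the parallel one comes only from overlapping per-request latency.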
I tried playing around with GDAL_HTTP_VERSION and GDAL_HTTP_MULTIRANGE, but they do not seem to change the parallelism: even under optimal reading conditions I cannot get beyond ~20 Mb/s.
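To reproduce the experiment, the options can be passed on the command line with the generic `--config KEY VALUE` switch that GDAL utilities accept. The sketch below only assembles the command and does not run GDAL. The bucket path is a placeholder, and adding `GDAL_HTTP_MULTIPLEX=YES` alongside the two options above is an assumption. Whether this combination actually increases throughput is the open question of this post.

```python
import shlex

# Hypothetical object path; replace with a real bucket/key.
VSIS3_PATH = "/vsis3/my-bucket/path/to/image.tif"

# PARALLEL is the GDAL_HTTP_MULTIRANGE value mentioned above; "2" requests HTTP/2.
HTTP_OPTIONS = {
    "GDAL_HTTP_MULTIRANGE": "PARALLEL",
    "GDAL_HTTP_VERSION": "2",
    "GDAL_HTTP_MULTIPLEX": "YES",  # assumption: included for completeness
}


def gdal_command(program, path, options):
    """Build an argv list passing each option via --config KEY VALUE."""
    argv = [program]
    for key, value in sorted(options.items()):
        argv += ["--config", key, value]
    argv.append(path)
    return argv


argv = gdal_command("gdalinfo", VSIS3_PATH, HTTP_OPTIONS)
print(shlex.join(argv))
```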
Is it currently possible at all to get parallel reads from S3? If so, what settings would I need?
Thanks so much,
Christian