[gdal-dev] Relationship between GTiff multi-threaded read and vsicurl multirange requests

Peter Schmitt pschmitt at gmail.com
Thu Dec 1 16:00:23 PST 2022


Hi,

I am experimenting with curl range requests on servers supporting HTTP/1.1
and the new GTiff multi-threaded read released in gdal-3.6.0 from
https://github.com/OSGeo/gdal/pull/6438.

First, here's a command that uses single-threaded reads with HTTP/1.1. Note
that GDAL_HTTP_MULTIRANGE defaults to YES
(https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_HTTP_MULTIRANGE).
With HTTP/1.1, I expect each range to be requested in parallel, using
several HTTP connections.

env GDAL_NUM_THREADS=1 CPL_CURL_VERBOSE=1 GDAL_HTTP_VERSION=1.1 \
  GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=tif \
  gdal_translate \
  /vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/small_world.tif \
  small_world_jpeg.tif -co tiled=yes -co compress=jpeg -co photometric=ycbcr \
  2>&1 | grep "Content-Range"
< Content-Range: bytes 0-16383/240574
< Content-Range: bytes 229376-240573/240574
< Content-Range: bytes 16384-81919/240574
< Content-Range: bytes 81920-212991/240574
< Content-Range: bytes 212992-229375/240574
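
Logs like this are easier to compare once they are reduced to a few numbers. Below is a minimal Python sketch for that, not anything shipped with GDAL: parse_content_ranges and summarize are hypothetical helper names, and the regular expression assumes the log lines look exactly like the ones above.

```python
import re

# Matches lines such as "< Content-Range: bytes 0-16383/240574".
CONTENT_RANGE = re.compile(
    r"Content-Range:\s*bytes\s+(\d+)-(\d+)/(\d+)", re.IGNORECASE
)


def parse_content_ranges(log_text):
    """Return (start, end, total) tuples in the order they appear in the log."""
    return [
        tuple(int(group) for group in match.groups())
        for match in CONTENT_RANGE.finditer(log_text)
    ]


def summarize(ranges):
    """Count the responses and the bytes they carried (end offsets are inclusive)."""
    sizes = [end - start + 1 for start, end, _total in ranges]
    return {
        "requests": len(sizes),
        "bytes": sum(sizes),
        "largest": max(sizes, default=0),
        "smallest": min(sizes, default=0),
    }


# The five response headers from the single-threaded run.
log = """\
< Content-Range: bytes 0-16383/240574
< Content-Range: bytes 229376-240573/240574
< Content-Range: bytes 16384-81919/240574
< Content-Range: bytes 81920-212991/240574
< Content-Range: bytes 212992-229375/240574
"""

ranges = parse_content_ranges(log)
print(summarize(ranges))
```

For the five lines above this reports 5 requests totalling 240574 bytes, which is exactly the file size, so the single-threaded run fetches every byte once.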


I try the same thing using 2 threads. It is quite a bit slower, and there
are many more range requests than I would anticipate:

time env GDAL_NUM_THREADS=2 CPL_CURL_VERBOSE=1 GDAL_HTTP_VERSION=1.1 \
  GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=tif \
  gdal_translate \
  /vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/small_world.tif \
  small_world_jpeg_multi.tif -co tiled=yes -co compress=jpeg -co photometric=ycbcr \
  2>&1 | grep "Content-Range"
< Content-Range: bytes 0-16383/240574
< Content-Range: bytes 229376-240573/240574
< Content-Range: bytes 232008-240007/240574
< Content-Range: bytes 88008-96007/240574
< Content-Range: bytes 152008-160007/240574
< Content-Range: bytes 72008-80007/240574
< Content-Range: bytes 224008-232007/240574
< Content-Range: bytes 144008-152007/240574
< Content-Range: bytes 64008-72007/240574
< Content-Range: bytes 216008-224007/240574
< Content-Range: bytes 136008-144007/240574
< Content-Range: bytes 56008-64007/240574
< Content-Range: bytes 208008-216007/240574
< Content-Range: bytes 128008-136007/240574
< Content-Range: bytes 48008-56007/240574
< Content-Range: bytes 200008-208007/240574
< Content-Range: bytes 120008-128007/240574
< Content-Range: bytes 40008-48007/240574
< Content-Range: bytes 192008-200007/240574
< Content-Range: bytes 112008-120007/240574
< Content-Range: bytes 32008-40007/240574
< Content-Range: bytes 184008-192007/240574
< Content-Range: bytes 104008-112007/240574
< Content-Range: bytes 24008-32007/240574
< Content-Range: bytes 176008-184007/240574
< Content-Range: bytes 96008-104007/240574
< Content-Range: bytes 16008-24007/240574
< Content-Range: bytes 168008-176007/240574
< Content-Range: bytes 8008-16007/240574
< Content-Range: bytes 160008-168007/240574
< Content-Range: bytes 80008-88007/240574
< Content-Range: bytes 8-8007/240574
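
The 30 extra requests look like fixed-size pieces of one contiguous region. A minimal Python sketch to check that by merging the ranges back together follows. coalesce is a hypothetical helper name, and the 30 block ranges are regenerated from the pattern visible in the log (8000 bytes each, starting at offset 8) instead of being pasted in.

```python
def coalesce(ranges):
    """Merge inclusive (start, end) byte ranges that touch or overlap."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent to or overlapping the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The two requests that also appear in the single-threaded log.
header_requests = [(0, 16383), (229376, 240573)]

# The 30 block requests from the two-thread log: offsets 8, 8008, ..., 232008.
block_requests = [(8 + i * 8000, 8 + i * 8000 + 7999) for i in range(30)]

all_requests = header_requests + block_requests
requested_bytes = sum(end - start + 1 for start, end in all_requests)

print(coalesce(block_requests))
print(coalesce(all_requests))
print(requested_bytes)
```

The 30 block requests merge into the single range 8-240007, and the 32 requests together ask for 267582 bytes of a 240574-byte file, so part of the file is transferred more than once.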


I am confused by the large number of curl range requests when using the new
multithreaded reading.  Some questions:

- With GDAL_NUM_THREADS=1 and GDAL_HTTP_MULTIRANGE=YES, "each range will be
requested in parallel, using several HTTP connections"... are those
requests multithreaded?
- Is it a bad idea to combine multithreaded reads with
GDAL_HTTP_MULTIRANGE=YES when the data is accessed through /vsicurl/ from
an HTTP/1.1 server?  My guess is that the GTiff multithreaded reads split
contiguous byte ranges into noncontiguous ones, which may yield worse
performance on some virtual filesystems.
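
One way to answer the second question empirically is to time the same command under each combination of settings. The following is a rough Python sketch under several assumptions: gdal_translate is on the PATH, the helper names (build_env, build_command, count_range_responses, run_once) are hypothetical, and SERIAL is used as the alternative GDAL_HTTP_MULTIRANGE value as I understand it from the GDAL configuration documentation, which should be checked against the GDAL version in use. Running the script only prints the planned configurations; run_once has to be called explicitly to hit the network.

```python
import os
import subprocess
import time

URL = (
    "/vsicurl/https://github.com/OSGeo/gdal/raw/master/"
    "autotest/gdrivers/data/small_world.tif"
)


def build_env(num_threads, multirange, base_env=None):
    """Copy the environment and set the options used in the commands above."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "GDAL_NUM_THREADS": str(num_threads),
        "GDAL_HTTP_MULTIRANGE": multirange,
        "GDAL_HTTP_VERSION": "1.1",
        "CPL_CURL_VERBOSE": "1",
        "GDAL_DISABLE_READDIR_ON_OPEN": "YES",
        "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": "tif",
    })
    return env


def build_command(output_path):
    """Same gdal_translate invocation as in the commands above."""
    return [
        "gdal_translate", URL, output_path,
        "-co", "tiled=yes", "-co", "compress=jpeg", "-co", "photometric=ycbcr",
    ]


def count_range_responses(log_text):
    """Count response headers, like `grep "Content-Range" | wc -l`."""
    return sum("Content-Range" in line for line in log_text.splitlines())


def run_once(num_threads, multirange, output_path):
    """Run one configuration; return (elapsed seconds, number of range responses)."""
    start = time.perf_counter()
    proc = subprocess.run(
        build_command(output_path),
        env=build_env(num_threads, multirange),
        capture_output=True, text=True, check=True,
    )
    elapsed = time.perf_counter() - start
    # The verbose curl output is on stderr, hence the 2>&1 in the shell version.
    return elapsed, count_range_responses(proc.stdout + proc.stderr)


# Dry run: list the configurations that would be compared.
configs = [(threads, mode) for threads in (1, 2) for mode in ("YES", "SERIAL")]
for threads, mode in configs:
    print(f"GDAL_NUM_THREADS={threads} GDAL_HTTP_MULTIRANGE={mode}")
```

Calling run_once(threads, mode, "out.tif") for each entry in configs, several times each, would give timings and request counts to compare; a single run against a remote server is too noisy to draw conclusions from.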

Thanks,
Pete