[gdal-dev] Relationship between GTiff multi-threaded read and vsicurl multirange requests
Even Rouault
even.rouault at spatialys.com
Thu Dec 1 16:13:17 PST 2022
Hi Pete,
Those are good questions.
> I am confused by the large number of curl range requests when using
> the new multithreaded reading. Some questions:
>
> - with GDAL_NUM_THREADS=1 and GDAL_HTTP_MULTIRANGE=YES, "each range
> will be requested in parallel, using several HTTP connections"... are
> those requests multithreaded?
Not multi-threaded, but using the curl multi handle interface
(https://curl.se/libcurl/c/libcurl-multi.html), which makes it possible to
start several connections in parallel within the same user-space thread
(not sure what the kernel/OS does behind the scenes) and listen to the
corresponding file descriptors/network handles to collect responses as
soon as they arrive. The GeoTIFF driver thus waits for all of those queued
requests to have returned their results before continuing its processing.
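To illustrate the "several requests in flight, one thread" idea, here is a minimal Python sketch. It is only an analogy: it uses asyncio rather than libcurl, and fetch_range, fetch_all, the byte ranges and the 0.05 s delay are all made up for the example. The requests are simulated with a sleep, so no network access happens.

```python
import asyncio
import threading

async def fetch_range(start, end, delay):
    # Hypothetical stand-in for one HTTP range request;
    # asyncio.sleep simulates the network latency.
    await asyncio.sleep(delay)
    return (start, end, threading.get_ident())

async def fetch_all(ranges):
    # Start every request, then wait until all of them have completed
    # before handing the results back to the caller.
    tasks = [fetch_range(start, end, 0.05) for start, end in ranges]
    return await asyncio.gather(*tasks)

ranges = [(0, 8191), (8192, 16383), (16384, 24575)]
results = asyncio.run(fetch_all(ranges))

# Every simulated request ran on the same thread.
thread_ids = {tid for _, _, tid in results}
print(len(results), "responses collected on", len(thread_ids), "thread")
```

The point of the sketch is that the caller blocks until every queued request has completed, even though no extra thread is created.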
> - Is it a bad idea to use multithreaded reads and
> GDAL_HTTP_MULTIRANGE=YES when data is accessed with /vsicurl/ served
> by HTTP/1.1? I am guessing the GTiff multithreaded reads are
> splitting up contiguous byte ranges to be noncontiguous, which may
> yield worse performance on some virtual filesystems.
What you've experienced with multithreaded reads and HTTP reads falls
under the suggestion in https://github.com/OSGeo/gdal/issues/6456. So
basically the multi-threaded optimization for now works best when reading
local files. The choice of reading small_world.tif over the network is a
bit of a worst case here, as it is definitely not a cloud optimized file:
it has a strip organization, with each strip being only 8 KB in size. So
the current absence of range merging in the multithreaded GTiff
decoding code path particularly hurts for that use case. For normally
tiled files, it shouldn't be that bad.
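To make "range merging" concrete, here is a minimal Python sketch of the general idea of coalescing adjacent byte ranges into fewer requests. It is not GDAL's implementation: merge_ranges, its max_gap parameter and the strip offsets are hypothetical, chosen only to mimic a file made of 8 KB strips.

```python
def merge_ranges(ranges, max_gap=0):
    """Coalesce (offset, length) byte ranges that touch or overlap,
    or that are separated by at most max_gap bytes."""
    merged = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset <= prev_end + max_gap:
                # Extend the previous range instead of issuing a new request.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged

strip_size = 8 * 1024
# Four contiguous strips, then one isolated strip further into the file.
strips = [(i * strip_size, strip_size) for i in range(4)]
strips.append((10 * strip_size, strip_size))

merged = merge_ranges(strips)
print(merged)  # [(0, 32768), (81920, 8192)]
```

Here five small requests collapse into two. Without such merging, each 8 KB strip costs its own HTTP round trip.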
Even
--
http://www.spatialys.com
My software is free, but my time generally not.