[gdal-dev] Relationship between GTiff multi-threaded read and vsicurl multirange requests

Even Rouault even.rouault at spatialys.com
Thu Dec 1 16:13:17 PST 2022


Hi Pete,

Those are good questions.

> I am confused by the large number of curl range requests when using 
> the new multithreaded reading.  Some questions:
>
> - with GDAL_NUM_THREADS=1 and GDAL_HTTP_MULTIRANGE=YES, "each range 
> will be requested in parallel, using several HTTP connections"... are 
> those requests multithreaded?
Not multi-threaded, but using the curl multi handle interface 
(https://curl.se/libcurl/c/libcurl-multi.html), which makes it possible 
to start several connections in parallel within the same user-space 
thread (not sure what the kernel/OS does behind the scenes) and listen 
to the corresponding file descriptors/network handles to collect 
responses as soon as they arrive. The GeoTIFF driver then waits for all 
those queued requests to have returned their results before continuing 
its processing.
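
To illustrate the "several requests in flight within one thread, then wait for all of them" pattern, here is a minimal sketch in Python using asyncio as an analogy. It is not libcurl and not what GDAL actually runs: fetch_range, fetch_all and run_demo are hypothetical names, and asyncio.sleep merely stands in for network latency.

```python
import asyncio
import threading


async def fetch_range(offset, size, delay):
    # Stand-in for one HTTP range request; the sleep simulates network latency.
    await asyncio.sleep(delay)
    # Report which thread handled the "response".
    return (offset, size, threading.get_ident())


async def fetch_all(ranges):
    # Start every request, then wait until all of them have completed.
    tasks = [fetch_range(offset, size, delay) for offset, size, delay in ranges]
    return await asyncio.gather(*tasks)


def run_demo():
    # Three 8 KB ranges with different simulated latencies (hypothetical values).
    ranges = [(0, 8192, 0.03), (8192, 8192, 0.01), (16384, 8192, 0.02)]
    return asyncio.run(fetch_all(ranges))


results = run_demo()
for offset, size, thread_id in results:
    print(offset, size, thread_id == threading.get_ident())
```

All three simulated requests complete on the calling thread, and the caller only resumes once the slowest one has returned.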
> - Is it a bad idea to use multithreaded reads and 
> GDAL_HTTP_MULTIRANGE=YES when data is accessed with /vsicurl/ served 
> by HTTP/1.1?  I am guessing the GTiff multithreaded reads are 
> splitting up contiguous byte ranges to be noncontiguous, which may 
> yield worse performance on some virtual filesystems.

What you've experienced with multithreaded reads and HTTP reads falls 
under the suggestion in https://github.com/OSGeo/gdal/issues/6456. So 
basically the multi-threaded optimization currently works best for 
reading local files. Reading small_world.tif over the network is a bit 
of a worst case here, as it is definitely not a cloud optimized file: 
it has a strip organization, and each strip is only 8 KB large. So 
the current absence of range merging in the multithreaded GTiff 
decoding code path particularly hurts for that use case. For normally 
tiled files, it shouldn't be that bad.
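
As a rough idea of what range merging means here, the following Python sketch coalesces adjacent (offset, size) byte ranges into a single request. It is only an illustration under assumptions: merge_ranges and max_gap are hypothetical names, the offsets are made up, and this is not the code GDAL uses.

```python
def merge_ranges(ranges, max_gap=0):
    """Coalesce (offset, size) byte ranges that touch or lie within max_gap bytes."""
    merged = []
    for offset, size in sorted(ranges):
        if merged:
            last_offset, last_size = merged[-1]
            last_end = last_offset + last_size
            if offset <= last_end + max_gap:
                # Extend the previous range instead of issuing a new request.
                new_end = max(last_end, offset + size)
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, size))
    return merged


# Four contiguous 8 KB strips (hypothetical offsets) collapse into one range.
strips = [(1000 + i * 8192, 8192) for i in range(4)]
print(merge_ranges(strips))
```

With strips this small, one merged request replaces four separate round trips, which is why the lack of merging matters most for strip-organized files.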

Even

-- 

http://www.spatialys.com
My software is free, but my time generally not.


