[gdal-dev] Call for review on RFC 101: Raster dataset read-only thread-safety
Frank Warmerdam
warmerdam at pobox.com
Fri Sep 6 20:33:28 PDT 2024
On Fri, Sep 6, 2024 at 6:35 PM Even Rouault <even.rouault at spatialys.com>
wrote:
> Looking through the GTiff code I find myself not able to work out where
> locking is done and how we are managing the directory loading/switching as
> block reads are done.
>
> No locking is done. To be noted that the logic of TIFF directory
> loading/switching changed "a few" years ago. In the previous situation, the
> main dataset and its overview datasets were sharing the same TIFF* handle.
> But that strategy of switching from TIFF directory when switching between
> overview datasets had also a major performance drawback of invalidating
> TIFF bytecount[] and offset[] arrays, which was particularly annoying when
> computing overviews, where that back&forth is done consistently.
>
Even,
Ah, good!
...
> I would certainly like to work towards read-only thread safety for some
> drivers with:
> - a shared block cache for all threads reading from a dataset
>
> That easiest way of doing that would be to put a per-dataset mutex in
> classes GDALHashSetBandBlockCache and GDALArrayBandBlockCache, but that
> would hurt scalability even when several threads try to access different
> blocks at the same time (unless smart things are done to drop the mutex as
> soon as possible)
>
For read-only purposes I would have thought we only had to briefly seize a
mutex when introducing new blocks into the cache, and I thought there was
already some sort of lock around that.
- no duplication of file handles per-thread
>
> I just don't see how to do that. Even a mutex around file operations would
> not be sufficient since we'd want both Seek()+Read() to be atomic per
> thread.
>
I do agree that we would need a mutex containing both the seek and read for
a block read. Also, on reflection I can see that this could add a lot of
condition as the mutex would be held for very long periods of time and it
would intefere with the ability to do parallel reads on filesystems where
that is possible (I have to get out of a "file system is a single spinning
disk" mindset). So perhaps I'm not so attached to the avoiding per-thread
file descriptors though I'm not sure per thread file descriptors would
always be the right answer.
...
libtiff itself has a lot of state variables, in its core or in its codecs
> (e.g. the deflate codec has to maintain a zlib handle as a state variable
> if doing partial reads of a tile/strip). Even getting the location of a
> TIFF tile is not thread-safe when defered byteount[]/offset[] array loading
> is enabled (which is necessary when dealing with extremely large remote
> COGs) since it involves file accesses.
>
I can see the point though it is not clear this can't be addressed, and
that it wouldn't be a useful thing in libtiff.
- a way of checking that a dataset is considered to be thread safe.
>
> Not totally sure to understand what you mean here. The RFC introduces a
> IsThreadSafe() virtual method that a driver can potentially implement (only
> set to true currently by instances of GDALThreadSafeDataset).
>
Yes, this is what I meant, and it would be the driver implementor
indicating they believe the driver to be threadsafe (or at least that
dataset).
> But finding if a driver is thread-safe seems to be a non-solvable/provable
> problem to me, apart from staring at the code for a sufficient long time to
> convince oneself it is actually + adding some torture tests and hope they
> catch most issues. But things can get very complicated if you need to test
> all combinations of virtual methods that can be called.
>
...
> Also, doesn't gdalwarp's multithreading support already depend on
> read-only thread safety for source files?
>
> Definitely not. The initial -multi mode works with 2 threads: one for I/O,
> one for compute. The later was further extended to also multi-thread the
> compute part. But I/O is still done by a single thread, hence no
> requirement on the thread-safety of drivers.
>
Yes, this occurred to me thinking about it a bit later in the day.
Thanks for your extended answer. I'll need to spend some more time
digesting. In any event, I am supportive of the proposed RFC and
implementation.
Best regards,
Frank
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam,
warmerdam at pobox.com
light and sound - activate the windows | USA: +1 650-701-7823
<http://voice.google.com/calls?a=nc,%2B16507017823>
<http://voice.google.com/calls?a=nc,%2B16507017823>
and watch the world go round - Rush | CAN: +1 343-550-9984
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20240906/02778877/attachment.htm>
More information about the gdal-dev
mailing list