[gdal-dev] Race condition between forked processes with opened Tiff dataset on Linux

Jiri Drbalek jiri.drbalek at gmail.com
Sat Dec 16 01:04:56 PST 2017


> Can I assume that we are talking about opening for read and not for write ?

Yes, it's only for reading.

I eventually concluded that I will try to use memory mapping.

Here is an exchange about memory mapping between Even and me that I
forgot to forward to the mailing list:

> Dear Even,
>
> Thank you for your helpful answer.
>
> I forgot to mention that when I was testing the fork() situation with
> libtiff alone, it was working fine when the tiff file was memory
> mapped. Unfortunately, for some reason, GDAL doesn't support memory
> mapping of compressed tiffs. Libtiff can read them, at least those
> with deflate compression I've tested. What is the reason for that
> restriction?

mmap is platform specific, so there was a need for a more general
mechanism. At the time the GeoTIFF driver was created, 32-bit
processes were still common but large files already existed, so even
on Linux mmap wasn't always usable. Another potential issue is
that the OS might not behave well if you mmap() a file larger
than the available RAM and read it entirely. At least that was my
experience with some older kernels, where the OS wouldn't evict cached
pages aggressively enough, making the system unresponsive due to heavy
cache swapping.
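
For illustration, here is a minimal Python sketch of the memory-mapping
approach on Linux. It is not GDAL or libtiff code, and the function names
are hypothetical. It maps a small temporary file read-only, forks, and has
the parent and the child read different regions of the same mapping. No
file offset is involved, so the two readers cannot disturb each other. The
address-space check is only a coarse upper bound that reflects the 32-bit
concern above.

```python
import mmap
import os
import sys
import tempfile


def fits_address_space(size):
    """Coarse, necessary-but-not-sufficient check (hypothetical helper):
    a file larger than sys.maxsize bytes can never be mapped in one piece,
    which is the case for files over 2 GiB in a 32-bit process."""
    return size <= sys.maxsize


def demo_mmap_after_fork():
    """Map a file, fork, and read disjoint regions in parent and child."""
    block = 4096
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"A" * block + b"B" * block)
        tmp.flush()
        with open(tmp.name, "rb") as f:
            # Length 0 maps the whole file; ACCESS_READ makes it read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
                pid = os.fork()
                if pid == 0:
                    # Child: the mapping is inherited across fork().
                    status = 1
                    try:
                        if view[block:2 * block] == b"B" * block:
                            status = 0
                    finally:
                        os._exit(status)
                # Parent: reads the first block while the child reads the second.
                parent_ok = view[0:block] == b"A" * block
                _, wait_status = os.waitpid(pid, 0)
                child_ok = (os.WIFEXITED(wait_status)
                            and os.WEXITSTATUS(wait_status) == 0)
                return parent_ok, child_ok


if __name__ == "__main__":
    print(demo_mmap_after_fork())
```

The sketch only shows the access pattern. It says nothing about how the
kernel behaves when the mapped file is larger than the available RAM.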

Another option at the /vsi level that I looked at yesterday is
pread(), which takes both a file offset and a size to read, and so makes
it possible to use the same underlying file descriptor from multiple
threads/processes. The caveat is that each call translates directly into
a system call, bypassing file stream buffering.
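
As a rough sketch of the pread() idea (plain Python on Linux, not the /vsi
layer; the function names are hypothetical), the example below shares one
file descriptor between a parent and a forked child. Both read through
os.pread() with explicit offsets, and the shared file offset of the
descriptor stays where it was.

```python
import os
import tempfile


def pread_block(fd, offset, size):
    """Read up to `size` bytes at `offset` without moving fd's file offset.

    os.pread() may return fewer bytes than requested, so loop until the
    request is satisfied or end of file is reached.
    """
    chunks = []
    while size > 0:
        chunk = os.pread(fd, size, offset)
        if not chunk:  # end of file
            break
        chunks.append(chunk)
        offset += len(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def demo_pread_after_fork():
    """Parent and child read different blocks through the same descriptor."""
    block = 4096
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"A" * block + b"B" * block)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            pid = os.fork()
            if pid == 0:
                # Child: reads the second block via the inherited descriptor.
                status = 1
                try:
                    if pread_block(fd, block, block) == b"B" * block:
                        status = 0
                finally:
                    os._exit(status)
            parent_ok = pread_block(fd, 0, block) == b"A" * block
            _, wait_status = os.waitpid(pid, 0)
            child_ok = (os.WIFEXITED(wait_status)
                        and os.WEXITSTATUS(wait_status) == 0)
            # The shared offset was never touched by either process.
            position = os.lseek(fd, 0, os.SEEK_CUR)
            return parent_ok, child_ok, position
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo_pread_after_fork())
```

Every pread_block() call here costs at least one system call, which is the
buffering caveat mentioned above.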

> Should I try to enable mmapping of compressed tiffs?

That could be a solution worth investigating, perhaps restricted to
64-bit POSIX platforms and controlled by a configuration option such as
the existing GTIFF_USE_MMAP, which you'll see if you look at
frmts/gtiff/tifvsi.cpp. The name is a bit misleading, since it
currently only enables a pseudo-mmap emulation for /vsimem/
files, added in https://trac.osgeo.org/gdal/changeset/39555.
In case you wonder why this was done: the aim was to be able to test
the mmap()-specific code paths in libtiff when GDAL is
tortured by oss-fuzz.


Even

2017-12-16 8:53 GMT+01:00, Andrew C Aitchison <andrew at aitchison.me.uk>:
>
> On Thu, 14 Dec 2017, Jiri Drbalek wrote:
>
>> Hello.
>>
>> If a Linux process with opened Tiff dataset is forked, it is not possible
>> to read from the dataset concurrently in these forked processes, because
>> file offsets and other attributes of the opened Tiff file are shared
>> between those processes.
>
>> I've made a patch which optionally closes the underlying Tiff file once a
>> dataset is opened. One can then fork safely; the underlying file is lazily
>> opened again in each subprocess.
>>
>> What do you think about this problem and proposed solutions? Is there
>> some
>> more elegant solution?
>
> Can I assume that we are talking about opening for read and not for write ?
>
> For writing, I was taught that multi-process programs should do all file
> writing in a dedicated thread.
>
> --
> Andrew C. Aitchison					Cambridge, UK
>  			andrew at aitchison.me.uk
>
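
The shared-offset problem from the original report, and the effect of
reopening the file in the child, can be reproduced with a small Python
sketch on Linux. This is not the proposed GDAL patch; the function name and
the reopen_in_child flag are hypothetical. A child that reads through the
inherited descriptor moves the parent's file offset, while a child that
opens the file again by path gets an independent offset.

```python
import os
import tempfile


def parent_offset_after_child_read(reopen_in_child):
    """Return the parent's file offset after a forked child reads 4096 bytes.

    With reopen_in_child=False the child uses the inherited descriptor,
    whose offset is shared with the parent. With reopen_in_child=True the
    child opens the file again and gets its own offset.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 8192)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            pid = os.fork()
            if pid == 0:
                try:
                    child_fd = fd
                    if reopen_in_child:
                        # Fresh open file description, independent offset.
                        child_fd = os.open(tmp.name, os.O_RDONLY)
                    os.read(child_fd, 4096)
                finally:
                    os._exit(0)
            os.waitpid(pid, 0)
            # Query the parent's current offset without moving it.
            return os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(parent_offset_after_child_read(False))  # child moved the shared offset
    print(parent_offset_after_child_read(True))   # parent's offset is untouched
```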

