[gdal-dev] Multithread deadlock
Even Rouault
even.rouault at spatialys.com
Mon Sep 26 08:53:07 PDT 2016
Hi,
I admire Andrew's enthusiasm and would happily let him tackle the next bug
reported in this area ;-)
I could reproduce the deadlock with the same stack trace and have pushed a fix
per https://trac.osgeo.org/gdal/ticket/6661. This was actually not a typical
deadlock situation, but undefined behaviour caused by trying to acquire a
recursive mutex that had previously been released more times than it had been
acquired.
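To illustrate the class of bug described above (a recursive mutex released more times than it was acquired), here is a minimal sketch assuming C++17 and the standard library only. `CheckedRecursiveMutex` and `demoCheckedMutex()` are hypothetical and are not how GDAL's `CPLMutex` is implemented; the wrapper simply records the owning thread and recursion depth so that an unbalanced release is rejected instead of being passed on to the underlying mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper, NOT GDAL's CPLMutex: it records the owning thread and
// the recursion depth so that an unbalanced release is rejected instead of
// reaching std::recursive_mutex::unlock() (undefined behaviour if not owned).
class CheckedRecursiveMutex {
public:
    void lock() {
        m_.lock();                                 // re-enters if we already own it
        ++depth_;                                  // only the owner touches depth_
        owner_.store(std::this_thread::get_id());
    }

    // Returns false, without touching the mutex, when the calling thread does
    // not currently own it, i.e. when it would release more than it acquired.
    bool unlock() {
        if (owner_.load() != std::this_thread::get_id()) {
            return false;
        }
        if (--depth_ == 0) {
            owner_.store(std::thread::id());       // fully released: no owner
        }
        m_.unlock();
        return true;
    }

    // Current recursion depth; only meaningful when called by the owner.
    int depth() const { return depth_; }

private:
    std::recursive_mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int depth_ = 0;
};

// Usage: two nested acquisitions must be matched by exactly two releases.
void demoCheckedMutex() {
    CheckedRecursiveMutex mtx;
    mtx.lock();
    mtx.lock();                    // recursive re-entry by the same thread
    assert(mtx.depth() == 2);
    bool first = mtx.unlock();     // depth 2 -> 1
    bool second = mtx.unlock();    // depth 1 -> 0: mutex really released
    bool extra = mtx.unlock();     // one release too many: rejected
    assert(first && second && !extra);
}
```

The third `unlock()` returns `false` instead of releasing a mutex the thread no longer owns; on a plain `std::recursive_mutex` that call would be undefined behaviour, and a later acquisition could then misbehave in the way described above.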
Without this patch, a "workaround" would be to set the
GDAL_ENABLE_READ_WRITE_MUTEX config option to NO to disable the per-dataset
mutex. I had added this option since I wasn't really sure that the per-dataset
mutex wouldn't introduce deadlock situations. But when setting it, you'll get
undefined behaviour (= potentially crashing or causing corruption) due to 2
threads potentially calling the IWriteBlock() method of the same dataset, which
was the GDAL 1.X behaviour.
Clearly, multi-threading scenarios involving writing are where the
global block cache mechanism + the band-aid of the per-dataset R/W mutex
show their limits in terms of design & maintenance complexity, and
scalability. A per-dataset block cache would avoid such headaches (the
drawback being the need to define a per-dataset block cache size).
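A per-dataset block cache, as suggested above, could look roughly like the following minimal sketch, assuming C++17 and the standard library only. `DatasetBlockCache`, its byte budget and the `WriteFn` callback are hypothetical and do not correspond to GDAL's `GDALRasterBlock` or `GDALAbstractBandBlockCache` classes. The point it shows is that eviction only ever writes blocks back to the dataset that owns the cache, so no other dataset's mutex is ever needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-dataset LRU block cache, NOT GDAL's API. Each dataset would
// own one instance with its own byte budget, so evicting a dirty block only
// writes to that same dataset and never needs another dataset's lock.
class DatasetBlockCache {
public:
    using Key = std::pair<int, int>;  // (nXBlockOff, nYBlockOff)
    using Block = std::vector<unsigned char>;
    using WriteFn = std::function<void(const Key&, const Block&)>;

    DatasetBlockCache(std::size_t maxBytes, WriteFn writeBlock)
        : maxBytes_(maxBytes), writeBlock_(std::move(writeBlock)) {}

    // Insert or replace a block; it stays dirty until written out.
    void put(const Key& key, Block data) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(key);
        if (it != index_.end()) {
            usedBytes_ -= it->second->data.size();
            lru_.erase(it->second);
            index_.erase(it);
        }
        usedBytes_ += data.size();
        lru_.push_front(Entry{key, std::move(data), true});
        index_[key] = lru_.begin();
        evictIfNeeded();
    }

    // Copy a cached block into `out`; returns false on a cache miss.
    bool get(const Key& key, Block& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        lru_.splice(lru_.begin(), lru_, it->second);  // now most recently used
        out = it->second->data;
        return true;
    }

    // Write every dirty block back to this dataset.
    void flush() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& e : lru_) {
            if (e.dirty) {
                writeBlock_(e.key, e.data);
                e.dirty = false;
            }
        }
    }

private:
    struct Entry {
        Key key;
        Block data;
        bool dirty;
    };

    // Evict least recently used blocks until the budget is respected,
    // writing dirty ones back first. Only this dataset is touched.
    void evictIfNeeded() {
        while (usedBytes_ > maxBytes_ && !lru_.empty()) {
            Entry& victim = lru_.back();
            if (victim.dirty) writeBlock_(victim.key, victim.data);
            usedBytes_ -= victim.data.size();
            index_.erase(victim.key);
            lru_.pop_back();
        }
    }

    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
    WriteFn writeBlock_;
    std::mutex mutex_;
    std::list<Entry> lru_;  // front = most recently used
    std::map<Key, std::list<Entry>::iterator> index_;
};

// Usage: an 8-byte budget holds two 4-byte blocks; a third evicts the oldest.
void demoDatasetBlockCache() {
    std::vector<DatasetBlockCache::Key> written;
    DatasetBlockCache cache(8, [&written](const DatasetBlockCache::Key& k,
                                          const DatasetBlockCache::Block&) {
        written.push_back(k);
    });
    cache.put({0, 0}, DatasetBlockCache::Block(4, 1));
    cache.put({1, 0}, DatasetBlockCache::Block(4, 2));
    cache.put({2, 0}, DatasetBlockCache::Block(4, 3));  // evicts dirty (0, 0)
    assert(written.size() == 1 && written[0] == DatasetBlockCache::Key(0, 0));
    DatasetBlockCache::Block out;
    bool hitOld = cache.get({0, 0}, out);
    bool hitNew = cache.get({1, 0}, out);
    assert(!hitOld && hitNew && out[0] == 2);
    cache.flush();  // writes the two remaining dirty blocks
    assert(written.size() == 3);
}
```

The drawback mentioned above shows up directly as the `maxBytes` constructor argument, which every dataset has to be given instead of sharing one global budget.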
Even
> Sure Andrew,
>
> Here is the call stack from Visual Studio for both threads (I just
> copied the top calls where GDAL is involved, for easier reading. If you
> need the whole stack, just let me know):
>
> THREAD 1:
>
> ntdll.dll!_NtWaitForSingleObject@12() Unknown
> ntdll.dll!_RtlpWaitOnCriticalSection@8() Unknown
> ntdll.dll!_RtlEnterCriticalSection@4() Unknown
> gdal201.dll!CPLAcquireMutex(_CPLMutex * hMutexIn, double dfWaitInSeconds) Line 806 C++
> gdal201.dll!GDALDataset::EnterReadWrite(GDALRWFlag eRWFlag) Line 6102 C++
> gdal201.dll!GDALRasterBand::EnterReadWrite(GDALRWFlag eRWFlag) Line 5290 C++
> gdal201.dll!GDALRasterBlock::Write() Line 742 C++
> gdal201.dll!GDALRasterBlock::Internalize() Line 917 C++
> gdal201.dll!GDALRasterBand::GetLockedBlockRef(int nXBlockOff, int nYBlockOff, int bJustInitialize) Line 1126 C++
> > Test.exe!RasterBandPixelAccess::SetValueAtPixel<short>(const int & pX, const int & pY, const short & value) Line 180 C++
>
> THREAD 2:
>
> ntdll.dll!_NtWaitForSingleObject@12() Unknown
> KernelBase.dll!_WaitForSingleObjectEx@12() Unknown
> kernel32.dll!_WaitForSingleObjectExImplementation@12() Unknown
> kernel32.dll!_WaitForSingleObject@8() Unknown
> gdal201.dll!CPLCondWait(_CPLCond * hCond, _CPLMutex * hClientMutex) Line 937 C++
> gdal201.dll!GDALAbstractBandBlockCache::WaitKeepAliveCounter() Line 134 C++
> gdal201.dll!GDALArrayBandBlockCache::FlushCache() Line 312 C++
> gdal201.dll!GDALRasterBand::FlushCache() Line 865 C++
> gdal201.dll!GDALDataset::FlushCache() Line 386 C++
> gdal201.dll!GDALPamDataset::FlushCache() Line 159 C++
> gdal201.dll!GTiffDataset::Finalize() Line 6180 C++
> gdal201.dll!GTiffDataset::~GTiffDataset() Line 6135 C++
> gdal201.dll!GTiffDataset::`scalar deleting destructor'(unsigned int) C++
> gdal201.dll!GDALClose(void * hDS) Line 2998 C++
> > Test.exe!main::__l2::<lambda>(std::basic_string<char,std::char_traits<char>,std::allocator<char> > sourcefilePath, std::basic_string<char,std::char_traits<char>,std::allocator<char> > targetFilePath, int threadID) Line 66 C++
>
> From: Andrew Bell [mailto:andrew.bell.ia at gmail.com]
> Sent: 26 September, 2016 16:06
> To: Francisco Javier Calzado <francisco.javier.calzado at ericsson.com>
> Cc: gdal-dev at lists.osgeo.org
> Subject: Re: [gdal-dev] Multithread deadlock
>
> Deadlocks are usually easy to debug if you can get a traceback when
> deadlocked. If you can attach with gdb (or run in the debugger) and
> reproduce and post the stack at the time ('where' from gdb), it should be
> no problem to fix. Trying to reproduce on different hardware can be
> difficult.
>
> On Mon, Sep 26, 2016 at 9:33 AM, Francisco Javier Calzado
> <francisco.javier.calzado at ericsson.com> wrote:
>
> Hi guys,
>
> I am experiencing a deadlock with just 2 threads in a single-reader &
> multiple-writer scenario. That is, threads read from the same input file
> (using different handles) and then write different output files. The
> deadlock occurs when the block cache gets filled. The situation is the
> following:
>
>
> - T1 and T2 read datasets D1 and D2, both pointing to the same
> input raster (GTiff).
>
> - The block cache gets filled.
>
> - T1 tries to lock one block in the cache to write data. But the cache
> is full, so it tries to free dirty blocks from T2 (as seen in the
> Internalize() method). For that purpose, it apparently requires a mutex
> from D2.
>
> - However, T2 is in a state where it must wait for thread T1 to finish
> working with T2’s blocks. In this state, T2 holds a mutex acquired from D2.
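The hold-and-wait situation described in the bullets above can be modelled as a wait-for graph, where an edge from one thread to another means "is blocked until the other releases something", and a cycle means a deadlock. This is a minimal sketch assuming C++17 and the standard library only; `WaitForGraph`, `demoWaitForGraph()` and the thread labels are hypothetical, and it models the report rather than GDAL's code. Note that, per Even's diagnosis at the top of this message, the actual cause turned out to be an unbalanced recursive mutex release rather than a classic lock cycle.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the report, NOT GDAL code. An edge A -> B in this
// wait-for graph means "thread A is blocked until thread B releases something".
// A cycle means none of the threads on it can make progress: a deadlock.
class WaitForGraph {
public:
    void addWait(const std::string& waiter, const std::string& holder) {
        edges_[waiter].insert(holder);
        edges_[holder];  // ensure the holder is a node even with no out-edges
    }

    // Depth-first search; reaching a node still on the stack means a cycle.
    bool hasCycle() const {
        std::map<std::string, int> state;  // 0 = unvisited, 1 = on stack, 2 = done
        for (const auto& node : edges_) {
            if (state[node.first] == 0 && visit(node.first, state)) return true;
        }
        return false;
    }

private:
    bool visit(const std::string& n, std::map<std::string, int>& state) const {
        state[n] = 1;
        for (const auto& next : edges_.at(n)) {
            if (state[next] == 1) return true;
            if (state[next] == 0 && visit(next, state)) return true;
        }
        state[n] = 2;
        return false;
    }

    std::map<std::string, std::set<std::string>> edges_;
};

// Usage: the two waits described in the report close a cycle.
void demoWaitForGraph() {
    WaitForGraph g;
    g.addWait("T1", "T2");  // T1 (Internalize()) needs D2's mutex, held by T2
    bool afterFirst = g.hasCycle();
    g.addWait("T2", "T1");  // T2 (FlushCache()) waits for T1 to release D2's blocks
    bool afterSecond = g.hasCycle();
    assert(!afterFirst && afterSecond);
}
```

A single wait is harmless; it is only the second, opposite wait that closes the cycle, which matches the two bullets describing T1 and T2.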
>
> At least, that is what seems to be happening based on the source code. Maybe
> I’m wrong; I don’t have a full overview of how GDAL works internally.
> In any case, I can reproduce this issue with the following test code and
> dataset:
> https://drive.google.com/file/d/0B-OCl1FjBi0YSkU3RUozZjc5SnM/view?usp=sharing
>
> Oddly enough, ticket #6163 is supposed to fix this, but it's failing
> in my case. I am working with GDAL 2.1.0 under a VS2015
> (x32, Debug) build.
>
> Even, what do you think?
>
> Thanks!
> Javier C.
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
>
>
> --
> Andrew Bell
> andrew.bell.ia at gmail.com
--
Spatialys - Geospatial professional services
http://www.spatialys.com