[gdal-dev] Multithread deadlock
Francisco Javier Calzado
francisco.javier.calzado at ericsson.com
Tue Sep 27 00:50:19 PDT 2016
Hi Even,
Thanks for such a quick fix! I'm gonna apply the patch and recompile GDAL and will let you know :)
Keep in touch.
Best Regards,
Javier Calzado
-----Original Message-----
From: Even Rouault [mailto:even.rouault at spatialys.com]
Sent: 26 September, 2016 17:53
To: gdal-dev at lists.osgeo.org
Cc: Francisco Javier Calzado <francisco.javier.calzado at ericsson.com>; Andrew Bell <andrew.bell.ia at gmail.com>
Subject: Re: [gdal-dev] Multithread deadlock
Hi,
I admire Andrew's enthusiasm and would happily let him tackle the next bug reported in this area ;-)
I could reproduce the deadlock with the same stack trace and have pushed a fix per https://trac.osgeo.org/gdal/ticket/6661. This was actually not a typical deadlock situation, but undefined behaviour caused by trying to acquire a recursive mutex that had previously been released more times than it had been acquired.
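To illustrate that failure mode, here is a minimal sketch. It is not the code from the ticket and not GDAL's mutex implementation. The class name CheckedRecursiveMutex and its methods are hypothetical. It wraps std::recursive_mutex, tracks the owner and the nesting depth, and refuses an unmatched release instead of passing it on to the underlying mutex, where it would be undefined behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of GDAL): a recursive mutex that refuses
// a release() call that is not matched by an earlier acquire() on the
// same thread.
class CheckedRecursiveMutex {
public:
    void acquire() {
        mutex_.lock();
        // Only the thread holding the lock reaches this point.
        owner_.store(std::this_thread::get_id());
        ++depth_;
    }

    // Returns true if the lock was released once, false if the calling
    // thread does not currently hold it (the call is then ignored).
    bool release() {
        if (owner_.load() != std::this_thread::get_id()) {
            return false;  // unbalanced release: never touch the mutex
        }
        // We are the owner, so depth_ >= 1 and is safe to modify here.
        if (--depth_ == 0) {
            owner_.store(std::thread::id{});  // nobody owns the lock now
        }
        mutex_.unlock();
        return true;
    }

private:
    std::recursive_mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    int depth_ = 0;  // nesting level, modified only by the owner
};
```

With this wrapper, acquiring twice and releasing three times makes the third release() return false rather than corrupting the lock state, so a later acquire() still behaves normally.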
Without this patch, a "workaround" would be to set the GDAL_ENABLE_READ_WRITE_MUTEX config option to NO to disable the per-dataset mutex. I had added this option since I wasn't really sure that the per-dataset mutex wouldn't introduce deadlock situations. But when setting it to NO, you'll get undefined behaviour (= potentially crashing or causing corruption) due to 2 threads potentially calling the IWriteBlock() method of the same dataset, which was the GDAL 1.X behaviour.
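As a rough sketch of how the option could be set from a test program: GDAL config options can also be supplied as environment variables, so the snippet below sets one without linking GDAL. The function name disableReadWriteMutexViaEnv is hypothetical. In a program that links GDAL, the usual route would be CPLSetConfigOption("GDAL_ENABLE_READ_WRITE_MUTEX", "NO") from cpl_conv.h. Setting it before any dataset is opened is an assumption made here to be safe.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdlib.h>  // setenv (POSIX) / _putenv_s (Windows CRT)
#include <string>

// Hypothetical helper: expose GDAL_ENABLE_READ_WRITE_MUTEX=NO through the
// process environment. Returns true on success.
// Caveat from the message above: with the mutex disabled, two threads may
// call IWriteBlock() on the same dataset, which is undefined behaviour.
inline bool disableReadWriteMutexViaEnv() {
#ifdef _WIN32
    // Affects the C runtime of this module; GDAL must share that runtime
    // to see the value (an assumption worth checking in a real build).
    return _putenv_s("GDAL_ENABLE_READ_WRITE_MUTEX", "NO") == 0;
#else
    return setenv("GDAL_ENABLE_READ_WRITE_MUTEX", "NO", /*overwrite=*/1) == 0;
#endif
}
```

Call it once at program start, before spawning the worker threads.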
Clearly, multi-threading scenarios involving writing are the point where the global block cache mechanism + the band-aid of the per-dataset R/W mutex are showing their limits in terms of design & maintenance complexity and scalability. A per-dataset block cache would avoid such headaches (the drawback would be having to define a per-dataset block cache size).
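To make the per-dataset idea concrete, here is a minimal sketch of what such a cache might look like. It is an illustration only, not GDAL code and not a proposed design. The class PerDatasetBlockCache, its BlockKey type and the maxBlocks parameter are all hypothetical. Each dataset would own one instance with its own lock and its own size limit, so evicting a block never needs a lock belonging to another dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch (not GDAL code): an LRU block cache owned by a
// single dataset, guarded by its own mutex.
class PerDatasetBlockCache {
public:
    using BlockKey = long long;                // e.g. band/x/y packed by the caller
    using Block = std::vector<unsigned char>;  // raw block bytes

    // The per-dataset size limit is the drawback mentioned above: it has
    // to be chosen for every dataset. A limit of 0 is treated as 1.
    explicit PerDatasetBlockCache(std::size_t maxBlocks)
        : maxBlocks_(maxBlocks == 0 ? 1 : maxBlocks) {}

    // Inserts or replaces a block and marks it most recently used.
    // Returns the number of blocks evicted to make room (0 or 1).
    std::size_t put(BlockKey key, Block data) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(data);
            lru_.splice(lru_.begin(), lru_, it->second);  // move to front
            return 0;
        }
        std::size_t evicted = 0;
        if (lru_.size() >= maxBlocks_) {
            // A real cache would write a dirty block to disk at this point.
            index_.erase(lru_.back().first);
            lru_.pop_back();
            evicted = 1;
        }
        lru_.emplace_front(key, std::move(data));
        index_[key] = lru_.begin();
        return evicted;
    }

    // Reports whether a block is cached (does not change its LRU position).
    bool contains(BlockKey key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return index_.count(key) != 0;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lru_.size();
    }

private:
    using Entry = std::pair<BlockKey, Block>;
    std::size_t maxBlocks_;
    mutable std::mutex mutex_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<BlockKey, std::list<Entry>::iterator> index_;
};
```

Filling the cache of one dataset only ever evicts that dataset's own blocks. That is the property the global cache lacks in the scenario reported below.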
Even
> Sure Andrew,
>
> Here is the call stack from Visual Studio for both threads (I just
> copied the top calls where GDAL is involved, for easier reading. If
> you need the whole stack, just let me know):
>
> THREAD 1:
>
> ntdll.dll!_NtWaitForSingleObject@12() Unknown
> ntdll.dll!_RtlpWaitOnCriticalSection@8() Unknown
> ntdll.dll!_RtlEnterCriticalSection@4() Unknown
> gdal201.dll!CPLAcquireMutex(_CPLMutex * hMutexIn, double dfWaitInSeconds) Line 806 C++
> gdal201.dll!GDALDataset::EnterReadWrite(GDALRWFlag eRWFlag) Line 6102 C++
> gdal201.dll!GDALRasterBand::EnterReadWrite(GDALRWFlag eRWFlag) Line 5290 C++
> gdal201.dll!GDALRasterBlock::Write() Line 742 C++
> gdal201.dll!GDALRasterBlock::Internalize() Line 917 C++
> gdal201.dll!GDALRasterBand::GetLockedBlockRef(int nXBlockOff, int nYBlockOff, int bJustInitialize) Line 1126 C++
>
> > Test.exe!RasterBandPixelAccess::SetValueAtPixel<short>(const int & pX, const int & pY, const short & value) Line 180 C++
>
> THREAD 2:
>
> ntdll.dll!_NtWaitForSingleObject@12() Unknown
> KernelBase.dll!_WaitForSingleObjectEx@12() Unknown
> kernel32.dll!_WaitForSingleObjectExImplementation@12() Unknown
> kernel32.dll!_WaitForSingleObject@8() Unknown
> gdal201.dll!CPLCondWait(_CPLCond * hCond, _CPLMutex * hClientMutex) Line 937 C++
> gdal201.dll!GDALAbstractBandBlockCache::WaitKeepAliveCounter() Line 134 C++
> gdal201.dll!GDALArrayBandBlockCache::FlushCache() Line 312 C++
> gdal201.dll!GDALRasterBand::FlushCache() Line 865 C++
> gdal201.dll!GDALDataset::FlushCache() Line 386 C++
> gdal201.dll!GDALPamDataset::FlushCache() Line 159 C++
> gdal201.dll!GTiffDataset::Finalize() Line 6180 C++
> gdal201.dll!GTiffDataset::~GTiffDataset() Line 6135 C++
> gdal201.dll!GTiffDataset::`scalar deleting destructor'(unsigned int) C++
> gdal201.dll!GDALClose(void * hDS) Line 2998 C++
>
> > Test.exe!main::__l2::<lambda>(std::basic_string<char,std::char_traits<char>,std::allocator<char> > sourcefilePath, std::basic_string<char,std::char_traits<char>,std::allocator<char> > targetFilePath, int threadID) Line 66 C++
>
> From: Andrew Bell [mailto:andrew.bell.ia at gmail.com]
> Sent: 26 September, 2016 16:06
> To: Francisco Javier Calzado <francisco.javier.calzado at ericsson.com>
> Cc: gdal-dev at lists.osgeo.org
> Subject: Re: [gdal-dev] Multithread deadlock
>
> Deadlocks are usually easy to debug if you can get a traceback when
> deadlocked. If you can attach with gdb (or run in the debugger) and
> reproduce and post the stack at the time ('where' from gdb), it should
> be no problem to fix. Trying to reproduce on different hardware can
> be difficult.
>
> On Mon, Sep 26, 2016 at 9:33 AM, Francisco Javier Calzado
> <francisco.javier.calzado at ericsson.com> wrote:
>
> Hi guys,
>
> I am experiencing a deadlock with just 2 threads in a single reader &
> multiple writer scenario. That is, threads read from the same input
> file (using different handles) and then write different output files.
> The deadlock occurs when the block cache gets filled. The situation is the following:
>
>
> - T1 and T2 read datasets D1 and D2, both pointing to the same
> input raster (GTiff).
>
> - Block cache gets filled.
>
> - T1 tries to lock one block in the cache to write data. But the cache
> is full, so it tries to free dirty blocks from T2 (as seen in the
> Internalize() method). For that purpose, it apparently requires a
> mutex from D2.
>
> - However, T2 is in a state where it must wait for thread T1 to finish
> working with T2’s blocks. In this state, T2 holds a mutex acquired from D2.
>
> At least, that is what seems to be happening based on the source code.
> Maybe I’m wrong; I don’t have a full picture of how GDAL works
> internally. The thing is that I can reproduce this issue with the
> following test code and dataset:
> https://drive.google.com/file/d/0B-OCl1FjBi0YSkU3RUozZjc5SnM/view?usp=sharing
>
> Oddly enough, ticket #6163 is supposed to fix this, but
> it's failing in my case. I am working with GDAL 2.1.0 under
> VS2015 (x32, Debug) compilation.
>
> Even, what do you think?
>
> Thanks!
> Javier C.
>
> --
> Andrew Bell
> andrew.bell.ia at gmail.com
--
Spatialys - Geospatial professional services http://www.spatialys.com