[gdal-dev] Multithread deadlock

Francisco Javier Calzado francisco.javier.calzado at ericsson.com
Mon Sep 26 07:38:11 PDT 2016


Sure Andrew,

Here is the call stack from Visual Studio for both threads (I have copied only the top frames where GDAL is involved, for easier reading; if you need the whole stack, just let me know):

THREAD 1:

    ntdll.dll!_NtWaitForSingleObject@12()  Unknown
    ntdll.dll!_RtlpWaitOnCriticalSection@8()  Unknown
    ntdll.dll!_RtlEnterCriticalSection@4()  Unknown
    gdal201.dll!CPLAcquireMutex(_CPLMutex * hMutexIn, double dfWaitInSeconds) Line 806  C++
    gdal201.dll!GDALDataset::EnterReadWrite(GDALRWFlag eRWFlag) Line 6102  C++
    gdal201.dll!GDALRasterBand::EnterReadWrite(GDALRWFlag eRWFlag) Line 5290  C++
    gdal201.dll!GDALRasterBlock::Write() Line 742  C++
    gdal201.dll!GDALRasterBlock::Internalize() Line 917  C++
    gdal201.dll!GDALRasterBand::GetLockedBlockRef(int nXBlockOff, int nYBlockOff, int bJustInitialize) Line 1126  C++
>   Test.exe!RasterBandPixelAccess::SetValueAtPixel<short>(const int & pX, const int & pY, const short & value) Line 180  C++


THREAD 2:

    ntdll.dll!_NtWaitForSingleObject@12()  Unknown
    KernelBase.dll!_WaitForSingleObjectEx@12()  Unknown
    kernel32.dll!_WaitForSingleObjectExImplementation@12()  Unknown
    kernel32.dll!_WaitForSingleObject@8()  Unknown
    gdal201.dll!CPLCondWait(_CPLCond * hCond, _CPLMutex * hClientMutex) Line 937  C++
    gdal201.dll!GDALAbstractBandBlockCache::WaitKeepAliveCounter() Line 134  C++
    gdal201.dll!GDALArrayBandBlockCache::FlushCache() Line 312  C++
    gdal201.dll!GDALRasterBand::FlushCache() Line 865  C++
    gdal201.dll!GDALDataset::FlushCache() Line 386  C++
    gdal201.dll!GDALPamDataset::FlushCache() Line 159  C++
    gdal201.dll!GTiffDataset::Finalize() Line 6180  C++
    gdal201.dll!GTiffDataset::~GTiffDataset() Line 6135  C++
    gdal201.dll!GTiffDataset::`scalar deleting destructor'(unsigned int)  C++
    gdal201.dll!GDALClose(void * hDS) Line 2998  C++
>   Test.exe!main::__l2::<lambda>(std::basic_string<char,std::char_traits<char>,std::allocator<char> > sourcefilePath, std::basic_string<char,std::char_traits<char>,std::allocator<char> > targetFilePath, int threadID) Line 66  C++
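For reference: judging from the frame names, thread 2 is parked in WaitKeepAliveCounter() on top of CPLCondWait(), i.e. it appears to be waiting for a count of blocks still in use by other threads to drop to zero. A minimal sketch of that wait pattern in standard C++, assuming a plain counter guarded by a mutex and a condition variable (KeepAliveCounter and RunFlushDemo are hypothetical names; this is not GDAL's implementation):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified keep-alive counter; NOT GDAL's implementation.
class KeepAliveCounter {
public:
    // A thread announces that it is still using a block.
    void Acquire() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
    }
    // The thread is done with the block; wake up anyone waiting for zero.
    void Release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(count_ > 0 && "Release() without matching Acquire()");
            --count_;
        }
        cv_.notify_all();
    }
    // Blocks until no thread holds a reference (cf. WaitKeepAliveCounter()).
    // cv_.wait() releases m_ while sleeping and re-acquires it on wake-up.
    void WaitUntilZero() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }
    int Count() {
        std::lock_guard<std::mutex> lock(m_);
        return count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Demo: a worker keeps a reference for a short while; the "flushing"
// thread waits for it and then returns the final count.
int RunFlushDemo() {
    KeepAliveCounter counter;
    counter.Acquire();  // worker starts using a block
    std::thread worker([&counter] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        counter.Release();  // worker finishes with the block
    });
    counter.WaitUntilZero();  // returns only once the worker has released
    worker.join();
    return counter.Count();
}
```

RunFlushDemo() returns 0: the wait cannot finish earlier because the predicate is re-checked under the mutex. The pattern only hangs if the thread holding the reference never gets to call Release().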


From: Andrew Bell [mailto:andrew.bell.ia at gmail.com]
Sent: 26 September, 2016 16:06
To: Francisco Javier Calzado <francisco.javier.calzado at ericsson.com>
Cc: gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Multithread deadlock

Deadlocks are usually easy to debug if you can get a traceback while the program is deadlocked. If you can attach with gdb (or run in the debugger), reproduce the problem, and post the stack at that point ('where' from gdb), it should be no problem to fix. Trying to reproduce on different hardware can be difficult.

On Mon, Sep 26, 2016 at 9:33 AM, Francisco Javier Calzado <francisco.javier.calzado at ericsson.com> wrote:
Hi guys,

I am experiencing a deadlock with just 2 threads in a single-reader, multiple-writer scenario. That is, the threads read from the same input file (using different handles) and then write to different output files. The deadlock occurs when the block cache fills up. The situation is the following:


-          T1 and T2 read datasets D1 and D2, both pointing to the same input raster (GTiff).

-          Block cache gets filled.

-          T1 tries to lock a block in the cache to write data. Because the cache is full, it tries to flush dirty blocks belonging to T2 (as seen in the Internalize() method). To do so, it apparently needs to acquire a mutex from D2.

-          However, T2 is in a state where it must wait for T1 to finish working with T2’s blocks, and in this state T2 holds a mutex acquired from D2.
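The lock cycle in the list above can be modelled with two standard C++ threads. This is a sketch under stated assumptions, not GDAL code: a std::timed_mutex stands in for D2's mutex, an integer plus a condition variable stands in for the keep-alive counter, and SimulateCycle is a hypothetical name. T1 uses try_lock_for() with a timeout instead of blocking, so the model reports the deadlock rather than hanging:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical two-thread model of the T1/T2 cycle; names are illustrative
// and do NOT correspond to GDAL internals. Returns true if T1 could not get
// the dataset mutex within the timeout, i.e. the threads waited on each other.
bool SimulateCycle(bool closerHoldsDatasetMutex) {
    std::timed_mutex datasetMutex;   // stands in for D2's mutex
    std::mutex counterMutex;
    std::condition_variable counterCv;
    int keepAlive = 1;               // T1 is already using one of T2's blocks
    std::promise<void> closerReady;
    std::future<void> ready = closerReady.get_future();

    // T2: closes its dataset and waits until nobody uses its blocks.
    std::thread t2([&] {
        std::unique_lock<std::timed_mutex> ds(datasetMutex, std::defer_lock);
        if (closerHoldsDatasetMutex) ds.lock();   // T2 holds D2's mutex...
        closerReady.set_value();
        std::unique_lock<std::mutex> lk(counterMutex);
        counterCv.wait(lk, [&] { return keepAlive == 0; });  // ...while waiting
    });  // ds (if owned) is unlocked only when T2 stops waiting

    // T1: needs D2's mutex to write the dirty block back.
    ready.wait();
    const bool timedOut =
        !datasetMutex.try_lock_for(std::chrono::milliseconds(200));
    if (!timedOut) datasetMutex.unlock();

    // Break the cycle so the model terminates: T1 gives up its block.
    {
        std::lock_guard<std::mutex> lk(counterMutex);
        keepAlive = 0;
    }
    counterCv.notify_all();
    t2.join();
    return timedOut;
}
```

With closerHoldsDatasetMutex set to true, T1 times out because T2 keeps the mutex for as long as T1 keeps its block, so SimulateCycle returns true. With false, T2 waits without holding the mutex, T1 gets it immediately, and the function returns false.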

At least, that is what seems to be happening based on the source code. I may be wrong, as I don’t have a full picture of how GDAL works internally. In any case, I can reproduce this issue with the following test code and dataset:
https://drive.google.com/file/d/0B-OCl1FjBi0YSkU3RUozZjc5SnM/view?usp=sharing

Oddly enough, ticket #6163 is supposed to fix this, but it fails in my case. I am using GDAL 2.1.0 compiled with VS2015 (x32, Debug).

Even, what do you think?

Thanks!
Javier C.


_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev



--
Andrew Bell
andrew.bell.ia at gmail.com