[gdal-dev] Multithread deadlock

Even Rouault even.rouault at spatialys.com
Mon Sep 26 08:53:07 PDT 2016


Hi,

I admire Andrew's enthusiasm and would happily let him tackle the next bug 
reported in this area ;-)

I could reproduce the deadlock with the same stack trace and have pushed a fix 
per https://trac.osgeo.org/gdal/ticket/6661. This was actually not a typical 
deadlock situation, but undefined behaviour caused by trying to acquire a 
recursive mutex that had previously been released more times than it had been 
acquired.
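
To illustrate that failure mode, here is a minimal sketch using 
std::recursive_mutex. It is not the fix from the ticket, and 
CountedRecursiveMutex is a hypothetical name, not a GDAL class. The wrapper 
remembers which thread owns the mutex and how deep the acquisition is, so an 
unbalanced release is rejected instead of reaching the underlying mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper, for illustration only (not GDAL code).
class CountedRecursiveMutex {
public:
    void acquire() {
        mutex_.lock();                              // blocks until we own it
        owner_.store(std::this_thread::get_id());
        ++depth_;                                   // only the owner touches depth_
    }

    // Returns false and leaves the mutex untouched when the calling thread
    // does not currently hold it. Calling unlock() in that situation would be
    // undefined behaviour for std::recursive_mutex.
    bool release() {
        if (owner_.load() != std::this_thread::get_id())
            return false;                           // unbalanced release
        assert(depth_ > 0);                         // we are the owner here
        if (--depth_ == 0)
            owner_.store(std::thread::id());        // nobody owns it any more
        mutex_.unlock();
        return true;
    }

    // Acquisition depth as seen by the calling thread (0 if it is not the owner).
    int depth() const {
        return owner_.load() == std::this_thread::get_id() ? depth_ : 0;
    }

private:
    std::recursive_mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int depth_ = 0;
};
```

With such a guard, a release() that is not matched by an earlier acquire() 
returns false, which makes the imbalance visible at the faulty call site 
rather than as a hang in a later acquire().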

Without this patch, a "workaround" would be to set the 
GDAL_ENABLE_READ_WRITE_MUTEX config option to NO to disable the per-dataset 
mutex. I had added this option since I wasn't really sure that the per-dataset 
mutex wouldn't introduce deadlock situations. But when setting it to NO, you'll 
get undefined behaviour (= potential crashes or corruption) due to 2 
threads potentially calling the IWriteBlock() method of the same dataset, which 
was the GDAL 1.X behaviour.
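
A sketch of setting that option from code without linking against GDAL, 
relying on the fact that GDAL config options can also be supplied as 
environment variables. The helper names are hypothetical. The assumption is 
that this runs before any dataset is opened, and that on Windows the 
application and the GDAL DLL share the same C runtime environment.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string>

// Hypothetical helper: sets a config option through the process environment.
// Returns true on success.
inline bool SetGdalOptionViaEnv(const char* key, const char* value) {
    assert(key != nullptr && value != nullptr);
#ifdef _WIN32
    return _putenv_s(key, value) == 0;
#else
    return setenv(key, value, /*overwrite=*/1) == 0;
#endif
}

// Hypothetical helper: reads the value back, or "" when the variable is unset.
inline std::string ReadEnvOption(const char* key) {
    const char* value = getenv(key);
    return value ? std::string(value) : std::string();
}

// Disables the per-dataset read/write mutex. After this, the caller is
// responsible for never writing to the same dataset from two threads.
inline bool DisablePerDatasetRWMutex() {
    return SetGdalOptionViaEnv("GDAL_ENABLE_READ_WRITE_MUTEX", "NO");
}
```

When the GDAL headers are available, calling 
CPLSetConfigOption("GDAL_ENABLE_READ_WRITE_MUTEX", "NO") from cpl_conv.h is 
the more direct route.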

Clearly, multi-threading scenarios involving writing are the point where the 
global block cache mechanism + the band-aid of the per-dataset R/W mutex are 
showing their limits in terms of design & maintenance complexity, and 
scalability. A per-dataset block cache would avoid such headaches (the 
drawback would be having to define a per-dataset block cache size).
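
As a rough illustration of that idea (a hypothetical design, none of these 
names exist in GDAL), a per-dataset cache could be a small LRU structure with 
its own byte budget. Eviction then only ever touches blocks of the dataset 
that owns the cache, so one thread never has to flush another dataset's 
blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-dataset LRU block cache, for illustration only.
class DatasetBlockCache {
public:
    using Key = std::pair<int, int>;             // (block column, block row)
    using Block = std::vector<unsigned char>;

    explicit DatasetBlockCache(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Inserts or replaces a block, then evicts the least recently used blocks
    // of this dataset until the cache fits its budget again. A real cache
    // would write dirty blocks to disk before dropping them.
    void put(const Key& key, Block data) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            usedBytes_ -= it->second->second.size();
            lru_.erase(it->second);
            index_.erase(it);
        }
        usedBytes_ += data.size();
        lru_.emplace_front(key, std::move(data));
        index_[key] = lru_.begin();
        // Always keep the block that was just inserted, even if it alone
        // exceeds the budget.
        while (usedBytes_ > maxBytes_ && lru_.size() > 1) {
            const auto& victim = lru_.back();
            assert(usedBytes_ >= victim.second.size());
            usedBytes_ -= victim.second.size();
            index_.erase(victim.first);
            lru_.pop_back();
        }
    }

    // Returns the cached block or nullptr; a hit becomes most recently used.
    const Block* get(const Key& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        lru_.splice(lru_.begin(), lru_, it->second);
        return &it->second->second;
    }

    std::size_t usedBytes() const { return usedBytes_; }

private:
    using Entry = std::pair<Key, Block>;
    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
    std::list<Entry> lru_;                        // front = most recently used
    std::map<Key, std::list<Entry>::iterator> index_;
};
```

The maxBytes constructor argument is exactly the drawback mentioned above: 
every dataset needs its own budget, instead of all datasets sharing one 
global limit.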

Even

> Sure Andrew,
> 
> Here it is the call stack from Visual Studio for both threads (I just
> copied the top calls where GDAL is involved, just for easy reading. If you
> need the whole stack just let me know):
> 
> THREAD 1:
> 
>     ntdll.dll!_NtWaitForSingleObject@12()    Unknown
>     ntdll.dll!_RtlpWaitOnCriticalSection@8()    Unknown
>     ntdll.dll!_RtlEnterCriticalSection@4()    Unknown
>     gdal201.dll!CPLAcquireMutex(_CPLMutex * hMutexIn, double dfWaitInSeconds) Line 806    C++
>     gdal201.dll!GDALDataset::EnterReadWrite(GDALRWFlag eRWFlag) Line 6102    C++
>     gdal201.dll!GDALRasterBand::EnterReadWrite(GDALRWFlag eRWFlag) Line 5290    C++
>     gdal201.dll!GDALRasterBlock::Write() Line 742    C++
>     gdal201.dll!GDALRasterBlock::Internalize() Line 917    C++
>     gdal201.dll!GDALRasterBand::GetLockedBlockRef(int nXBlockOff, int nYBlockOff, int bJustInitialize) Line 1126    C++
> >   Test.exe!RasterBandPixelAccess::SetValueAtPixel<short>(const int & pX, const int & pY, const short & value) Line 180    C++
> 
> THREAD 2:
> 
>     ntdll.dll!_NtWaitForSingleObject@12()    Unknown
>     KernelBase.dll!_WaitForSingleObjectEx@12()    Unknown
>     kernel32.dll!_WaitForSingleObjectExImplementation@12()    Unknown
>     kernel32.dll!_WaitForSingleObject@8()    Unknown
>     gdal201.dll!CPLCondWait(_CPLCond * hCond, _CPLMutex * hClientMutex) Line 937    C++
>     gdal201.dll!GDALAbstractBandBlockCache::WaitKeepAliveCounter() Line 134    C++
>     gdal201.dll!GDALArrayBandBlockCache::FlushCache() Line 312    C++
>     gdal201.dll!GDALRasterBand::FlushCache() Line 865    C++
>     gdal201.dll!GDALDataset::FlushCache() Line 386    C++
>     gdal201.dll!GDALPamDataset::FlushCache() Line 159    C++
>     gdal201.dll!GTiffDataset::Finalize() Line 6180    C++
>     gdal201.dll!GTiffDataset::~GTiffDataset() Line 6135    C++
>     gdal201.dll!GTiffDataset::`scalar deleting destructor'(unsigned int)    C++
>     gdal201.dll!GDALClose(void * hDS) Line 2998    C++
> >   Test.exe!main::__l2::<lambda>(std::basic_string<char,std::char_traits<char>,std::allocator<char> > sourcefilePath, std::basic_string<char,std::char_traits<char>,std::allocator<char> > targetFilePath, int threadID) Line 66    C++
> 
> From: Andrew Bell [mailto:andrew.bell.ia at gmail.com]
> Sent: 26 September, 2016 16:06
> To: Francisco Javier Calzado <francisco.javier.calzado at ericsson.com>
> Cc: gdal-dev at lists.osgeo.org
> Subject: Re: [gdal-dev] Multithread deadlock
> 
> Deadlocks are usually easy to debug if you can get a traceback when
> deadlocked.  If you can attach with gdb (or run in the debugger) and
> reproduce and post the stack at the time ('where' from gdb), it should be
> no problem to fix.  Trying to reproduce on different hardware can be
> difficult.
> 
> On Mon, Sep 26, 2016 at 9:33 AM, Francisco Javier Calzado
> <francisco.javier.calzado at ericsson.com> wrote:
> 
> Hi guys,
> 
> I am experiencing a deadlock with just 2 threads in a single reader &
> multiple writer scenario. That is, threads read from the same input file
> (using different handles) and then write different output files. The deadlock
> occurs when the block cache gets filled. The situation is the following:
> 
> 
> - T1 and T2 read datasets D1 and D2, both pointing to the same
>   input raster (GTiff).
> 
> - Block cache gets filled.
> 
> - T1 tries to lock one block in the cache to write data. But the cache
>   is full, so it tries to free dirty blocks from T2 (as seen in the
>   Internalize() method). For that purpose, it apparently requires a mutex
>   from D2.
> 
> - However, T2 is in a state where it must wait for thread T1 to finish
>   working with T2’s blocks. In this state, T2 holds a mutex acquired from D2.
> 
> At least, that is what seems to be happening based on the source code. Maybe
> I’m wrong, as I don’t have a full picture of how GDAL works
> internally. The thing is that I can reproduce this issue with the
> following test code and dataset:
> https://drive.google.com/file/d/0B-OCl1FjBi0YSkU3RUozZjc5SnM/view?usp=sharing
> 
> Oddly enough, ticket #6163 is supposed to fix this, but the fix is not
> working in my case. I am working with GDAL 2.1.0 built with VS2015
> (x32, Debug).
> 
> Even, what do you think?
> 
> Thanks!
> Javier C.
> 
> 
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
> 
> 
> 
> --
> Andrew Bell
> andrew.bell.ia at gmail.com

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

