[gdal-dev] TSAN lock-order-inversion on moving GDAL init to main()

Simon Eves simon.eves at heavy.ai
Mon Apr 7 21:22:21 PDT 2025


I just refactored our GDAL init code, which required adding calls to the
init function in the *main()* of several of our unit test executables. Now
they all fail our TSAN tests with the following:

18:46:51 29: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=111437)
18:46:51 29:   Cycle in lock order graph: M889273480187642592 (0x000000000000) => M889596235056216640 (0x000000000000) => M889273480187642592
18:46:51 29:
18:46:51 29:   Mutex M889596235056216640 acquired here while holding mutex M889273480187642592 in main thread:
18:46:51 29:     #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4240 (libtsan.so.0+0x53908)
18:46:51 29:     #1 CPLAcquireMutex <null> (RasterTableTests+0x61da93a)
18:46:51 29:     #2 main /home/jenkins-slave/workspace/core-tsan-gcc/Tests/ee/RasterTableTests.cpp:2292 (RasterTableTests+0x34705ba)
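For readers less familiar with this TSAN warning: it fires once the runtime has seen one code path hold mutex M1 while acquiring M2, and another path hold M2 while acquiring M1, closing the cycle M1 => M2 => M1 in its lock-order graph. The paths do not have to run concurrently for the report to appear. The sketch below is a minimal, standalone illustration of that pattern; the mutex and function names are hypothetical and are not GDAL's actual internals.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for two library-internal locks; these names are
// illustrative only and are not GDAL's real mutexes.
std::mutex registry_mutex;       // plays the role of "M1" in a TSAN report
std::mutex error_handler_mutex;  // plays the role of "M2" in a TSAN report

std::vector<std::string> event_log;

// Path 1: holds M1 while acquiring M2, recording the edge M1 => M2.
void register_drivers() {
  std::lock_guard<std::mutex> hold_registry(registry_mutex);
  std::lock_guard<std::mutex> hold_handler(error_handler_mutex);
  event_log.push_back("register_drivers");
}

// Path 2 (inverted): holds M2 while acquiring M1, recording M2 => M1.
// After both paths have run, the lock-order graph contains M1 => M2 => M1,
// which TSAN typically reports as a lock-order-inversion even though the
// two calls never overlap in this single-threaded example.
void set_error_handler_inverted() {
  std::lock_guard<std::mutex> hold_handler(error_handler_mutex);
  std::lock_guard<std::mutex> hold_registry(registry_mutex);
  event_log.push_back("set_error_handler_inverted");
}

// Path 2 (consistent): same work, but takes the locks in the same order as
// Path 1, so no opposite edge and therefore no cycle is ever recorded.
void set_error_handler_ordered() {
  std::lock_guard<std::mutex> hold_registry(registry_mutex);
  std::lock_guard<std::mutex> hold_handler(error_handler_mutex);
  event_log.push_back("set_error_handler_ordered");
}
```

Built with `-fsanitize=thread`, calling `register_drivers()` followed by `set_error_handler_inverted()` is the shape of report shown above, while pairing it with `set_error_handler_ordered()` is not. A library-side fix for this kind of warning usually amounts to making every path take its locks in one consistent order.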

The relevant parts of the init() function are:

GDALAllRegister();
OGRRegisterAll();
CPLSetErrorHandler(*gdal_error_handler);


I found https://github.com/OSGeo/gdal/issues/1108 and related issues, which
seem to indicate that something was fixed, perhaps in 3.9.

We are in the process of updating from 3.7.3 to 3.10.2 and the problem
occurs in CI builds on machines which still have 3.7.3. It does not occur
with the new TSAN build container which has 3.10.2.

Can I assume that the reworking of mutex handling in the above issue is
indeed why this no longer occurs with 3.10.2? If so, we can suppress the
failure until we are past the GDAL update.
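One way such a temporary suppression could look, sketched under the assumption that the test executables are built with GCC or Clang's `-fsanitize=thread`: the TSAN runtime calls a weak hook, `__tsan_default_suppressions()`, at startup and parses the returned text in the same format as a suppressions file. The `deadlock:` type targets lock-order-inversion reports. The pattern `CPLAcquireMutex` is chosen here only because it appears in the report above. It is a broad choice that would hide every inversion passing through that frame, so a narrower pattern may be preferable.

```cpp
// Hedged sketch: link this into the test executables only while CI still
// uses GDAL 3.7.3. Under -fsanitize=thread, the runtime calls this hook at
// startup and parses the returned text like a suppressions file; in a
// non-TSAN build it is simply an unused function.
extern "C" const char* __tsan_default_suppressions() {
  // "deadlock:" entries suppress lock-order-inversion reports whose stacks
  // contain a frame matching the pattern. Matching on CPLAcquireMutex is
  // deliberately broad: it hides every inversion reported through GDAL's
  // mutex wrapper, not just this one.
  return "deadlock:CPLAcquireMutex\n";
}
```

The same `deadlock:CPLAcquireMutex` line could instead live in a file passed via `TSAN_OPTIONS=suppressions=<path>`. That keeps the suppression out of the source and makes it easier to drop once the GDAL 3.10.2 container is used everywhere.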

-- 
Simon Eves
Senior Rendering Engineer
+1 (415) 902-1996
simon.eves at heavy.ai

