[mapguide-internals] RE: Death by mutex

Bruce Dechant bruce.dechant at autodesk.com
Thu Jan 13 12:17:07 EST 2011


Nice find. Issues like this are always fun to find :)
Please keep us posted and I would be happy to help with any code review for this.


-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Trevor Wekel
Sent: Thursday, January 13, 2011 10:08 AM
To: MapGuide Internals Mail List
Subject: [mapguide-internals] Death by mutex

Hi everyone,

I have found a serious interaction issue in the MapGuide Server.  I did a quick check in Subversion and this issue is present in MGOS 2.1, MGOS 2.2, MGE 2010 and MGE 2011.

Basically, if you update any feature source while the MapGuide Server is under load serving pooled feature sources (ODBC, SQL Server, Oracle, etc.) then there is a risk of a deadlock occurring between the feature service cache and the Fdo connection manager.

What does this mean?  Creating or editing feature sources on a live site can take it down.  How did I find it?  Would you believe a 10,000-line script for The Grinder?

For those with code access, here's a pseudo stack:

Case 1: A thread has the connection manager lock and wants the feature service cache lock
MgFdoConnectionManager::Open(MgResourceIdentifier* resourceIdentifier)
  acquires MgFdoConnectionManager mutex
  if a pooled connection:
    MgFdoConnectionManager::FindFdoConnection(MgResourceIdentifier* resourceIdentifier)
          acquires MgFeatureServiceCache mutex

Case 2: A thread has the feature service cache lock and wants the connection manager lock
MgApplicationRepositoryManager::NotifyResourceChanged
  MgCacheManager::NotifyResourceChanged(MgResourceIdentifier* resource)
    if a feature source:
      acquires MgFeatureServiceCache mutex
        acquires MgFdoConnectionManager mutex

And this is the code we need to change.  If we acquire the connection manager mutex first, we should be able to avoid the deadlock. I have not done a complete code review yet so there may be other cases where this could occur.

void MgCacheManager::NotifyResourceChanged(CREFSTRING resource) {
    if (STRING::npos != resource.rfind(MgResourceType::FeatureSource))
        // The mutex usage and the method call order here are important
        // because they ensure all the caches are in sync.
        ACE_MT(ACE_GUARD(ACE_Recursive_Thread_Mutex, ace_mon, m_featureServiceCache.m_mutex));
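
For illustration only, here is a minimal sketch of the lock-ordering idea described above. It uses standard C++ mutexes rather than ACE, and every name in it (g_connectionManagerMutex, g_featureServiceCacheMutex, OpenPooledConnection, NotifyFeatureSourceChanged, RunBothPaths) is a hypothetical stand-in, not the actual MapGuide code. The assumption is that both code paths can be made to take the connection manager lock before the feature service cache lock; with one consistent order, neither thread can hold one lock while waiting on the other in reverse.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two MapGuide locks (not the real classes).
// Recursive mutexes are used because the original code guards an
// ACE_Recursive_Thread_Mutex.
std::recursive_mutex g_connectionManagerMutex;   // plays the MgFdoConnectionManager mutex
std::recursive_mutex g_featureServiceCacheMutex; // plays the MgFeatureServiceCache mutex
int g_operations = 0;                            // only modified while both locks are held

// Case 1 path: connection manager lock first, then feature service cache lock.
void OpenPooledConnection() {
    std::lock_guard<std::recursive_mutex> connectionGuard(g_connectionManagerMutex);
    std::lock_guard<std::recursive_mutex> cacheGuard(g_featureServiceCacheMutex);
    ++g_operations;
}

// Case 2 path, reordered to take the locks in the same order as Case 1.
void NotifyFeatureSourceChanged() {
    std::lock_guard<std::recursive_mutex> connectionGuard(g_connectionManagerMutex);
    std::lock_guard<std::recursive_mutex> cacheGuard(g_featureServiceCacheMutex);
    ++g_operations;
}

// Runs both paths concurrently and returns the number of completed operations.
int RunBothPaths(int iterations) {
    g_operations = 0;
    std::thread opener([iterations] {
        for (int i = 0; i < iterations; ++i) OpenPooledConnection();
    });
    std::thread notifier([iterations] {
        for (int i = 0; i < iterations; ++i) NotifyFeatureSourceChanged();
    });
    opener.join();
    notifier.join();
    return g_operations;
}
```

If NotifyFeatureSourceChanged took the cache lock first, as in Case 2 of the pseudo stack, the two threads in RunBothPaths could each end up holding one lock and waiting forever for the other. Whether the real caches stay in sync with the reordered locks would still need the code review mentioned above.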


