[mapguide-internals] RE: std::string not thread safe on Linux

Trevor Wekel trevor_wekel at otxsystems.com
Wed Aug 4 18:45:12 EDT 2010


Hi Bob,

Yes.  I absolutely agree.  They are both risky.  ICU will definitely cause more churn because it is a full internationalization library.  It will add more DLLs and .so files to the installation image.  ICU is not realistic for 2.2.  On the plus side, ICU uses a 16-bit Unicode representation on both Windows and Linux.  Xerces can also be compiled with ICU, which could be significant for DB XML multi-language support.  The following is from the Berkeley DB XML FAQ at http://www.oracle.com/technology/products/berkeley-db/faq/xml_faq.html:

>Can DB XML parse my unusually encoded XML document?
>
>DB XML uses the Xerces-C library for a lot of its XML parsing. Out of the box, Xerces-C has the ability to parse XML document in a number of well known encodings, including (but not limited to) UTF-8, UTF-16 and ISO-8859-1. However, if you have documents that use an unsupported encoding (Big-5 for instance) there is still a solution. You can compile the Xerces-C library with ICU support, which allows BDB XML to transcode and parse over 500 different character encodings. Using the following options to the buildall.sh script that comes with BDB XML is one way to do this:
>
>./buildall.sh --with-xerces-conf="-t icu"
>
>Using the ICU library also fixes a bug in versions of DB XML up to 2.2.13, where the fn:upper-case() and fn:lower-case() functions did not handle unicode characters correctly.
>

So I assume this means that, if we went with ICU, we could store documents in multiple character encodings in DB XML.  Alternatively, we may be able to convert documents to UTF-8 or UTF-16 as we write them to the repository.
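If documents were converted as they are written to the repository, the core of a UTF-8 to UTF-16 conversion might look roughly like this.  It is a hand-written sketch: Utf8ToUtf16 is a hypothetical helper, not part of ICU, Xerces or DB XML, and it only accepts UTF-8 input, whereas ICU would also cover the many other encodings the FAQ mentions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16 code
// units.  Throws std::runtime_error on malformed input instead of substituting
// replacement characters.
std::u16string Utf8ToUtf16(const std::string& in)
{
    // Smallest code point that may legally use a sequence of each length.
    static const char32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong encodings, surrogate code points and values above U+10FFFF.
        if (cp < minForLen[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));      // fits in one code unit
        }
        else
        {
            cp -= 0x10000;                                 // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```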


Regards,
Trevor


-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Robert Bray
Sent: August 4, 2010 4:32 PM
To: 'mapguide-internals at lists.osgeo.org'
Subject: Re: [mapguide-internals] RE: std::string not thread safe on Linux

Hmm,

Options 2 and 3 seem almost identical, but with option 3 you don't need the #ifdefs. Given the amount of code change, both seem equally risky unless I am missing something.

Bob 

----- Original Message -----
From: mapguide-internals-bounces at lists.osgeo.org <mapguide-internals-bounces at lists.osgeo.org>
To: MapGuide Internals Mail List <mapguide-internals at lists.osgeo.org>
Sent: Wed Aug 04 15:22:42 2010
Subject: [mapguide-internals] RE: std::string not thread safe on Linux

Trevor,

Nice find!

I would go with option 2.  Is option 3 really needed if option 2 works?
I'm just thinking of the extra work needed to implement ICU - maybe the work is trivial.

Thanks,
Bruce

-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Trevor Wekel
Sent: Wednesday, August 04, 2010 3:41 PM
To: MapGuide Internals Mail List
Subject: [mapguide-internals] std::string not thread safe on Linux

Hi everyone,

After some in-depth digging, I have determined that there is a fundamental difference between the implementation of std::string on Linux/GCC and std::string on Windows/Visual Studio 2008.  std::string on Linux uses an internally reference-counted data structure to perform shallow copies of the string data during copy construction and assignment (operator=()).  On Windows, copies are deep, so no data is shared between std::string instances.
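One way to see which behaviour a particular build has is to compare buffer addresses before and after a copy.  The probe below is a sketch; SharesBuffer and CopySharesBuffer are hypothetical helpers (not MapGuide code), and buffer sharing is an implementation detail that the C++ standard does not expose directly.

```cpp
#include <string>

// Returns true if two strings currently point at the same character buffer.
// Both parameters are const references, so data() resolves to the const
// overload and cannot itself force a copy-on-write implementation to unshare.
bool SharesBuffer(const std::string& a, const std::string& b)
{
    return a.data() == b.data();
}

// Copies 'source' and reports whether the copy initially shares its buffer.
// Expected to be true for a non-empty string on the reference-counted
// GCC std::string described above, and false on a deep-copying implementation
// such as the one in Visual Studio 2008.
bool CopySharesBuffer(const std::string& source)
{
    std::string copy = source;
    return SharesBuffer(source, copy);
}
```

On either kind of implementation, writing through the copy (for example copy[0] = 'g') must leave the original untouched; a reference-counted string achieves that by giving the copy its own buffer at the moment of the write.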

A GCC bug, "Lack of Posix compliant thread safety in std::basic_string" (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334), was logged years ago and suspended due to its performance implications.  In other words, std::string on Linux/GCC is not thread safe when string data is shared between threads.

What does this mean for MapGuide?  Primarily, this issue comes into play with logging (access, trace, error, etc.).  Log writing is performed on a separate thread, and we use std::string to pass information to that thread.  On Linux/GCC, the reference-counted structure can be modified by both the logging thread and the worker thread simultaneously, causing unexpected behaviour.  In practice this shows up as glibc "double free" errors and, in some cases, a crash of the MapGuide Server process.
 
This is easily reproducible on servers processing very high operation rates with logging turned on.  For example, I can reproduce the "double free" in under five minutes when serving GETTILEIMAGE requests to 40+ simultaneous users on an 8 core box.  

How do we fix this?  We have a few options:

1.  Identify all areas where strings can propagate from one thread to another and recode them to avoid the propagation of std::string between threads.

2.  Replace std::string on Linux with something else that is thread safe.  On Linux/GCC there is another "string" implementation called versa_string defined in ext/vstring.h.  As far as I know, this implementation performs a deep copy like VS 2008 and should be safer.

3. Drop std::string altogether and use ICU on both Windows and Linux http://site.icu-project.org/.  The documentation seems to suggest that UnicodeString from ICU can be copied in a thread safe manner.  UnicodeString also uses an internally reference-counted object, so it should avoid the cost of deep copies; as a side effect, this may boost performance on Windows, where std::string copies are deep.
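For option 1, one possible pattern is to make the hand-off point itself force a deep copy, so that the string crossing threads never shares a reference-counted buffer with strings the worker is still using.  The LogQueue class below is hypothetical (it is not MapGuide's logging code) and uses C++11 std::thread and std::mutex for brevity; an equivalent in MapGuide would presumably use whatever threading primitives the server already relies on.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical log queue, not MapGuide's actual logging classes.
class LogQueue
{
public:
    // Called on a worker thread.  For non-empty input, building a new string
    // from the raw characters allocates a fresh buffer, so the queued string
    // does not share a reference-counted buffer with the worker's strings.
    void Post(const std::string& message)
    {
        std::string privateCopy(message.data(), message.size());
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending.push(std::move(privateCopy));
        }
        m_ready.notify_one();
    }

    // Called on the logging thread.  Blocks until a message is available and
    // returns false only after Close() has been called and the queue is empty.
    bool Take(std::string& message)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_pending.empty() || m_closed; });
        if (m_pending.empty())
            return false;
        message = std::move(m_pending.front());
        m_pending.pop();
        return true;
    }

    void Close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_ready.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::string> m_pending;
    bool m_closed = false;
};

// Usage sketch: the calling thread posts 'count' copies of one string while a
// separate logging thread drains the queue.  Returns how many messages arrived.
std::size_t RunLoggerDemo(int count)
{
    LogQueue queue;
    std::vector<std::string> received;
    std::thread logger([&queue, &received] {
        std::string message;
        while (queue.Take(message))
            received.push_back(message);
    });

    const std::string entry = "GETTILEIMAGE 200";  // hypothetical log text
    for (int i = 0; i < count; ++i)
        queue.Post(entry);

    queue.Close();
    logger.join();  // 'received' is only read after the logging thread ends
    return received.size();
}
```

The cost is one extra allocation per log message, which is the trade-off option 1 accepts in exchange for leaving std::string itself untouched.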

I think option 1 will be difficult to achieve due to threading interactions and object caches in MapGuide.  Option 2 should be possible with some #ifdefs (I hope).  Option 3 might be an appropriate course of action for MapGuide 2.3.
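A minimal sketch of how the #ifdef approach in option 2 might look, assuming a single project-wide typedef.  MgString and MG_USE_VERSA_STRING are hypothetical names; __gnu_cxx::__vstring is the libstdc++ typedef for __versa_string<char> from ext/vstring.h, which by default uses a non-reference-counted base.  Because __vstring is a distinct type from std::string, values have to be converted explicitly wherever they meet APIs that expect std::string.

```cpp
#include <string>

// Hypothetical build switch; MG_USE_VERSA_STRING is not an existing MapGuide
// macro.  __GLIBCXX__ is only defined when building against libstdc++
// (GCC's standard library), which is where ext/vstring.h lives.
#if defined(__GLIBCXX__) && defined(MG_USE_VERSA_STRING)
#include <ext/vstring.h>
// __vstring is __versa_string<char> with its default, non-reference-counted
// base, so copying one performs a deep copy.
typedef __gnu_cxx::__vstring MgString;
#else
typedef std::string MgString;
#endif

// Code written against MgString compiles unchanged with either typedef.
MgString JoinLogFields(const MgString& op, const MgString& status)
{
    MgString line = op;
    line += ": ";
    line += status;
    return line;
}

// Explicit conversion at the boundary with code that still takes std::string.
std::string ToStdString(const MgString& s)
{
    return std::string(s.data(), s.size());
}
```

The amount of change then depends mostly on how many signatures, third-party calls and conversions touch the typedef, which is where the risk Bob mentions would come from.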


So, what do we do?  I do not believe we can release MapGuide 2.2 until this issue is resolved.


Regards,
Trevor 

_______________________________________________
mapguide-internals mailing list
mapguide-internals at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/mapguide-internals

