[mapserver-users] Confirmation of status of UTF8 support, and where transcoding to Latin-1 may be happening.

Russell McOrmond russell at flora.ca
Sun Jan 4 18:00:30 EST 2009


Replying to an older message.

Howard Butler wrote on November 24, 2008 @ 05:17 PM:
> Yes, I was asking if this was an NSTRING column or a STRING column.  The 
> MapServer ArcSDE code makes an attempt to transcode the data if it is an 
> nstring column, but it is highly likely this is broken.  I didn't have 
> very extensive data to test this with, and all I looked to ensure was 
> that a few pictures looked right.  The problem might be as simple as the 
> function msConvertWideStringToUTF8 being broken.  Here's where MapServer 
> tries to convert it: 
> http://trac.osgeo.org/mapserver/browser/trunk/mapserver/mapsde.c#L750


The more I look at this, the more confused I get.

msConvertWideStringToUTF8 would in our case use iconv.

This means that:

msConvertWideStringToUTF8((const wchar_t*) wide, "UTF-16");

comes down to:

iconv_open("UTF-8", encoding);

where encoding is "UTF-16"
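
   To make the iconv path concrete, here is a minimal sketch of that
conversion.  It is not MapServer's code: the helper name
utf16le_to_utf8 is hypothetical, and I am assuming the input is
little-endian UTF-16 without a BOM, so it uses "UTF-16LE" rather than
plain "UTF-16" to make the byte order explicit.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of MapServer): convert a UTF-16LE byte
 * buffer to UTF-8. Returns the number of bytes written to out, or
 * (size_t)-1 on failure. Assumes in_len counts bytes, not characters. */
size_t utf16le_to_utf8(const char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;   /* iconv() wants a non-const pointer */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - dst_left;   /* bytes actually produced */
}
```

   Feeding it the two bytes 0xE9 0x00 (e-acute in UTF-16LE) should
produce the two UTF-8 bytes 0xC3 0xA9.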

   The characters between 0xa0 and 0xff have the same code points in 
Latin-1 and Unicode, so there must be something I don't know about that 
differentiates these strings as far as the browser is concerned.  The 
displayed page comes out correct if I force the browser (Firefox) to 
display as ISO-8859-1, but incorrect if I display as UTF-8 (which we 
need to use for policy reasons, and thus what the server says the 
encoding is).
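
   A byte-level check may help here.  Latin-1 stores each character in
this range as a single byte, while UTF-8 encodes the same code point as
two bytes, so a page that only renders correctly as ISO-8859-1 would be
consistent with single-byte output reaching the browser.  The sketch
below shows the difference; the function name latin1_to_utf8_char is
hypothetical and is only for illustration.

```c
#include <stddef.h>

/* Hypothetical illustration: encode one Latin-1 byte as UTF-8.
 * Returns the number of bytes written to out (1 or 2). */
size_t latin1_to_utf8_char(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        /* ASCII range: identical single byte in both encodings */
        out[0] = c;
        return 1;
    }
    /* 0x80..0xFF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}
```

   For e-acute, Latin-1 has the single byte 0xE9, while this yields
0xC3 0xA9.  A browser told to expect UTF-8 treats a lone 0xE9 as an
invalid sequence.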


   I have created a stopgap until we (likely not me, given my lack of 
familiarity at this point) figure out what is going on.  It is a patch 
to msEncodeHTMLEntities to encode these characters.  As entities they 
will work regardless of what encoding the browser thinks the page is in.
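
   The idea can be sketched as follows.  This is not the patch itself:
latin1_to_entities is a hypothetical standalone function, and it
assumes the input is a NUL-terminated Latin-1 string in which every
byte from 0xa0 up should become a numeric character reference.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch (not the actual msEncodeHTMLEntities patch):
 * copy a NUL-terminated Latin-1 string, replacing bytes >= 0xA0 with
 * numeric entities such as "&#233;". Bytes below 0xA0 pass through
 * unchanged. Returns the output length, or (size_t)-1 if out is too
 * small. */
size_t latin1_to_entities(const unsigned char *in, char *out, size_t out_cap)
{
    size_t used = 0;

    if (out_cap == 0)
        return (size_t)-1;

    for (; *in != '\0'; ++in) {
        if (*in >= 0xA0) {
            /* worst case "&#255;" is 6 chars, plus the terminating NUL */
            if (out_cap - used < 7)
                return (size_t)-1;
            used += (size_t)snprintf(out + used, out_cap - used,
                                     "&#%u;", (unsigned)*in);
        } else {
            /* one char plus the terminating NUL */
            if (out_cap - used < 2)
                return (size_t)-1;
            out[used++] = (char)*in;
        }
    }
    out[used] = '\0';
    return used;
}
```

   Because the output is pure ASCII, it reads the same whether the
browser decodes the page as ISO-8859-1 or UTF-8.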

   I added the patch here: http://trac.osgeo.org/mapserver/ticket/2842

> The reason that this is such a problem is that MapServer doesn't really 
> have its unicode/wide character strategy well defined. ArcSDE forces the 
> issue on us now because by default they're using nstring columns for 
> loading data by default, so the ArcSDE users are on the front lines of 
> this, but I think it will be more of a problem as time marches on.  

   Understood.  While my customer may not want me to spend much more 
time on this, I will be monitoring the list/etc to see if there are any 
patches I should be testing and sending feedback on.  In our specific 
case we are only encoding English and French, so have a much simpler 
situation than those encoding more complex languages.

> Questions the MapServer devs need to answer are:

Curious: not knowing who the devs are, did they get your feedback?

-- 
  Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
  Please help us tell the Canadian Parliament to protect our property
  rights as owners of Information Technology. Sign the petition!
  http://www.digital-copyright.ca/petition/ict/

  "The government, lobbied by legacy copyright holders and hardware
   manufacturers, can pry my camcorder, computer, home theatre, or
   portable media player from my cold dead hands!"

