[mapserver-users] Confirmation of status of UTF8 support, and where transcoding to Latin-1 may be happening.

Howard Butler hobu.inc at gmail.com
Mon Nov 24 17:17:20 EST 2008


On Nov 24, 2008, at 7:40 AM, Russell McOrmond wrote:

>
> On Sat, 22 Nov 2008, Russell McOrmond wrote:
>
>> I have a customer that is using MapServer, talking through ArcSDE  
>> to an Oracle database.
>
>  Chris Whitteker is working on the same project, and I would like to  
> apologize for duplicating the question.
>
>  Howard Butler replied to Chris and suggested that the transcoding  
> from UTF-8 to Latin-1 is most likely happening within ArcSDE.  We  
> have some people here who are experts on SDE, but want to avoid the
> finger-pointing (i.e., they say the problem is in MapServer, and this
> list suggests the problem is in SDE).

The problem is most definitely MapServer's, not ArcSDE's.  If I  
suggested otherwise, I was wrong.

>
>
>  Can I get a quick confirmation that Mapserver supports UTF8  
> encoding, and doesn't internally transcode to Latin-1?

It supports UTF-8 for *rendering* with GD and iconv.  I don't think
MapServer internally supports UTF-8 for string comparisons (i.e.,
FILTERs, queries, etc.), which is why you're seeing problems doing a
GetFeatureInfo.
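
As a rough illustration of how rendering can work while comparisons
fail, here is a minimal Python sketch.  It is not MapServer code: the
attribute value and the bytewise_match helper are made up, and it
assumes the comparison is done on raw bytes without regard to encoding.

```python
# Hypothetical attribute value; not taken from the real dataset.
value = "Régosol"

utf8_bytes = value.encode("utf-8")      # the accented letter takes two bytes
latin1_bytes = value.encode("latin-1")  # the accented letter takes one byte


def bytewise_match(a: bytes, b: bytes) -> bool:
    """Compare raw bytes, the way an encoding-unaware filter would."""
    return a == b


# Same text, different encodings: the raw comparison fails.
mismatch = bytewise_match(utf8_bytes, latin1_bytes)

# Decoding both sides to text first makes the comparison succeed.
match = utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1")
```

If the request arrives in one encoding and the column data in another,
a filter built on the first comparison would find nothing even though
the label draws correctly.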

> Does it have the option to transcode that we might have  
> inadvertently enabled?  This would help us know that we should focus
> our investigation on the settings of SDE and not settings within
> MapServer.
>
>  Howard Butler then asked "Do you know what the SOIL_ORDER_NAME_FR  
> column type is defined as?"

Yes, I was asking if this was an NSTRING column or a STRING column.
The MapServer ArcSDE code makes an attempt to transcode the data if it
is an NSTRING column, but it is quite likely that this is broken.  I
didn't have much data to test it with, and all I checked was that a
few pictures looked right.  The problem might be as simple as the
function msConvertWideStringToUTF8 being broken.  Here's where
MapServer tries to convert it: http://trac.osgeo.org/mapserver/browser/trunk/mapserver/mapsde.c#L750
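
For reference, this is roughly what such a conversion has to do,
sketched in Python rather than C.  It is not the actual
msConvertWideStringToUTF8 implementation; wide_to_utf8 is a
hypothetical helper, and the assumption that the wide buffer is
UTF-16 little-endian may not hold for a given platform or ArcSDE
client.

```python
def wide_to_utf8(raw: bytes, wide_encoding: str = "utf-16-le") -> bytes:
    """Re-encode a wide-character buffer as UTF-8.

    wide_encoding is an assumption; the real buffer layout depends on
    the platform and the ArcSDE client.
    """
    text = raw.decode(wide_encoding)
    # Drop trailing NUL terminator(s), if any, before re-encoding.
    return text.rstrip("\x00").encode("utf-8")


# Hypothetical column value with a trailing NUL, as a C wide string has.
raw = "Régosol\x00".encode("utf-16-le")
converted = wide_to_utf8(raw)

# A round-trip check is a stronger test than eyeballing a rendered map.
round_trip_ok = converted.decode("utf-8") == "Régosol"
```

A check like the last line, run against known non-ASCII values, would
show whether the conversion itself is at fault.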

The reason that this is such a problem is that MapServer doesn't
really have its Unicode/wide character strategy well defined.  ArcSDE
forces the issue on us now because it uses nstring columns by default
when loading data, so the ArcSDE users are on the front lines of this,
but I think it will be more of a problem as time marches on.
Questions the MapServer devs need to answer are:

- how are strings to be internally represented in MapServer (lots of  
things to balance by answering this question)?
- who is responsible for transforming the data?  The driver, for both
input and output?  Or a smarter string object that carries its own
encoding around?
- Unicode is a conspiracy :)
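
To make the second question a bit more concrete, a string object that
carries its own encoding might look something like the Python sketch
below.  TaggedString and its methods are hypothetical and exist only
for illustration; nothing like this is in MapServer, and a C version
would have to make its own choices about memory and iconv usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Raw bytes plus the name of the encoding they are in (hypothetical)."""
    data: bytes
    encoding: str

    def to(self, target: str) -> "TaggedString":
        """Return a copy transcoded to the target encoding."""
        if target == self.encoding:
            return self
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)

    def same_text(self, other: "TaggedString") -> bool:
        """Compare by decoded text, so the encoding no longer matters."""
        return self.data.decode(self.encoding) == other.data.decode(other.encoding)


# What a driver might hand back versus what a query might contain.
from_driver = TaggedString(b"R\xe9gosol", "latin-1")
from_request = TaggedString(b"R\xc3\xa9gosol", "utf-8")
```

With the encoding attached to the value, either the driver or the
caller can transcode when needed, and comparisons no longer depend on
both sides happening to use the same bytes.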

Sorry I'm not much help,

Howard

