[mapserver-users] Encoding issues

Murty Maganti MMaganti at oriongis.com
Tue Mar 10 12:51:58 EDT 2009


Hi

 

As per the code documentation in the method msGetEncodedString (as shown below), the characters are assumed to be UTF-8 by default.

 

char *msGetEncodedString(const char *string, const char *encoding)

{

---

  if (len == 0 || (encoding && strcasecmp(encoding, "UTF-8")==0))

      return strdup(string);    /* Nothing to do: string already in UTF-8 */

 

In the ‘values’ property of shapeObj.cs in C#, however, System.Runtime.InteropServices.Marshal.PtrToStringAnsi is used to marshal the characters from C to C#. Shouldn’t it use the System.Runtime.InteropServices.Marshal.PtrToStringUni method instead, since the characters are held in UTF-8 encoding by default?
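
For reference, PtrToStringAnsi decodes with the system ANSI code page and PtrToStringUni expects UTF-16 data, so neither of the two reads UTF-8 directly. Below is a minimal sketch of decoding a native string as UTF-8 by hand, assuming the pointer refers to a NUL-terminated UTF-8 buffer. Utf8Marshal and Utf8PtrToString are hypothetical names for illustration and are not part of mapscript.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class Utf8Marshal
{
    // Hypothetical helper (not part of mapscript): decode a NUL-terminated
    // native buffer as UTF-8. Assumes the native side really holds UTF-8.
    public static string Utf8PtrToString(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero)
            return null;

        // Count the bytes up to (not including) the terminating NUL.
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0)
            length++;

        // Copy the raw bytes into managed memory and decode them.
        byte[] buffer = new byte[length];
        Marshal.Copy(ptr, buffer, 0, length);
        return Encoding.UTF8.GetString(buffer);
    }
}
```

Whether the attribute values coming from a shapefile are actually UTF-8 depends on how the data file was written, so this only helps if that assumption holds.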

 

Thanks

Murty

 

From: Tamas Szekeres [mailto:szekerest at gmail.com] 
Sent: Wednesday, March 04, 2009 5:25 PM
To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues

 

Hi,

I don't know much about the Hindi character sets.
I guess you could extend that byte-array-to-string copy function to arbitrary character sizes; for double bytes, something like:

for (int i = 0; i + 1 < bytes.Length; i += 2)   // stop before a trailing odd byte
                s.Append(Convert.ToChar(bytes[i] + 256*bytes[i+1]));   // low byte first

Best regards,

Tamas




2009/3/4 Murty Maganti <MMaganti at oriongis.com>

Hi Tamas 

 

This is still not working for some of the Asian languages. 

 

I suspect the issue could be in this line of your sample code below

s.Append(Convert.ToChar(bytes[i])); 

 

Here, a single byte is converted to a character. But my understanding is that UTF-8 can use from 1 to 4 bytes to represent one character code point. It worked fine for Arabic, maybe because all Arabic characters can be represented using a single byte.
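
To make the byte counts concrete, here is a small sketch that measures how many bytes a string needs in UTF-8 and in a given code page. In CP1256 (the code page used in the Arabic sample) every character is one byte, while in UTF-8 an Arabic letter takes two bytes and a Devanagari letter takes three. EncodingSizes is a made-up name for illustration. Encoding.GetEncoding(1256) assumes the code page is available, which is the case on .NET Framework; newer runtimes need the System.Text.Encoding.CodePages provider registered first.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Length(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Number of bytes the string occupies in a Windows/ISO code page.
    // Assumption: the code page (e.g. 1256) is installed or registered.
    public static int CodePageLength(string s, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetByteCount(s);
    }
}

// Usage (values follow from the UTF-8 and CP1256 definitions):
//   EncodingSizes.Utf8Length("a")                 -> 1
//   EncodingSizes.Utf8Length("\u0644")            -> 2  (Arabic letter lam)
//   EncodingSizes.Utf8Length("\u0915")            -> 3  (Devanagari letter ka)
//   EncodingSizes.CodePageLength("\u0644", 1256)  -> 1
```

So a byte-per-character copy can work for a single-byte code page such as CP1256, but not for UTF-8 text outside the ASCII range.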

 

When I tried the same code below with Hindi, an Indian language, some of the characters (but not all) are shown as junk. I guess the characters that occupy more than one byte are the ones that turn into junk.

 

I am also trying the opposite of the sample code below, i.e. reading a field value (shapeObj.values) that is in Hindi from MapServer and showing it on a web page; again it turns out to be junk. I tried to look at the byte values of the string in VS by using

 

Byte[] bites = Encoding.Unicode.GetBytes(shapeObj.values[0]);

 

I notice that they are actually the UTF-8 byte values, each interpreted as a separate UTF-16 character, which may be the reason I see junk characters on the web page. But I don’t know how to extract those UTF-8 byte values from the UTF-16 string.

 

I am just giving sample code here to explain

 

                byte[] utf16 = Encoding.Unicode.GetBytes("कीचनर"); //The text is in Hindi, an Indian language

                byte[] utf8 = Encoding.UTF8.GetBytes("कीचनर");

 

                shapeObj shape = layer.getFeature(result.shapeindex, result.tileindex);

                string value = shape.values[1]; //This contains the same text (in Hindi) as above in the shape file.

 

                byte[] bytes = Encoding.Unicode.GetBytes(value); //These are the byte values of the string as encoded in UTF-16. .NET internally stores all strings in UTF-16

 

Now if I examine the values of ‘utf8’ and ‘bytes’ arrays

 

utf8 – 224,164,149,224,165,128,224,164,154,224,164,168,224,164,176

bytes – 224,0,164,0,34,32,224,0,165,0,172,32,224,0,164,0,97,1,224,0,164,0,168,0,224,0,164,0,176,0

utf16 – 21,9,64,9,26,9,40,9,48,9

 

The first byte value is the same as in UTF-8. The second byte value is 0, as UTF-16 takes at least 2 bytes for a character. This gives me the impression that the byte values are in UTF-8 and are not converted to UTF-16 by .NET.
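
The pairs in ‘bytes’ that do not end in 0 are a useful clue: 34,32 is U+2022, 172,32 is U+20AC and 97,1 is U+0161, which are the characters Windows-1252 assigns to the bytes 149, 128 and 154. So the string looks like UTF-8 data that was decoded with the ANSI code page 1252. Under that assumption, a minimal sketch of undoing the damage is to encode the string back to bytes with the same code page and then decode those bytes as UTF-8. AnsiMisreadRepair and Recover are hypothetical names, and Encoding.GetEncoding(1252) assumes the code page is available (true on .NET Framework; newer runtimes need the System.Text.Encoding.CodePages provider registered).

```csharp
using System;
using System.Text;

static class AnsiMisreadRepair
{
    // Hypothetical helper: 'misread' is a string produced by decoding
    // UTF-8 bytes with an ANSI code page (assumed here, e.g. 1252).
    public static string Recover(string misread, int ansiCodePage)
    {
        // Step 1: turn each character back into the original byte.
        byte[] raw = Encoding.GetEncoding(ansiCodePage).GetBytes(misread);

        // Step 2: decode the recovered bytes as what they really are.
        return Encoding.UTF8.GetString(raw);
    }
}

// Usage, with the field value from the sample above:
//   string fixedValue = AnsiMisreadRepair.Recover(shape.values[1], 1252);
```

This is a workaround rather than a fix: it relies on every byte surviving the round trip through the ANSI code page, which may not hold for every code page or every byte value.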

 

I would appreciate it if you could let me know of any solution for this.

 

Thanks

Murty

From: Tamas Szekeres [mailto:szekerest at gmail.com] 
Sent: Friday, February 06, 2009 6:59 PM


To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues

 

You might have to make the conversion explicitly and manually, something like:

            string value = "لققافعععىىةةونه"; //I actually get this (in arabic) through user input
            byte[] bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1256), Encoding.Unicode.GetBytes(value));
            StringBuilder s = new StringBuilder();
            for (int i = 0; i < bytes.Length; i++)
                s.Append(Convert.ToChar(bytes[i]));
            shpObj.text = s.ToString();

Best regards,

Tamas



2009/2/6 Murty Maganti <MMaganti at oriongis.com>

HI 

 

I am doing a simple thing: I have a map file and am trying to show some static text in Arabic on the map. You can try this with any map file, as it has nothing to do with the layers in the map.

 

At run time (like on a button click), please add this

 

                layerObj lyr = new layerObj(mapObj);

                lyr.name = "TextAcetate";

                lyr.status = mapscript.MS_ON;

                lyr.type = MS_LAYER_TYPE.MS_LAYER_ANNOTATION;

                lyr.labelcache = mapscript.MS_TRUE;

 

                double locationX = 50;

                double locationY = 50;

 

lyr.transform = (int)mapscript.MS_FALSE;

 

classObj layerClass = new classObj(lyr);

 

//All label properties

layerClass.label.size = 15;

layerClass.label.type = MS_FONT_TYPE.MS_TRUETYPE;

…

…

layerClass.label.encoding = "CP1256";

 

 

                shapeObj shpObj = new shapeObj((int)MS_SHAPE_TYPE.MS_SHAPE_POINT);

                lineObj lnObj = new lineObj();

 

                pointObj pt = new pointObj(locationX, locationY, 0, 0);

                lnObj.add(pt);

 

                shpObj.add(lnObj);

 

                shpObj.text = "لققافعععىىةةونه"; //I actually get this (in arabic) through user input

 

                lyr.addFeature(shpObj);

 

mapObj.draw(); //Onto a picture box or save as file

 

(In the map file, my output format is set to GD/PNG)

 

Please let me know if you need more information.

 

Thanks

Murty

 

 

From: mapserver-users-bounces at lists.osgeo.org [mailto:mapserver-users-bounces at lists.osgeo.org] On Behalf Of Tamas Szekeres
Sent: Friday, February 06, 2009 4:12 PM


To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues

 

Please send me your example so that I could examine what's going on.

Best regards,

Tamas

2009/2/6 Murty Maganti <MMaganti at oriongis.com>

Hi

 

I tried with the suggested encoding but still no success.

From the output below, I guess ICONV support is included.

 

E:\Utils\MapServer\Map Server 5.2 RC\ms4w\Apache\cgi-bin>mapserv -v

MapServer version 5.2.0 OUTPUT=GIF OUTPUT=PNG OUTPUT=JPEG OUTPUT=WBMP OUTPUT=PDF
OUTPUT=SWF OUTPUT=SVG SUPPORTS=PROJ SUPPORTS=AGG SUPPORTS=FREETYPE SUPPORTS=ICONV
SUPPORTS=FRIBIDI SUPPORTS=WMS_SERVER SUPPORTS=WMS_CLIENT SUPPORTS=WFS_SERVER
SUPPORTS=WFS_CLIENT SUPPORTS=WCS_SERVER SUPPORTS=SOS_SERVER SUPPORTS=FASTCGI
SUPPORTS=THREADS SUPPORTS=GEOS SUPPORTS=RGBA_PNG INPUT=JPEG INPUT=POSTGIS INPUT=OGR
INPUT=GDAL INPUT=SHAPEFILE

 

Where can I get some details on how to build the C# mapscript (managed assembly only) from Visual Studio, keeping all the unmanaged dlls from the ms4w binaries? I just want to try using MarshalAsAttribute.

 

Thanks

Murty

From: Tamas Szekeres [mailto:szekerest at gmail.com] 
Sent: Friday, February 06, 2009 3:02 PM
To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues

 

Hi,

You might want to try with encoding="ISO-8859-6", assuming you have libiconv compiled in.
The C# mapscript doesn't specify an explicit conversion during the marshaling. In this case I assume a Unicode to CharSet.Ansi conversion will automatically take place by default.
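
One way to see what such an implicit ANSI conversion could do to Arabic text is to push a string through a code page by hand. The sketch below is only an approximation of the marshaler, not its actual implementation: it uses an explicit '?' replacement for characters the code page cannot represent. The code page numbers are assumptions (1252 as a typical Western ANSI code page, 1256 as the Arabic one), AnsiLossDemo is a hypothetical name, and on newer runtimes the System.Text.Encoding.CodePages provider must be registered for GetEncoding to find these code pages.

```csharp
using System;
using System.Text;

static class AnsiLossDemo
{
    // Encode 'text' into the given code page and decode it again,
    // replacing anything the code page cannot represent with '?'.
    public static string RoundTripThroughCodePage(string text, int codePage)
    {
        Encoding ansi = Encoding.GetEncoding(
            codePage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        byte[] bytes = ansi.GetBytes(text);
        return ansi.GetString(bytes);
    }
}

// Usage:
//   AnsiLossDemo.RoundTripThroughCodePage("\u0644\u0642", 1252)  // "??"
//   AnsiLossDemo.RoundTripThroughCodePage("\u0644\u0642", 1256)  // text survives
```

If the machine's ANSI code page has no Arabic letters, a conversion of this kind would be consistent with labels that show up as question marks.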

Best regards,

Tamas



2009/2/6 Murty Maganti <MMaganti at oriongis.com>

Hello 

 

I am having some issues using Arabic text as labels. I am using C# map script. I am setting the following at runtime

 

labelObj label = classObj.label;

label.encoding = "CP1256";

label.text = "some text in Arabic"; (At run time in VS, I can see the text is actually in Arabic)

 

But labels are displayed as '?????'.

 

Is there any conversion I need to do before setting the text value? How are strings represented in the underlying mapscript dll (ASCII or Unicode)? As I was reading in MSDN, the default marshalling uses LPStr, which is a string of single-byte (ANSI) characters. Does that mean I first need to convert from Unicode to ASCII in C# before setting the value?

 

Appreciate any help.

 

Thanks

Murty

 


_______________________________________________
mapserver-users mailing list
mapserver-users at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/mapserver-users

 

 

 

 
