[mapserver-users] Encoding issues
Murty Maganti
MMaganti at oriongis.com
Wed Mar 4 08:27:08 PST 2009
Hi Tamas
This is still not working for some of the Asian languages.
I suspect the issue could be in this line of your sample code below
s.Append(Convert.ToChar(bytes[i]));
Here, each single byte is converted to one character. My understanding is that UTF-8 uses from 1 to 4 bytes to represent one code point. It may have worked for Arabic because code page 1256, which your sample converts to, represents every Arabic character in a single byte.
When I tried the same code with Hindi, an Indian language, some of the characters (but not all) came out as junk. I guess the characters that occupy more than one byte are the ones that turned into junk.
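The byte counts behind this guess can be checked directly. The sketch below is only an illustration, not code from this thread: the class and helper names (Utf8Packing, PackUtf8ForAnsiMarshal) are hypothetical, it assumes the wrapper marshals strings through Windows-1252 as the ANSI code page, and it assumes the label encoding would then be set to "UTF-8" on the MapServer side. It shows that a Devanagari character needs three UTF-8 bytes, and one possible way to carry all of those bytes in a string, one byte per char.

```csharp
using System;
using System.Text;

public static class Utf8Packing
{
    static Utf8Packing()
    {
        // Needed on .NET Core / .NET 5+ so that legacy code pages (1252, 1256)
        // are available. On older .NET Framework versions this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: encode the text as UTF-8, then reinterpret those bytes
    // as Windows-1252 characters. ASSUMPTION: the marshaler converts the string
    // back with the same code page, so the native side receives the UTF-8 bytes.
    public static string PackUtf8ForAnsiMarshal(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(1252).GetString(utf8);
    }

    public static void Main()
    {
        string latin = "A";
        string arabic = "\u0644"; // ARABIC LETTER LAM
        string hindi = "\u0915";  // DEVANAGARI LETTER KA

        // UTF-8 byte counts differ per character
        Console.WriteLine(Encoding.UTF8.GetByteCount(latin));   // 1
        Console.WriteLine(Encoding.UTF8.GetByteCount(arabic));  // 2
        Console.WriteLine(Encoding.UTF8.GetByteCount(hindi));   // 3

        // In code page 1256 the Arabic letter fits in one byte
        Console.WriteLine(Encoding.GetEncoding(1256).GetByteCount(arabic)); // 1

        // Five Devanagari characters become fifteen one-byte chars
        string word = "\u0915\u0940\u091A\u0928\u0930";
        string packed = PackUtf8ForAnsiMarshal(word);
        Console.WriteLine(packed.Length); // 15
    }
}
```

Whether the packed string survives the trip depends on the code page the marshaler really uses, so this would need to be verified on the target machine.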
I am also trying the opposite of the sample code below, i.e. reading a field value from MapServer (shapeObj.values), which is in Hindi, and showing it on a web page. Again it turns out as junk. I looked at the byte values of the string in VS by using
Byte[] bites = Encoding.Unicode.GetBytes(shapeObj.values[0]);
I notice that they are actually UTF-8 byte values, each interpreted as a separate UTF-16 character, which may be the reason I see junk characters on the web page. But I don't know how to extract those UTF-8 byte values from the UTF-16 string.
I am just giving sample code here to explain
byte[] utf16 = Encoding.Unicode.GetBytes("कीचनर"); //The text is in Hindi, an Indian language
byte[] utf8 = Encoding.UTF8.GetBytes("कीचनर");
shapeObj shape = layer.getFeature(result.shapeindex, result.tileindex);
string value = shape.values[1]; //This contains the same text (in Hindi) as above in the shape file.
byte[] bytes = Encoding.Unicode.GetBytes(value); //These are the UTF-16 byte values of the string as received. .NET stores all strings internally as UTF-16
Now if I examine the values of the ‘utf8’, ‘bytes’ and ‘utf16’ arrays:
utf8 – 224,164,149,224,165,128,224,164,154,224,164,168,224,164,176
bytes – 224,0,164,0,34,32,224,0,165,0,172,32,224,0,164,0,97,1,224,0,164,0,168,0,224,0,164,0,176,0
utf16 – 21,9,64,9,26,9,40,9,48,9
In ‘bytes’, the first byte of each pair is mostly the same as the corresponding UTF-8 byte, and the second byte is 0, since UTF-16 takes at least 2 bytes per character. This gives me the impression that the values are UTF-8 bytes that .NET did not convert to UTF-16.
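One observation on the arrays above: the three pairs that do not end in 0 (34,32 and 172,32 and 97,1) are U+2022, U+20AC and U+0161, which are the characters Windows-1252 assigns to bytes 149, 128 and 154. That suggests the value is UTF-8 text that was decoded as Windows-1252 on its way into .NET. A minimal sketch of undoing that, assuming Windows-1252 really is the code page involved (on another machine it could be a different ANSI code page); the class and method names are hypothetical:

```csharp
using System;
using System.Text;

public static class MarshaledTextRepair
{
    static MarshaledTextRepair()
    {
        // Needed on .NET Core / .NET 5+ so that code page 1252 is available.
        // On older .NET Framework versions this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: turn each char back into the single byte it came
    // from (ASSUMPTION: Windows-1252), then decode those bytes as UTF-8.
    public static string RecoverUtf8(string marshaled)
    {
        byte[] raw = Encoding.GetEncoding(1252).GetBytes(marshaled);
        return Encoding.UTF8.GetString(raw);
    }

    public static void Main()
    {
        // The 'bytes' array from the message: UTF-16 bytes of the junk string
        byte[] bytes = {
            224,0,164,0,34,32, 224,0,165,0,172,32, 224,0,164,0,97,1,
            224,0,164,0,168,0, 224,0,164,0,176,0
        };
        // The 'utf16' array from the message: the text as it should look
        byte[] utf16 = { 21,9, 64,9, 26,9, 40,9, 48,9 };

        string garbled = Encoding.Unicode.GetString(bytes);
        string expected = Encoding.Unicode.GetString(utf16);

        string repaired = RecoverUtf8(garbled);
        Console.WriteLine(repaired == expected); // True
    }
}
```

With the numbers quoted above this reproduces the 'utf8' array as the intermediate bytes. Bytes that Windows-1252 leaves undefined may not round-trip the same way through every marshaler, so this would need testing with real data.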
I would appreciate it if you could suggest a solution for this.
Thanks
Murty
From: Tamas Szekeres [mailto:szekerest at gmail.com]
Sent: Friday, February 06, 2009 6:59 PM
To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues
You might have to do the conversion explicitly by hand, something like:
string value = "لققافعععىىةةونه"; //I actually get this (in arabic) through user input
byte[] bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1256), Encoding.Unicode.GetBytes(value));
StringBuilder s = new StringBuilder();
for (int i = 0; i < bytes.Length; i++)
    s.Append(Convert.ToChar(bytes[i]));
shpObj.text = s.ToString();
Best regards,
Tamas
2009/2/6 Murty Maganti <MMaganti at oriongis.com>
Hi
I am doing something simple. I have a map file and am trying to show some static Arabic text on the map. You can try this with any map file, as it has nothing to do with the layers in the map.
At run time (for example on a button click), please add this:
layerObj lyr = new layerObj(mapObj);
lyr.name = "TextAcetate";
lyr.status = mapscript.MS_ON;
lyr.type = MS_LAYER_TYPE.MS_LAYER_ANNOTATION;
lyr.labelcache = mapscript.MS_TRUE;
double locationX = 50;
double locationY = 50;
lyr.transform = (int)mapscript.MS_FALSE;
classObj layerClass = new classObj(lyr);
//All label properties
layerClass.label.size = 15;
layerClass.label.type = MS_FONT_TYPE.MS_TRUETYPE;
…
…
layerClass.label.encoding = "CP1256";
shapeObj shpObj = new shapeObj((int)MS_SHAPE_TYPE.MS_SHAPE_POINT);
lineObj lnObj = new lineObj();
pointObj pt = new pointObj(locationX, locationY, 0, 0);
lnObj.add(pt);
shpObj.add(lnObj);
shpObj.text = "لققافعععىىةةونه"; //I actually get this (in arabic) through user input
lyr.addFeature(shpObj);
mapObj.draw(); //Onto a picture box or save as file
(In the map file, my output format is set to GD/PNG)
Please let me know if you need more information.
Thanks
Murty
From: mapserver-users-bounces at lists.osgeo.org [mailto:mapserver-users-bounces at lists.osgeo.org] On Behalf Of Tamas Szekeres
Sent: Friday, February 06, 2009 4:12 PM
To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues
Please send me your example so that I could examine what's going on.
Best regards,
Tamas
2009/2/6 Murty Maganti <MMaganti at oriongis.com>
Hi
I tried with the suggested encoding but still no success.
From the output below, I guess ICONV support is included.
E:\Utils\MapServer\Map Server 5.2 RC\ms4w\Apache\cgi-bin>mapserv -v
MapServer version 5.2.0 OUTPUT=GIF OUTPUT=PNG OUTPUT=JPEG OUTPUT=WBMP OUTPUT=PDF
OUTPUT=SWF OUTPUT=SVG SUPPORTS=PROJ SUPPORTS=AGG SUPPORTS=FREETYPE SUPPORTS=ICONV
SUPPORTS=FRIBIDI SUPPORTS=WMS_SERVER SUPPORTS=WMS_CLIENT SUPPORTS=WFS_SERVER
SUPPORTS=WFS_CLIENT SUPPORTS=WCS_SERVER SUPPORTS=SOS_SERVER SUPPORTS=FASTCGI
SUPPORTS=THREADS SUPPORTS=GEOS SUPPORTS=RGBA_PNG INPUT=JPEG INPUT=POSTGIS INPUT=OGR
INPUT=GDAL INPUT=SHAPEFILE
Where can I find details on how to build the C# mapscript (managed assembly only) from Visual Studio, while keeping all the unmanaged DLLs from the ms4w binaries? I just want to give MarshalAsAttribute a try.
Thanks
Murty
From: Tamas Szekeres [mailto:szekerest at gmail.com]
Sent: Friday, February 06, 2009 3:02 PM
To: Murty Maganti
Cc: mapserver-users at lists.osgeo.org
Subject: Re: [mapserver-users] Encoding issues
Hi,
You might want to try with encoding="ISO-8859-6" assuming you have libiconv compiled in.
The C# mapscript doesn't specify an explicit conversion during marshaling. In this case I assume a Unicode to Charset.Ansi conversion automatically takes place by default.
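To illustrate what such a default conversion could do to the text, here is a rough model, not the actual marshaling code: it encodes an Arabic string with a given ANSI code page and replaces anything unmappable with '?'. The class and method names are hypothetical, and the real marshaler may behave differently in detail (for example by applying best-fit mappings).

```csharp
using System;
using System.Text;

public static class AnsiMarshalModel
{
    static AnsiMarshalModel()
    {
        // Needed on .NET Core / .NET 5+ so that code pages 1252 and 1256 exist.
        // On older .NET Framework versions this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical model of a Unicode-to-ANSI conversion: characters that the
    // code page cannot represent are replaced with '?'.
    public static byte[] ToAnsiBytes(string text, int codePage)
    {
        Encoding ansi = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ReplacementFallback,
            DecoderFallback.ReplacementFallback);
        return ansi.GetBytes(text);
    }

    public static void Main()
    {
        string arabic = "\u0644\u0642"; // ARABIC LETTER LAM, ARABIC LETTER QAF

        // A Western European ANSI code page has no Arabic letters
        byte[] western = ToAnsiBytes(arabic, 1252);
        Console.WriteLine(Encoding.ASCII.GetString(western)); // ??

        // An Arabic ANSI code page keeps them, one byte per letter
        byte[] arabicAnsi = ToAnsiBytes(arabic, 1256);
        Console.WriteLine(arabicAnsi.Length); // 2
    }
}
```

If the system ANSI code page is not an Arabic one, a conversion like this would be consistent with labels showing up as '?????' before MapServer ever applies the label encoding.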
Best regards,
Tamas
2009/2/6 Murty Maganti <MMaganti at oriongis.com>
Hello
I am having some issues using Arabic text as labels. I am using C# map script. I am setting the following at runtime
labelObj label = classObj.label;
label.encoding = "CP1256";
label.text = "some text in Arabic"; (At run time in VS, I can see the text is actually in Arabic)
But labels are displayed as '?????'.
Is there any conversion I need to do before setting the text value? How are strings represented in the underlying mapscript dll (ASCII or Unicode)? As I was reading on MSDN, the default marshalling uses LPStr, which is a single-byte ANSI string. Does that mean I first need to convert from Unicode to a single-byte encoding in C# before setting the value?
Appreciate any help.
Thanks
Murty
_______________________________________________
mapserver-users mailing list
mapserver-users at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/mapserver-users