Anyone good with character encodings (bug 1921)?
Stephen Woodbridge
woodbri at SWOODBRIDGE.COM
Tue Jun 12 01:34:23 EDT 2007
Steve Lime wrote:
> Hi all: I'm trying to address bug 1921 which deals with path
> following labels and extended character sets. I'm using the encoding
> code contributed by Orkney. It used libiconv to convert from one
> character set to Unicode/UTF8. I've moved that conversion to the right
> place so that it happens before label placement. So positioning
> relative to a point works now.
>
> That's not the real problem. Character sets for many languages result
> in multibyte characters and the ANGLE FOLLOW code is assuming 1-byte
> per character. Anyone know how to determine the number of bytes per
> character in a string? Is it a constant value (looks like 3-bytes for
> the Big5-HKSCS data sample I have now) for Unicode post iconv? I'm
> flying blind here. I think I know where to fix code but just not sure
> how...
>
> Steve
Steve,
UTF-8 uses a variable number of bytes per character; the count is not
fixed. Here is a URL with some code that might help:
http://209.85.165.104/search?q=cache:F6MrrkWhxMIJ:publib.boulder.ibm.com/infocenter/iadthelp/v6r0/topic/com.ibm.etools.iseries.langref.doc/rzan5mst164.htm+utf8+multibyte+character+lengths&hl=en&ct=clnk&cd=10&gl=us
http://czyborra.com/utf/ has a good explanation, enough to write a scan
algorithm to count the number of bytes for a given character.
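
Here is a rough sketch of such a scan in C. The function names
utf8_seq_len and utf8_char_count are made up for illustration and are not
part of MapServer or libiconv. It assumes the string is already UTF-8
(post iconv) and NUL-terminated, and it only inspects the lead byte of
each sequence; it does not validate the continuation bytes. Most CJK
characters fall in the range that UTF-8 encodes as 3 bytes, which would
explain the 3 bytes you are seeing, but ASCII in the same string is still
1 byte and some characters can take 4.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with lead byte c.
 * Returns 0 if c is a continuation byte (10xxxxxx) or not a valid lead. */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Count characters (code points) in a NUL-terminated UTF-8 string.
 * A stray or invalid byte is counted as one character so that the
 * scan always moves forward. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        size_t n = utf8_seq_len((unsigned char)*s);
        size_t i;

        if (n == 0)
            n = 1;  /* invalid lead byte: skip just this byte */

        /* Step over the sequence, but never past the terminating NUL
         * if the string ends in the middle of a character. */
        for (i = 1; i < n && s[i] != '\0'; i++)
            ;

        s += i;
        count++;
    }
    return count;
}
```

For the follow code, the idea would be to step through the label with
utf8_seq_len() and hand each whole multibyte sequence to the glyph
measuring code, instead of advancing one byte at a time.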
-Steve