Anyone good with character encodings (bug 1921)?

Tue Jun 12 01:34:23 EDT 2007

Steve Lime wrote:
> Hi all: I'm trying to address bug 1921 which deals with path
> following labels and extended character sets. I'm using the encoding
> code contributed by Orkney. It uses libiconv to convert from one
> character set to Unicode/UTF-8. I've moved that conversion to the right
> place so that it happens before label placement. So positioning
> relative to a point works now.
> 
> That's not the real problem. Character sets for many languages result
> in multibyte characters, and the ANGLE FOLLOW code assumes 1 byte
> per character. Anyone know how to determine the number of bytes per
> character in a string? Is it a constant value (looks like 3 bytes for
> the Big5-HKSCS data sample I have now) for Unicode post-iconv? I'm
> flying blind here. I think I know where to fix the code but just not sure
> how...
> 
> Steve

Steve,

UTF-8 uses a variable number of bytes per character; it is not a fixed
number. Here is a URL with some code that might help:

http://209.85.165.104/search?q=cache:F6MrrkWhxMIJ:publib.boulder.ibm.com/infocenter/iadthelp/v6r0/topic/com.ibm.etools.iseries.langref.doc/rzan5mst164.htm+utf8+multibyte+character+lengths&hl=en&ct=clnk&cd=10&gl=us
http://czyborra.com/utf/  good explanation and enough to write a scan
algorithm to count the number of bytes for a given character.
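
A rough sketch of such a scan, assuming the string is already valid UTF-8
after the iconv step. The function names utf8_char_len and utf8_count_chars
are made up for illustration and are not part of MapServer. The length of
each character is read from the high bits of its first byte:

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
 * with lead byte c. Returns 0 if c is a continuation byte (10xxxxxx) or
 * is not a valid lead byte. */
int utf8_char_len(unsigned char c)
{
    if (c < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII   */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte lead   */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte lead   */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte lead   */
    return 0;
}

/* Hypothetical helper: count characters (not bytes) in a NUL-terminated
 * UTF-8 string. A stray or invalid byte is counted as one character so
 * the scan always moves forward. Continuation bytes are not validated. */
size_t utf8_count_chars(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        int len = utf8_char_len((unsigned char)*s);
        int i;

        if (len == 0)
            len = 1;

        /* Step over the sequence, but never past the terminator. */
        for (i = 0; i < len && s[i] != '\0'; i++)
            ;

        s += i;
        count++;
    }
    return count;
}
```

In a label loop, utf8_char_len on the first byte of each character would
give the amount to advance by, instead of always stepping one byte.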

-Steve