Anyone good with character encodings (bug 1921)?

Tue Jun 12 10:55:03 EDT 2007

Cool, thanks. A bit of reading last night led me to the same conclusion, though I hadn't found that
code sample (I did find others). GD doesn't support wide characters, so that must be happening deeper down, in FreeType.

Steve

>>> On 6/12/2007 at 12:34 AM, in message <466E305F.8050303 at swoodbridge.com>,
Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> wrote:
> Steve Lime wrote:
>> Hi all: I'm trying to address bug 1921 which deals with path
>> following labels and extended character sets. I'm using the encoding
>> code contributed by Orkney. It uses libiconv to convert from one
>> character set to Unicode/UTF-8. I've moved that conversion to the right
>> place so that it happens before label placement. So positioning
>> relative to a point works now.
>> 
>> That's not the real problem. Character sets for many languages result
>> in multibyte characters and the ANGLE FOLLOW code is assuming 1-byte
>> per character. Anyone know how to determine the number of bytes per
>> character in a string? Is it a constant value (looks like 3-bytes for
>> the Big5-HKSCS data sample I have now) for Unicode post iconv? I'm
>> flying blind here. I think I know where to fix code but just not sure
>> how...
>> 
>> Steve
> 
> Steve,
> 
> UTF-8 uses a variable number of bytes per character; it is not a fixed
> number. Here is a URL with some code that might help:
> 
> http://209.85.165.104/search?q=cache:F6MrrkWhxMIJ:publib.boulder.ibm.com/infocenter/iadthelp/v6r0/topic/com.ibm.etools.iseries.langref.doc/rzan5mst164.htm+utf8+multibyte+character+lengths&hl=en&ct=clnk&cd=10&gl=us
> http://czyborra.com/utf/ has a good explanation, enough to write a scan
> algorithm that counts the number of bytes for a given character.
> 
> 
> -Steve
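
For reference, the scan described above can be sketched in a few lines of C.
This is only an illustration, not the code from either link and not what
MapServer does: the function names utf8_seq_len and utf8_char_count are made
up for the example, and the scan assumes the input is already UTF-8 (for
instance after the iconv step) and does not validate it. The length of each
character is read from its lead byte. CJK characters in the Basic
Multilingual Plane encode to three bytes, which would be consistent with the
3-byte observation for the Big5-HKSCS sample, but ASCII takes one byte and
other characters take two or four.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with lead byte c.
 * Returns 0 if c is a continuation byte (10xxxxxx) or cannot start a
 * sequence. No validation beyond the lead-byte bit pattern is done. */
static int utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx: plain ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Count characters (code points) in a NUL-terminated UTF-8 string.
 * A byte that cannot start a sequence is counted as one character so
 * the scan always moves forward. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    while (*s) {
        int len = utf8_seq_len((unsigned char)*s);
        if (len == 0)
            len = 1;
        /* Stop at the terminating NUL if a sequence is truncated. */
        while (len-- > 0 && *s)
            s++;
        n++;
    }
    return n;
}
```

A caller that needs to place one glyph at a time along a path could use
utf8_seq_len to step through the label string character by character
instead of byte by byte.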