[GRASS5] Re: Freetype failure

Tue Mar 28 16:00:38 EST 2006

Glynn Clements writes: 

>> UTF-8 represents the entire range of UCS.  Existing Japanese, Korean, 
>> Chinese (etc.) character encodings are incorporated in UCS and are 
>> represented by UTF-8.  That does not mean that everyone's software is 
>> delivering UTF-8 encoding, but the time when that happens is probably not 
>> too far off. 
> 
> That's wishful thinking. Most of the CJK world is quite happy to stick
> with their existing encodings regardless of how much western
> programmers would like them all to switch to Unicode.

Perhaps it is wishful thinking, but according to the document at 
http://www.cl.cam.ac.uk/~mgk25/unicode.html, 
China, Korea and Japan already have national standards based on UCS.  
Microsoft uses Unicode, whose code point assignments match those of UCS. 

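Don't confuse encoding with Unicode or UCS.  Unicode and UCS standardize the 
integer values (code points) assigned to characters.  An encoding determines 
how those values are stored and processed as bytes.  For example, the code 
point U+65E5 is the three bytes E6 97 A5 in UTF-8 but the two bytes 65 E5 
in UCS-2BE. 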

> The main reason it is used in the FreeType code is that it's the
> simplest encoding to decode to an integer codepoint.

This is true for multibyte characters, but not for single-byte characters, 
which UTF-8 stores as their own code point value.  Besides, I'm offering the 
decoder, so it shouldn't make a lot of difference whether it is more complex 
or not. 
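
For illustration only, a UTF-8 decoder of this kind might look roughly like 
the sketch below.  The name decode_utf8 and its signature are hypothetical; 
this is not the decoder being offered and not existing GRASS code.  It 
assumes the four-byte limit of RFC 3629 (so nothing above U+10FFFF), uses 
plain unsigned long in place of FT_ULong to stay self-contained, and rejects 
overlong forms and surrogates. 

```c
#include <stddef.h>

/* Hypothetical helper (not GRASS or FreeType API): decode one UTF-8
 * sequence starting at s, reading at most len bytes.  On success the
 * code point is stored in *cp and the number of bytes consumed is
 * returned.  Returns 0 on malformed or truncated input. */
size_t decode_utf8(const unsigned char *s, size_t len, unsigned long *cp)
{
    /* Smallest code point that legitimately needs n bytes (index n). */
    static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned long c;
    size_t n, i;

    if (len == 0)
        return 0;

    c = s[0];
    if (c < 0x80) {                    /* single byte: value is the code point */
        *cp = c;
        return 1;
    } else if ((c & 0xE0) == 0xC0) {   /* 110xxxxx: two-byte sequence */
        n = 2;
        c &= 0x1F;
    } else if ((c & 0xF0) == 0xE0) {   /* 1110xxxx: three-byte sequence */
        n = 3;
        c &= 0x0F;
    } else if ((c & 0xF8) == 0xF0) {   /* 11110xxx: four-byte sequence */
        n = 4;
        c &= 0x07;
    } else {
        return 0;                      /* stray continuation or invalid lead byte */
    }

    if (len < n)
        return 0;                      /* truncated sequence */

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                  /* not a 10xxxxxx continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_cp[n])
        return 0;                      /* overlong encoding */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                      /* out of range or surrogate */

    *cp = c;
    return n;
}
```

A caller would walk the string by repeatedly calling the function and 
advancing by the returned byte count, stopping on a return of 0. 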

>> It makes more sense to translate anything that isn't already encoded in 
>> UTF-8 into UTF-8, then decode UTF-8 to FreeType.  That way UTF-8 systems 
>> would not have to go through an encode-decode cycle. 
> 
> That's easier to program (all conversions other than UTF-8 to
> UCS-2/UCS-4 become the responsibility of the user), but it's a lot
> less useful (because the user has to explicitly convert everything).

Sorry if I misled you.  My suggestion was that the code would retain 
convert_str, and convert_str would use iconv to convert all user-supplied 
encodings to UTF-8 instead of to UCS-2BE as it does now.  draw_text would 
then decode the UTF-8 to FT_ULong code points.  That places no 
responsibility on the user that isn't there now.  Anything coming in from a 
UTF-8 system could skip convert_str. 
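
As a minimal sketch of that conversion step, assuming POSIX iconv is 
available: the function name to_utf8, its parameters, and the exact charset 
string "UTF-8" used for the skip test are all hypothetical, and the real 
convert_str may be structured quite differently. 

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the conversion step (not the real
 * convert_str): convert in[0..in_len) from the charset named `from`
 * to UTF-8 in out.  Returns the number of bytes written, or
 * (size_t)-1 on failure.  Input declared as UTF-8 is copied through
 * unchanged, so it is NOT validated here. */
size_t to_utf8(const char *from, const char *in, size_t in_len,
               char *out, size_t out_size)
{
    iconv_t cd;
    char *src = (char *)in;   /* iconv takes char **, it does not modify the data */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_size;
    size_t rc;

    /* Already UTF-8: skip the conversion entirely. */
    if (strcmp(from, "UTF-8") == 0) {
        if (in_len > out_size)
            return (size_t)-1;
        memcpy(out, in, in_len);
        return in_len;
    }

    cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;    /* unknown or unsupported charset */

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;    /* bad input or output buffer too small */

    return out_size - dst_left;
}
```

Note that the pass-through branch performs no checking at all, so any 
validation of native UTF-8 input would have to happen in the decoder. 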

But now that you mention it, just using iconv to convert everything to 
UCS-4BE and casting that to FT_ULong might be a simpler solution yet.  That 
would leave iconv with the responsibility for checking the UTF-8 stream for 
malformed encodings.  I'm not sure how much of that checking iconv actually 
does. 
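
One caveat on the cast: UCS-4BE output is big-endian bytes, so reading it 
directly as an integer only gives the right value on a big-endian host where 
the integer type is four bytes wide.  A rough sketch that assembles each 
value from its bytes instead is below.  The name to_codepoints and its 
signature are hypothetical, plain unsigned long stands in for FT_ULong, and 
the charset name "UCS-4BE" is assumed to be accepted by the local iconv. 

```c
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not GRASS code): convert in[0..in_len) from the
 * charset named `from` into code points.  Returns the number of code
 * points stored in cps (at most max_cps), or (size_t)-1 on failure,
 * including input that iconv rejects. */
size_t to_codepoints(const char *from, const char *in, size_t in_len,
                     unsigned long *cps, size_t max_cps)
{
    size_t buf_size = 4 * max_cps;    /* UCS-4 is 4 bytes per code point */
    size_t src_left = in_len;
    size_t dst_left = buf_size;
    size_t rc, i, count;
    unsigned char *buf;
    char *src = (char *)in;
    char *dst;
    iconv_t cd;

    if (max_cps == 0)
        return (size_t)-1;

    buf = malloc(buf_size);
    if (buf == NULL)
        return (size_t)-1;

    cd = iconv_open("UCS-4BE", from);
    if (cd == (iconv_t)-1) {
        free(buf);
        return (size_t)-1;
    }

    dst = (char *)buf;
    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(buf);
        return (size_t)-1;
    }

    /* Build each value from its big-endian bytes, independent of the
     * host byte order and of sizeof(unsigned long). */
    count = (buf_size - dst_left) / 4;
    for (i = 0; i < count; i++) {
        const unsigned char *p = buf + 4 * i;
        cps[i] = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
                 ((unsigned long)p[2] << 8) | (unsigned long)p[3];
    }

    free(buf);
    return count;
}
```

How strictly malformed input is rejected still depends on the iconv 
implementation in use, so the failure return here is only as good as that 
checking. 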

Roger Miller