[fdo-internals] UTF 8

Trevor Wekel trevor_wekel at otxsystems.com
Sat Nov 6 21:26:42 EDT 2010


Hi Haris,

If you wanted to get really crazy, you could switch to ICU (International Components for Unicode).  It is a UTF-16 string library available on Windows and Linux.  This would be a major change, but it would provide much better internationalization support for text strings.  It includes localized date/time/currency formatting, code page conversion, and collation.

For example, I would expect that collation would allow FDO to sort strings according to the conventions and standards for a particular language, region, or country.

From what I recall, ICU also supports copy-on-write behaviour, an internal string reference, and stack allocation of the refcounted wrapper.  It may not be quite as fast as passing a straight wchar_t*, but it would simplify the allocation and deallocation mechanisms.

Other third-party C++ libraries such as Xerces, Xalan, and Boost also support ICU strings.

UTF-16 can be a better processing format than UTF-8.  For many (all?) European languages, each character fits into a single UTF-16 (2-byte) code unit.  Windows also uses UTF-16 as its native format.
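
To make the size argument concrete, here is a minimal sketch that counts how many bytes a piece of text needs in each encoding. The helper names (Utf16Units, Utf8Bytes, MeasureText, EncodedSize) are made up for this illustration and are not part of FDO or ICU.

```cpp
#include <cstddef>
#include <string>

// Number of 16-bit code units needed to encode one code point in UTF-16.
// Code points above U+FFFF need a surrogate pair (two units).
inline std::size_t Utf16Units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// Number of bytes needed to encode one code point in UTF-8.
inline std::size_t Utf8Bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical result type: total storage in bytes for each encoding.
struct EncodedSize {
    std::size_t utf16Bytes;
    std::size_t utf8Bytes;
};

// Sum the encoded size of every code point in the text.
inline EncodedSize MeasureText(const std::u32string& text) {
    EncodedSize size = {0, 0};
    for (char32_t cp : text) {
        size.utf16Bytes += 2 * Utf16Units(cp);
        size.utf8Bytes += Utf8Bytes(cp);
    }
    return size;
}
```

With this sketch, every character of a word such as "Zürich" takes exactly one UTF-16 code unit, so indexing by code unit matches indexing by character. UTF-8 is more compact for such text but has a variable width per character.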

http://www.mail-archive.com/unicode@unicode.org/msg00156.html

http://www.lingua-systems.com/knowledge/unicode-mappings/iso-8859-16-to-unicode.html


Regards,
Trevor



-----Original Message-----
From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Haris Kurtagic
Sent: November 6, 2010 7:43 AM
To: FDO Internals Mail List
Subject: [fdo-internals] UTF 8

Hi,

I would like to add UTF-8 support to FDO.
Before writing an RFC for it, I would like to discuss it with others.

The primary goal of the RFC would be to make it possible to get UTF-8
strings from FDO.
Right now there are data sources that store their strings as UTF-8
(SQLite, SDF, ...). Providers convert them to wide characters in the FDO
reader function GetString.
If an application wants to use UTF-8, it has to convert from
wide back to UTF-8. For some data sources that means converting twice:
UTF-8 -> wide -> UTF-8.

One solution would be to add "const char* GetStringUtf8" to the reader.
The function could have a default implementation in the default reader
(as is done for property index access) that converts the current wide
string to UTF-8.
So nothing would change for existing providers. Providers that have
UTF-8 as their native encoding would then implement this function to
gain speed.
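
A minimal sketch of how this could look, under stated assumptions: ReaderSketch, LegacyReaderSketch and Utf8NativeReaderSketch are hypothetical stand-ins rather than the real FDO reader classes, and the hand-written conversion helpers assume well-formed input. Only the name GetStringUtf8 and the idea of a default implementation come from the proposal above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to a UTF-8 byte string.
inline void AppendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Wide -> UTF-8. Handles 16-bit wchar_t (surrogate pairs) and 32-bit wchar_t.
inline std::string WideToUtf8(const wchar_t* w) {
    std::string out;
    for (; w && *w; ++w) {
        std::uint32_t cp = static_cast<std::uint32_t>(*w);
        if (cp >= 0xD800 && cp <= 0xDBFF && w[1] >= 0xDC00 && w[1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (static_cast<std::uint32_t>(w[1]) - 0xDC00);
            ++w;
        }
        AppendUtf8(out, cp);
    }
    return out;
}

// UTF-8 -> wide. Assumes well-formed input; no validation is attempted.
inline std::wstring Utf8ToWide(const std::string& s) {
    std::wstring out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i++]);
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        std::uint32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Hypothetical stand-in for a reader base class (not the real FDO interface).
class ReaderSketch {
public:
    virtual ~ReaderSketch() {}
    virtual const wchar_t* GetString(const wchar_t* propertyName) = 0;
    // Default implementation: convert the wide result of GetString to UTF-8.
    virtual const char* GetStringUtf8(const wchar_t* propertyName) {
        m_utf8Cache = WideToUtf8(GetString(propertyName));
        return m_utf8Cache.c_str();
    }
private:
    std::string m_utf8Cache;  // keeps the returned pointer valid until the next call
};

// An existing provider: implements only GetString and inherits the default.
class LegacyReaderSketch : public ReaderSketch {
public:
    explicit LegacyReaderSketch(const std::wstring& v) : m_value(v) {}
    const wchar_t* GetString(const wchar_t*) override { return m_value.c_str(); }
private:
    std::wstring m_value;
};

// A provider whose storage is UTF-8: overrides GetStringUtf8 to skip both conversions.
class Utf8NativeReaderSketch : public ReaderSketch {
public:
    explicit Utf8NativeReaderSketch(const std::string& utf8) : m_utf8(utf8) {}
    const wchar_t* GetString(const wchar_t*) override {
        m_wide = Utf8ToWide(m_utf8);
        return m_wide.c_str();
    }
    const char* GetStringUtf8(const wchar_t*) override { return m_utf8.c_str(); }
private:
    std::string m_utf8;
    std::wstring m_wide;
};
```

In this sketch the legacy reader needs no changes and still answers GetStringUtf8 through the default path, while the UTF-8-native reader returns its stored bytes directly. How long the returned pointer stays valid (here, until the next call on the same reader) is a design choice the RFC would have to settle.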

Thanks,
Haris
_______________________________________________
fdo-internals mailing list
fdo-internals at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/fdo-internals


