[fdo-internals] UTF 8

Haris Kurtagic haris at sl-king.com
Sun Nov 7 05:32:41 EST 2010


Hi Trevor,

The goal of my proposal is to avoid encoding/decoding when the data is already
stored as UTF-8, as is the case in a few data stores we use now.

There are types of applications where UTF-8 is the preferred choice over
UTF-16, and my suggestion is simply to support both.

Regarding the ICU library, I don't think FDO particularly needs it. FDO
providers don't do much with the data, they just pass it through to the app;
the less overhead an FDO provider adds, the better.
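
To make the GetStringUtf8 idea from my original message (quoted below) more
concrete, here is a rough sketch. It is not the real FDO reader interface:
SketchReader, WideOnlyReader, Utf8NativeReader and WideToUtf8 are hypothetical
names used only for illustration, and the lifetime of the returned pointer
(valid until the next call) is an assumption that the RFC would have to settle.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8. Handles both 16-bit
// wchar_t (UTF-16, Windows) and 32-bit wchar_t (UTF-32, typical on Linux).
std::string WideToUtf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        // With 16-bit wchar_t, combine a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical stand-in for a reader base class (NOT the actual FDO API).
class SketchReader {
public:
    virtual ~SketchReader() {}
    virtual const wchar_t* GetString(const wchar_t* propertyName) = 0;

    // Default implementation: convert the existing wide-char result, so
    // existing providers need no changes. The returned pointer is assumed
    // to stay valid only until the next call on this reader.
    virtual const char* GetStringUtf8(const wchar_t* propertyName) {
        m_utf8 = WideToUtf8(GetString(propertyName));
        return m_utf8.c_str();
    }

private:
    std::string m_utf8;  // buffer backing the default conversion
};

// An existing provider: implements only GetString, inherits the default.
class WideOnlyReader : public SketchReader {
public:
    const wchar_t* GetString(const wchar_t*) override { return L"caf\u00E9"; }
};

// A provider whose store is natively UTF-8: overrides GetStringUtf8 and
// hands back the stored bytes without any conversion.
class Utf8NativeReader : public SketchReader {
public:
    const wchar_t* GetString(const wchar_t*) override { return L"caf\u00E9"; }
    const char* GetStringUtf8(const wchar_t*) override { return m_stored.c_str(); }

private:
    std::string m_stored = "caf\xC3\xA9";  // bytes as they sit in the data store
};
```

Both readers return the same UTF-8 bytes from GetStringUtf8; the difference is
that the UTF-8-native one skips the wide-char round trip entirely.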

Haris

On Sun, Nov 7, 2010 at 2:26 AM, Trevor Wekel
<trevor_wekel at otxsystems.com> wrote:
> Hi Haris,
>
> If you wanted to get really crazy, you could switch to ICU (International Components for Unicode).  It is a UTF-16 string library available on Windows and Linux.  This would be a major change and would provide much better internationalization support for text strings.  It includes localized date/time/currency formatting, code page conversion, and collation.
>
> For example, I would expect that collation would allow FDO to sort strings according to the conventions and standards for a particular language, region, or country.
>
> From what I recall, ICU also supports copy-on-write behaviour, an internal string reference, and stack allocation for the refcounted wrapper.   It may not be quite as fast as passing a straight wchar_t*, but it would simplify allocation and deallocation.
>
> Other third-party C++ libraries like Xerces, Xalan and Boost also support ICU strings.
>
> UTF-16 can be a better processing format than UTF-8.  The characters of many (all?) European languages each fit into a single 2-byte UTF-16 code unit.  Windows also uses UTF-16 as its native format.
>
> http://www.mail-archive.com/unicode@unicode.org/msg00156.html
>
> http://www.lingua-systems.com/knowledge/unicode-mappings/iso-8859-16-to-unicode.html
>
>
> Regards,
> Trevor
>
>
>
> -----Original Message-----
> From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Haris Kurtagic
> Sent: November 6, 2010 7:43 AM
> To: FDO Internals Mail List
> Subject: [fdo-internals] UTF 8
>
> Hi,
>
> I would like to add support for UTF-8 to FDO.
> Before writing an RFC for it, I would like to discuss it with others.
>
> The primary goal of the RFC would be to be able to get UTF-8 strings from FDO.
> Right now there are data sources which store their strings as UTF-8
> (SQLite, SDF, ...). Providers convert them into wide char in the FDO
> reader's GetString function.
> If an application wants to use UTF-8, it needs to convert from
> wide char back to UTF-8. For some data sources this means converting twice:
> UTF-8 -> wide -> UTF-8.
>
> One solution would be to add "const char* GetStringUtf8" to the reader.
> The function could have a default implementation in the default reader (as is
> done for property index access) which would convert from the current wide
> char to UTF-8.
> So nothing would change for existing providers. Providers which have
> UTF-8 as their native format would then implement this function to gain speed.
>
> Thanks,
> Haris
> _______________________________________________
> fdo-internals mailing list
> fdo-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/fdo-internals

