[Gdal-dev] RFC 5: Unicode support in GDAL

Andrey Kiselev dron at ak4719.spb.edu
Mon Sep 25 12:25:42 EDT 2006


On Mon, Sep 25, 2006 at 10:27:53AM -0400, Frank Warmerdam wrote:
> 1) I don't think I like the constraint that CPLStrings must always be in
>    UTF-8.  Instead, I think I would prefer CPLString know its encoding,
>    and that the recoding operations change the encoding "in place" within
>    the CPLString.  The current recode() method still leaves us with the
>    "remember to free" issue I was hoping to avoid by moving some of this
>    work into the CPLString.

If the encoding is stored inside the CPLString object, you must always
remember which encoding is associated with each object. I thought that
the recode() method should return a pointer to an internally allocated
buffer, just as the c_str() method does, so there is no need to worry
about freeing it.
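
A minimal sketch of that ownership idea, not the real CPLString API.
The class name SketchString and the method recodeToLatin1() are
hypothetical, and the hand-written UTF-8 to ISO-8859-1 conversion only
stands in for whatever library (ICU, iconv) would do the real work:

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only; this is NOT the CPLString interface.
class SketchString {
public:
    explicit SketchString(const std::string &utf8) : m_utf8(utf8) {}

    // Pointer to the UTF-8 text, owned by the object.
    const char *c_str() const { return m_utf8.c_str(); }

    // Recode to ISO-8859-1 and return a pointer to an internal buffer.
    // The caller never frees it; it stays valid until the next
    // recodeToLatin1() call or until the object is destroyed.
    const char *recodeToLatin1() {
        m_recoded.clear();
        const std::size_t n = m_utf8.size();
        for (std::size_t i = 0; i < n; ++i) {
            const unsigned char c = static_cast<unsigned char>(m_utf8[i]);
            if (c < 0x80) {
                // Plain ASCII is identical in both encodings.
                m_recoded += static_cast<char>(c);
            } else if ((c == 0xC2 || c == 0xC3) && i + 1 < n &&
                       (static_cast<unsigned char>(m_utf8[i + 1]) & 0xC0) == 0x80) {
                // Two-byte sequences with lead byte C2 or C3 cover U+0080..U+00FF.
                const unsigned char c2 = static_cast<unsigned char>(m_utf8[++i]);
                m_recoded += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            } else {
                // Anything else has no Latin-1 equivalent: emit '?' and
                // skip the continuation bytes of the sequence.
                m_recoded += '?';
                while (i + 1 < n &&
                       (static_cast<unsigned char>(m_utf8[i + 1]) & 0xC0) == 0x80)
                    ++i;
            }
        }
        return m_recoded.c_str();
    }

private:
    std::string m_utf8;     // canonical UTF-8 text
    std::string m_recoded;  // buffer that backs the recoded pointer
};
```

The cost of this design is the same as with c_str(): the returned
pointer must not be kept after the object changes or goes away.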

> 2) How is the "int nEncoding" parameter enumerated?  Is there some
> well known universal list of encodings mapped to integers we would be
> using?

Well, I still do not know the best way to enumerate them, but one
of the existing lists can be chosen. This question is closely related to
the encoding software we will use. I prefer ICU: it is a big gun, but it
does a lot of the things we need (or will need in the future). Iconv is
also an option, but it has problems on platforms other than glibc. Are
there any objections to ICU? The string API should be finalized only
after the underlying software has been chosen.

> 3) I don't understand your comment about having to call setlocale(LC_ALL,"")
>    in order to use non-ASCII characters.  Why?  I do think the RFC will need
>    to address how it relates to the previous work to force things into the
>    C locale at strategic points.  Perhaps that does not relate since it
>    was really just about numeric locale processing?

The setlocale(LC_ALL,"") call is required to set the process's locale to
the user's current locale. Otherwise "C" will be used. The RFC talks
about conversions between local and Unicode character sets, which are
not possible without knowledge of the local system encoding; that is why
we need that setlocale call. Of course, later in the code
setlocale(LC_ALL,NULL) should be called to obtain the actual value (it
looks like ICU can do it itself).
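
A short sketch of the two calls, using only the standard <clocale>
interface. The helper names currentLocaleName() and adoptUserLocale()
are made up for illustration and are not part of GDAL:

```cpp
#include <clocale>
#include <string>

// Query the locale currently active for the process. Passing a null
// pointer as the second argument only reads the setting.
std::string currentLocaleName() {
    const char *name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Switch the process from the default "C" locale to the one named by
// the user's environment. Returns false if that locale is not
// available, in which case the previous locale stays in effect.
bool adoptUserLocale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

Before adoptUserLocale() is called, currentLocaleName() reports "C";
afterwards it reports whatever the environment selected, in a format
that depends on the platform.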

Anyway, any well-behaved program should call setlocale(LC_ALL,"") at
startup, otherwise its behaviour will be somewhat incorrect in localized
environments. Depending on the nature of the program, it can be
seriously incorrect.

Best regards,
Andrey

-- 
Andrey V. Kiselev
ICQ# 26871517


