[gdal-dev] Motion: To adopt RFC 23: Unicode Support in OGR

Andrey Kiselev dron at ak4719.spb.edu
Fri Apr 25 10:33:04 EDT 2008


On Fri, Apr 25, 2008 at 08:59:16AM -0400, Frank Warmerdam wrote:
> I'm -1 on suddenly having CPLString do automatic recoding from the
> current locale to UTF-8, or from UTF-8 back to the current locale
> on removal.  This adds a hard to predict amount of extra work and
> it is likely to break lots of pieces of existing code that use CPLString
> on text that is not necessarily in the current locale.
> 
> I deliberately avoided making the RFC 5 assumption that CPLString
> is UTF-8 internally at all times to avoid what could be a lot of
> risky and expensive transformations.
> 
> I also think it might be hard to get all CPLString methods to honour
> the same rules that you imply for the constructors.  For instance,
> you do not address the assignment operator but it would be perverse
> (to me) if CPLString X("abc") gave me a different string than assigning
> "abc" to an existing CPLString.

Frank,

There are no sudden recodings, and all inherited methods will work
without changes. Recoding happens only when you call the constructor
with an encoding specified. Everything that already worked will work in
exactly the same way: nothing has changed, and there will be no recoding
in old code. For code that requires recoding you just set the encoding
name in the constructor, and this is the only case where recoding
happens (apart from the GetAs() method, if you need the string in some
encoding other than UTF-8). And you should not use assignment when you
need recoding. But I do not think that is a big deal, because the use of
inherited operations is limited and you need to be careful with them
anyway. For example, the [] operator is of little use with multibyte
encodings, since it indexes bytes rather than characters.
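
A minimal sketch of the behaviour described above, not the actual
proposed code. The class name EncodedString and the two Latin-1 helpers
are hypothetical stand-ins, and only "ISO-8859-1" and "UTF-8" are
understood so that the example stays self-contained. It shows that the
one-argument constructor leaves bytes untouched, that the two-argument
constructor recodes to UTF-8, and that GetAs() recodes back.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: ISO-8859-1 bytes to UTF-8.
static std::string Latin1ToUtf8(const std::string &in)
{
    std::string out;
    for (unsigned char c : in)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else
        {
            // Code points 0x80..0xFF take two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: UTF-8 to ISO-8859-1. Characters that do not fit
// in Latin-1 become '?'.
static std::string Utf8ToLatin1(const std::string &in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size() &&
                 (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80)
        {
            const unsigned char next = static_cast<unsigned char>(in[i + 1]);
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
            ++i;
        }
        else if ((c & 0xC0) != 0x80)
        {
            out += '?';  // lead byte of a character outside Latin-1
        }
        // Remaining continuation bytes are skipped.
    }
    return out;
}

// Hypothetical stand-in for a string class with an encoding-aware
// constructor; it is not the real CPLString.
class EncodedString : public std::string
{
  public:
    // Old-style construction: bytes are stored as given, no recoding.
    EncodedString(const char *bytes) : std::string(bytes) {}

    // Encoding-aware construction: recode from `encoding` to UTF-8.
    EncodedString(const char *bytes, const char *encoding)
        : std::string(ToUtf8(bytes, encoding)) {}

    // Return a copy recoded from UTF-8 into `encoding`.
    std::string GetAs(const char *encoding) const
    {
        const std::string name(encoding);
        if (name == "UTF-8")
            return *this;
        if (name == "ISO-8859-1")
            return Utf8ToLatin1(*this);
        throw std::invalid_argument("unsupported encoding: " + name);
    }

  private:
    static std::string ToUtf8(const char *bytes, const char *encoding)
    {
        const std::string name(encoding);
        if (name == "UTF-8")
            return bytes;
        if (name == "ISO-8859-1")
            return Latin1ToUtf8(bytes);
        throw std::invalid_argument("unsupported encoding: " + name);
    }
};
```

With this sketch, EncodedString("caf\xE9") keeps its four bytes, while
EncodedString("caf\xE9", "ISO-8859-1") holds five bytes of UTF-8, and
indexing it with [3] yields the lead byte 0xC3 rather than a character.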

And do we want to move to a UTF-8 internal representation? If yes, then
we should not be afraid of transformations from/to UTF-8, because it
will come to that at some point in the future anyway; it is unavoidable.

Also my API does not add more transformations than your API.

Your API:

"string in native encoding"->CPLString()->recode(to UTF-8)->c_str()->OGR field
OGR field->CPLString()->recode(to native)->c_str()->"string in native encoding"

My API:

"string in native encoding"->CPLString(to UTF-8)->c_str()->OGR field
OGR field->CPLString(UTF-8)->GetAs(to native)->"string in native encoding"
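
The two chains above can be compared by counting conversions. The
following is only an illustrative sketch: StringA, StringB, Recode() and
the encoding name "NATIVE" are hypothetical, and the recoder is a stub
that counts calls instead of converting anything. Under those
assumptions both styles perform one conversion in each direction.

```cpp
#include <cassert>
#include <string>

// Number of conversions performed so far.
static int g_recodeCalls = 0;

// Hypothetical recoder stub: no real conversion, it only counts calls.
static std::string Recode(const std::string &s, const char * /*from*/,
                          const char * /*to*/)
{
    ++g_recodeCalls;
    return s;
}

// Style A: plain construction followed by an explicit recode step.
struct StringA : std::string
{
    StringA(const std::string &s) : std::string(s) {}

    StringA &recode(const char *from, const char *to)
    {
        assign(Recode(*this, from, to));
        return *this;
    }
};

// Style B: the encoding is named in the constructor, GetAs() converts back.
struct StringB : std::string
{
    StringB(const std::string &s, const char *from)
        : std::string(std::string(from) == "UTF-8"
                          ? s                          // already UTF-8
                          : Recode(s, from, "UTF-8"))
    {
    }

    std::string GetAs(const char *to) const
    {
        return Recode(*this, "UTF-8", to);
    }
};

// Native string -> field -> native string, style A.
static int CountRoundTripA(const std::string &native)
{
    g_recodeCalls = 0;
    StringA in(native);
    in.recode("NATIVE", "UTF-8");
    const std::string field = in.c_str();   // value stored in the field
    StringA out(field);
    out.recode("UTF-8", "NATIVE");
    return g_recodeCalls;
}

// Native string -> field -> native string, style B.
static int CountRoundTripB(const std::string &native)
{
    g_recodeCalls = 0;
    StringB in(native, "NATIVE");
    const std::string field = in.c_str();   // value stored in the field
    StringB out(field, "UTF-8");
    const std::string back = out.GetAs("NATIVE");
    (void)back;
    return g_recodeCalls;
}
```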

> Does my rationale make sense?  I'm particularly wanting to avoid any
> changes in the behavior of existing CPLString use without careful
> consideration.

Yes, I see your point. But my intention is to prepare the ground for
further internationalization of GDAL. And tracking encodings separately
from the strings is certainly not a step in that direction!

Best regards,
Andrey

-- 
Andrey V. Kiselev
ICQ# 26871517

