[gdal-dev] Motion: To adopt RFC 23: Unicode Support in OGR

Frank Warmerdam warmerdam at pobox.com
Fri Apr 25 12:38:24 EDT 2008


Andrey Kiselev wrote:

> There are no sudden recodings and all inherited methods will work
> without changes. Recoding happens when you are calling constructor with
> encoding specified.

Andrey,

My mistake.  I saw:

CPLString( const char *pszString, const char *pszEncoding = NULL );

and thought that when pszEncoding was NULL it was going to treat it
as being in the current locale, but on closer reading of your original
post I see that was only true when pszEncoding was "".
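To make the three cases concrete, here is a minimal sketch of how I read
that constructor. It is not the RFC implementation: SketchString,
SketchLocaleEncoding() and Latin1ToUTF8() are hypothetical names made up
for illustration, the locale lookup is hard-coded, and only
ISO-8859-1 to UTF-8 recoding is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a locale lookup. A real implementation would ask
// the platform; it is hard-coded here so the sketch is self-contained.
static const char *SketchLocaleEncoding() { return "ISO-8859-1"; }

// Minimal recoder for the sketch: ISO-8859-1 bytes to UTF-8.
static std::string Latin1ToUTF8(const char *pszIn)
{
    std::string osOut;
    for (const unsigned char *p =
             reinterpret_cast<const unsigned char *>(pszIn); *p != 0; ++p)
    {
        if (*p < 0x80)
        {
            osOut += static_cast<char>(*p);            // ASCII passes through
        }
        else
        {
            osOut += static_cast<char>(0xC0 | (*p >> 6));    // lead byte
            osOut += static_cast<char>(0x80 | (*p & 0x3F));  // continuation
        }
    }
    return osOut;
}

// Hypothetical class mirroring the constructor signature under discussion.
class SketchString : public std::string
{
  public:
    SketchString(const char *pszString, const char *pszEncoding = nullptr)
    {
        if (pszEncoding == nullptr)
        {
            // NULL: encoding unknown, bytes are stored untouched.
            assign(pszString);
            return;
        }
        // "": use the current locale encoding; otherwise the named encoding.
        const std::string osEnc =
            (*pszEncoding == '\0') ? SketchLocaleEncoding() : pszEncoding;
        if (osEnc == "ISO-8859-1")
            assign(Latin1ToUTF8(pszString));
        else
            assign(pszString);  // "UTF-8", or anything this sketch cannot recode
    }
};
```

With this sketch, SketchString("\xE9") keeps the single byte as-is, while
SketchString("\xE9", "") and SketchString("\xE9", "ISO-8859-1") both
hold the two-byte UTF-8 sequence. Only the calls that name an encoding
trigger any conversion.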

You also wrote "What I finally suggesting is to assume that the
internal encoding either UTF-8 when you are constructing string
specifying encoding or unknown (as it is now) and then you are
keeping encoding of the string somewhere else."  But I disagree
with assuming that values of CPLString are always UTF-8.  There
are lots of places where I construct CPLStrings from text without
having any idea what the encoding is.  As you note, normal assignment
with "=" or CPLString constructors with no encoding argument will
not result in any conversion to UTF-8. So how have we achieved
your goal of having CPLString always be UTF-8?

I think we have not; we would only be fooling ourselves by asserting
that CPLString contents are UTF-8.

> And do we want to move to UTF-8 internal representation? If yes then we
> should not be afraid of transformations from/to UTF-8, because it will
> be so at some point in the future anyway, it is unavoidable.

I do not mind doing recoding where needed, but one way or the other
the application code needs to be keeping track of where it is needed.

> Yes, I see your point. But my willing is to prepare ground for further
> internationalization of GDAL. And tracking encodings separately from the
> strings certainly is not a way in that direction!

As I mentioned before, I would not mind a CPLString subclass that is
intentionally always UTF-8, or perhaps that even carries its encoding
along.  I just don't think we can apply this logic to CPLString and
ensure that CPLString contents are always UTF-8 without careful review
of all existing code.
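As a rough sketch of the "carries its encoding along" idea, something
like the following would do. TaggedString and its methods are
hypothetical names for illustration only, and std::string stands in for
the base string class; nothing here is part of the RFC.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical subclass that stores an encoding name next to the bytes.
// The base class is left alone and makes no promise about its contents.
class TaggedString : public std::string
{
  public:
    // The caller must state the encoding; no recoding is done here.
    TaggedString(const std::string &osBytes, std::string osEncoding)
        : std::string(osBytes), m_osEncoding(std::move(osEncoding))
    {
    }

    const std::string &GetEncoding() const { return m_osEncoding; }

    // True only when the caller declared the contents to be UTF-8.
    bool IsUTF8() const { return m_osEncoding == "UTF-8"; }

  private:
    std::string m_osEncoding;
};
```

A TaggedString built as TaggedString("caf\xC3\xA9", "UTF-8") reports
IsUTF8() as true, and copying it into a plain std::string keeps the
bytes but drops the tag. The point of the design is that existing code
using the base class is unaffected, and the encoding claim is only made
where someone explicitly made it.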

If you feel strongly enough about this you can -1 the RFC, but I
am not willing to modify it to be based on the assumption that CPLString
is always in UTF-8.

I claim that the CPLString changes I propose do not in any way
prevent more sophisticated internationalization support in the future.
And they give us a portable mechanism to do recoding when needed.

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org


