[Gdal-dev] RFC DRAFT: Unicode support in GDAL

Andrey Kiselev andrey.kiselev at gmail.com
Wed Sep 27 04:57:06 EDT 2006


On 9/26/06, Marek Brudka <mbrudka at aster.pl> wrote:

> > On Mon, Sep 25, 2006 at 06:45:13PM -0400, Frank Warmerdam wrote:
> > I second this. To me, UTF-8 looks equivalent to
> > wide chars in terms of multilingual support.
> Sure, one may encode everything in UTF-8 as well as in wide chars. But
> the convention of using UTF-8 encoding in plain strings is only
> *implicit* and may be violated at runtime. Wide chars are
> *explicit* and enable partial compile-time validation of i18n handling,
> as well as a clean distinction between i18n-aware and i18n-unaware interfaces.
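
Marek's point about compile-time validation can be illustrated with plain C++ overloading. This is a minimal sketch, not GDAL code: `DescribeName` and `CountWideChars` are hypothetical functions. The compiler selects the overload from the argument's type, and a function that accepts only `const wchar_t *` rejects a narrow string at compile time, whereas nothing checks that a `const char *` really holds UTF-8.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical i18n-unaware entry point: the bytes are only *assumed*
// to be UTF-8; the type system cannot check that convention.
std::string DescribeName(const char *pszName)
{
    assert(pszName != nullptr);
    return "char*: " + std::to_string(std::strlen(pszName)) + " bytes";
}

// Hypothetical i18n-aware overload: the wide type states the contract
// explicitly, and overload resolution selects it from the argument type.
std::string DescribeName(const wchar_t *pwszName)
{
    assert(pwszName != nullptr);
    return "wchar_t*: " + std::to_string(std::wcslen(pwszName)) + " wide chars";
}

// Hypothetical wide-only function: CountWideChars("roads.shp") would be
// a compile-time error, because const char* does not convert to
// const wchar_t*.
std::size_t CountWideChars(const wchar_t *pwszName)
{
    assert(pwszName != nullptr);
    return std::wcslen(pwszName);
}
```

With these declarations, `DescribeName("roads.shp")` and `DescribeName(L"roads.shp")` call different functions, and the mistake of passing a narrow string to a wide-only interface is caught by the compiler rather than at runtime.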

OK, I see the point. I agree that this is the better solution for
programming interfaces.
But see below.


> Rewriting GDAL/OGR to use wide chars is pointless: too much effort for
> GDAL developers as well as GDAL users. It is better to provide some
> additional interfaces for wide chars, e.g.:
>
> class OGRSFDriverRegistrar
> {
> public:
>     static OGRDataSource *Open( const char *pszName, int bUpdate=FALSE,
>                                 OGRSFDriver ** ppoDriver = NULL );
>
>     OGRDataSource *OpenShared( const char *pszName, int bUpdate=FALSE,
>                                OGRSFDriver ** ppoDriver = NULL );
>
>     static OGRDataSource *Open( const wchar_t *pszName, int bUpdate=FALSE,
>                                 OGRSFDriver ** ppoDriver = NULL );
>
>     OGRDataSource *OpenShared( const wchar_t *pszName, int bUpdate=FALSE,
>                                OGRSFDriver ** ppoDriver = NULL );
> };
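
The two proposals can coexist: a wide overload like the ones quoted above could convert its argument to UTF-8 and forward to the existing `char *` entry point, so the drivers behind it stay unchanged. This is a minimal sketch under stated assumptions: `wchar_t` is taken to hold UTF-32 code points (as on Linux) or UTF-16 code units (as on Windows), and `WideToUTF8` and `OpenByName` are hypothetical names, not GDAL API. `OpenByName` only echoes the bytes it receives instead of opening anything.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a NUL-terminated wide string as UTF-8.
// UTF-16 surrogate pairs are combined; unpaired surrogates and values
// above U+10FFFF are replaced with U+FFFD.
std::string WideToUTF8(const wchar_t *pwszIn)
{
    assert(pwszIn != nullptr);
    std::string osOut;
    for (const wchar_t *p = pwszIn; *p != L'\0'; ++p)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(*p);
        if (cp >= 0xD800 && cp <= 0xDBFF && p[1] >= 0xDC00 && p[1] <= 0xDFFF)
        {
            // High surrogate followed by low surrogate: one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) +
                 (static_cast<std::uint32_t>(p[1]) - 0xDC00);
            ++p;
        }
        else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        {
            cp = 0xFFFD;
        }

        if (cp < 0x80)
        {
            osOut += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            osOut += static_cast<char>(0xC0 | (cp >> 6));
            osOut += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            osOut += static_cast<char>(0xE0 | (cp >> 12));
            osOut += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            osOut += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            osOut += static_cast<char>(0xF0 | (cp >> 18));
            osOut += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            osOut += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            osOut += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return osOut;
}

// Stand-in for the existing narrow entry point (hypothetical): it just
// returns the bytes it was given, where the real one would open a file.
std::string OpenByName(const char *pszName)
{
    assert(pszName != nullptr);
    return std::string(pszName);
}

// Hypothetical wide overload: convert once, then reuse the narrow path.
std::string OpenByName(const wchar_t *pwszName)
{
    return OpenByName(WideToUTF8(pwszName).c_str());
}
```

Forwarding like this keeps a single implementation per driver; the price is one conversion per call, and the wide overload is only as Unicode-aware as the narrow code behind it.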

Marek, you are still talking about file names, but they are just part
of the problem, and not the most significant part from my point of
view. I am thinking about i18n across the whole of GDAL. In particular,
I want multilingual support in every driver where it makes sense.
In fact we need not just these few functions, but many more. I
think almost every GDAL/OGR function that takes a char* argument
would need a wide-character counterpart. Adding a batch of extra
interfaces is more or less mechanical work, but what do we do with
the drivers? Converting to UTF-8 is relatively simple and
non-intrusive, and all functionality will be preserved even in
drivers whose code is left untouched. Introducing wchar_t* strings
will take much more effort, because _every_ driver must be converted
to support them.
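
Untouched code keeps working because UTF-8 is ASCII-compatible: every byte of a multi-byte sequence is 0x80 or above, so byte-oriented code that searches for ASCII delimiters such as '.' or '/' only ever matches real delimiters and never splits a character. A minimal sketch with a hypothetical, encoding-unaware helper `GetExtension` (not GDAL's own function):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical byte-oriented helper of the kind found in existing driver
// code: it knows nothing about encodings and just searches for '.' and '/'.
// Because UTF-8 continuation and lead bytes are all >= 0x80, a UTF-8 path
// cannot produce a false match on either delimiter.
std::string GetExtension(const char *pszPath)
{
    assert(pszPath != nullptr);
    const char *pszDot = std::strrchr(pszPath, '.');
    const char *pszSlash = std::strrchr(pszPath, '/');

    // No dot, or the last dot belongs to a directory name: no extension.
    if (pszDot == nullptr || (pszSlash != nullptr && pszDot < pszSlash))
        return std::string();
    return std::string(pszDot + 1);
}
```

The same function handles `"roads.shp"` and a UTF-8 path such as `"/data/łódź.shp"` identically, without any change to its code.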

I agree that introducing UTF-8 looks mostly like a hack rather than a
clean solution, but it should bring the new functionality to us
quickly. If we go the wchar_t* way, I am afraid the actual work
will never be started. That way is only acceptable if there are
volunteers willing to help with the wchar_t* transition; otherwise we
have no chance of completing it.

Best regards,
Andrey

-- 
Andrey V. Kiselev
ICQ# 26871517


