[Gdal-dev] RFC DRAFT: Unicode support in GDAL

Ben Discoe ben at vterrain.org
Thu Sep 28 04:16:52 EDT 2006


I've just returned to the list to check on whether my UTF-8 proposal was
being considered.  Looks like it has grown from a molehill into a mountain
:|  Very sorry about provoking that.  My original proposal is small,
modest and easily accomplished.

Let's split this very clearly into two separate subjects:

1. GDAL to support all filenames by declaring that paths passed in are
UTF-8.  This is simple and critical (and i'm volunteering to implement and
submit it).

2. GDAL to try to convert all sorts of charsets on content strings in data
files.  This is complex, messy and non-critical.
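
To make item 1 concrete, here is a rough sketch of the kind of wrapper i have
in mind.  It is only an illustration, not a patch: the names open_utf8 and
utf8_to_wide are hypothetical and do not exist in GDAL.  The assumption is
that the caller always passes a UTF-8 path; on Windows the path is converted
to UTF-16 and opened with the wide-character call, while elsewhere the bytes
are handed to fopen unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <wchar.h>
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated wide (UTF-16) string.  Returns NULL on failure; caller frees. */
static wchar_t *utf8_to_wide(const char *utf8)
{
    /* First call asks for the required length, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (n <= 0)
        return NULL;

    wchar_t *wide = (wchar_t *)malloc((size_t)n * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, n) <= 0) {
        free(wide);
        return NULL;
    }
    return wide;
}
#endif

/* Hypothetical wrapper (not a GDAL function): open a file whose path is
   declared to be UTF-8.  Returns NULL if the file cannot be opened. */
FILE *open_utf8(const char *utf8_path, const char *mode)
{
    assert(utf8_path != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(utf8_path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *fp = NULL;

    if (wpath != NULL && wmode != NULL)
        fp = _wfopen(wpath, wmode);

    free(wpath);
    free(wmode);
    return fp;
#else
    /* Paths are plain byte strings here, so UTF-8 passes straight through. */
    return fopen(utf8_path, mode);
#endif
}
```

A caller would then write something like open_utf8("\xE4\xB8\xAD\xE6\x96\x87.tif", "rb")
for a file named with two Chinese characters.  Nothing else in the calling
code has to change, which is why i consider this the small and modest part.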

> -----
> From: Andrey Kiselev
> Sent: Tuesday, September 26, 2006 10:57 PM
> 
> [..] talking about file names, but it is just 
> a part of the problem, and not the most significant part from 
> my point of view. I am thinking about whole GDAL i18n.

Andrey, i appreciate your enthusiasm for i18n :), but let me illustrate why
i believe filenames are the more significant problem.

A. I have a file on my disk with Chinese characters in its name.  Currently,
i *cannot open* it with GDAL.

B. I have a bunch of SHP files on my disk with various charsets in them
(mostly UTF-8).  Currently, i can read and process all of them just fine,
because i know their encoding.  The same holds around the world, e.g. Akio
knows when his SHP files have Shift-JIS in them.

All content strings are supported now, because GDAL is low-level and
charset-agnostic (as one could argue, it should be).

> In particular, I want to have multilingual support in each 
> driver where it makes sense.

OK if you want to propose that, but i would consider it an entirely
different RFC than international filename support.

> I am agree that introduction of UTF-8 looks mostly as a hack, 
> not a clean solution

I would say it's actually quite elegant and clean.

> but it should quickly bring the new 
> functionality to us. If we will go toward wchar_t* way I am 
> afraid that the actual work will never be started. That way 
> is only acceptable if there will be volunteers willing to 
> help in wchar_t* transition, otherwise we have no chances to 
> complete it.

Agreed.

-Ben



