[Gdal-dev] Wide-character filenames with GDAL file IO?

Ben Discoe ben at vterrain.org
Tue Sep 19 17:33:40 EDT 2006


> -----
> From: Andrey Kiselev
> Sent: Monday, September 11, 2006 1:06 AM
> 
> I think the simplest way to add multibyte support to GDAL is 
> using UTF-8 as an internal character set. All strings and 
> filenames should be passed in UTF-8 encoding and properly 
> converted to UTF-16 when needed (_wopen on Windows).

I agree this seems like a workable solution.
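
To make the conversion step concrete, here is a minimal sketch of decoding a
UTF-8 path into UTF-16 before handing it to _wopen. The names Utf8ToUtf16 and
OpenUtf8Path are hypothetical helpers invented for illustration, not GDAL
functions. The decoder does only basic validation (it does not reject overlong
encodings or encoded surrogates); on Windows, the Win32 call
MultiByteToWideChar with CP_UTF8 could be used for the conversion instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <io.h>  // _wopen
#endif

// Hypothetical helper (not part of GDAL): decode UTF-8 into UTF-16 code units.
// Rejects bad lead bytes, truncated sequences and bad continuation bytes.
std::u16string Utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) { cp = lead; extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

#ifdef _WIN32
// Hypothetical wrapper: open an existing file named by a UTF-8 path.
// wchar_t is 16 bits wide on Windows, so the code units copy over directly.
// (Flags that create a file would also need the pmode argument of _wopen.)
int OpenUtf8Path(const std::string& utf8Path, int flags) {
    const std::u16string units = Utf8ToUtf16(utf8Path);
    const std::wstring wide(units.begin(), units.end());
    return _wopen(wide.c_str(), flags);
}
#endif
```

On other platforms the UTF-8 bytes can usually be passed to open() unchanged,
so only the Windows file-open path needs the wide-character call.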

> The major drawback of this change is an additional
> requirement: we will need a Unicode library to replace all 
> string handling functions from the C lib with Unicode-aware ones.

Not necessarily.  This really only affects filepath/filename strings.  UTF-8
and ASCII are identical over the ASCII range, and the substrings which GDAL
could conceivably need to manipulate, such as file extensions ("*.tif"), are
ASCII.  So nothing breaks by declaring that all filenames passed to GDAL
must be UTF-8.
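
As an illustration of why plain byte-wise code keeps working, here is a small
sketch of an extension test on a UTF-8 path. HasExtensionUtf8 is a
hypothetical helper written for this example, not an existing GDAL function.
It relies on one property of UTF-8: every byte of a multibyte sequence is 0x80
or above, so such bytes can never be mistaken for an ASCII '.' or letter.

```cpp
#include <cstddef>
#include <string>

// Lower-case an ASCII letter; all other bytes are returned unchanged.
static unsigned char AsciiLower(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + 32) : c;
}

// Hypothetical helper: does the UTF-8 path end with the given ASCII
// extension (for example ".tif"), ignoring ASCII case?
bool HasExtensionUtf8(const std::string& utf8Path, const std::string& asciiExt) {
    if (asciiExt.size() > utf8Path.size()) return false;
    const std::size_t offset = utf8Path.size() - asciiExt.size();
    for (std::size_t i = 0; i < asciiExt.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(utf8Path[offset + i]);
        const unsigned char b = static_cast<unsigned char>(asciiExt[i]);
        // Bytes >= 0x80 belong to multibyte sequences and never match ASCII.
        if (a >= 0x80 || AsciiLower(a) != AsciiLower(b)) return false;
    }
    return true;
}
```

For example, HasExtensionUtf8("\xE5\x9C\xB0\xE5\x9B\xBE.tif", ".tif") is true
for a file whose base name is two Chinese characters, with no Unicode library
involved.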

I have found this to be true with the entire VTP software, which sits above
GDAL.  It now uses UTF-8 internally for all filesystem strings, and has not
encountered the need for any special Unicode library.

> This problem was raised once in the past; this is probably 
> the time when we should prepare an RFC for GDAL localization. 
> This is a major change that can break functionality, 
> so it is painful but inevitable.

I suspect it is not so major.  I'm willing to tackle the task, if Frank et al.
are interested.

-Ben



