[Gdal-dev] Wide-character filenames with GDAL file IO?

Andrey Kiselev dron at ak4719.spb.edu
Wed Sep 20 05:04:52 EDT 2006


On Tue, Sep 19, 2006 at 11:33:40AM -1000, Ben Discoe wrote:
> > The major drawback of this change is an additional requirement: we will
> > need a Unicode library to replace all string handling functions
> > from the C lib with Unicode-aware ones.
> 
> Not necessarily.  This really only affects filepath/filename strings.
> Since UTF-8 and ASCII are identical over the range of ASCII, and
> string subsets which GDAL could conceivably need to manipulate such as
> file extensions ("*.tif") are ASCII, nothing breaks by declaring that
> all filenames passed to GDAL must be UTF-8.
> 
> I have found this to be true with the entire VTP software, which sits
> above GDAL.  It now uses UTF-8 internally for all filesystem strings,
> and has not encountered the need for any special Unicode library.
> 
> > This problem was raised once before; perhaps now is the time to
> > prepare an RFC for GDAL localization.  This is a major change that
> > can break functionality, so it is painful, but inevitable.
> 
> I suspect it is not so major.  I'm willing to tackle the task, if Frank
> et al. are interested.
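
To illustrate Ben's point that byte-oriented C string functions keep working on UTF-8 filenames: every byte of a multi-byte UTF-8 sequence lies in the range 0x80-0xFF, so an ASCII '.', '/' or '\' byte can never occur inside a non-ASCII character, and plain strrchr() still finds the extension. This is a minimal sketch assuming filenames are UTF-8 encoded; utf8_path_extension() is a hypothetical helper, not a GDAL/CPL function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of GDAL/CPL).
 * Return a pointer to the extension of a UTF-8 filename (the text after
 * the last '.'), or "" if there is none. Works byte by byte: in UTF-8,
 * every byte of a multi-byte sequence is >= 0x80, so the ASCII bytes
 * '.', '/' and '\\' can only ever mean themselves. */
const char *utf8_path_extension(const char *path)
{
    assert(path != NULL);

    const char *dot = strrchr(path, '.');

    /* Find the last path separator, accepting both '/' and '\\'. */
    const char *slash = strrchr(path, '/');
    const char *bslash = strrchr(path, '\\');
    if (bslash != NULL && (slash == NULL || bslash > slash))
        slash = bslash;

    /* A dot that belongs to a directory component (e.g. "dir.d/file")
     * is not an extension. */
    if (dot == NULL || (slash != NULL && dot < slash))
        return "";
    return dot + 1;
}
```

For example, the Japanese name 地図.tif, written as the UTF-8 bytes "\xE5\x9C\xB0\xE5\x9B\xB3.tif", yields "tif" with no Unicode library involved. This only covers string manipulation; actually opening such a file still depends on the I/O layer handing the name to the operating system in a form it understands.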

Ben,

My concern is not only file names, but Unicode support for all
GDAL/OGR strings.  Personally I would like to see localized field names
in OGR and localized metadata in GDAL.  But the filename issue is a good
starting point.

I think it should be done in the following steps:

1. Pick a Unicode implementation.
2. Implement an appropriate CPL layer for Unicode strings.
3. Convert all related CPL functions to use this new CPL layer.  Note
that filenames are often embedded in more complex structures, so we should
make sure that all functions responsible for parsing or serialization
work correctly with UTF-8 strings.
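
As one possible building block for step 2, a CPL-level Unicode layer would likely need a way to check that a string is well-formed UTF-8 before it reaches parsing, serialization or file I/O code. The sketch below follows the UTF-8 definition in RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF); the name utf8_is_valid() is hypothetical and is not an existing CPL function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GDAL/CPL).
 * Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0.
 * Rejects overlong encodings, surrogates (U+D800..U+DFFF) and code
 * points above U+10FFFF, as RFC 3629 requires. */
int utf8_is_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        unsigned char c = *p;
        unsigned long cp;   /* decoded code point */
        int n;              /* number of continuation bytes */

        if (c < 0x80) {                 /* plain ASCII */
            p++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            /* Stray continuation byte, overlong lead (0xC0/0xC1),
             * or a lead byte beyond U+10FFFF (0xF5..0xFF). */
            return 0;
        }

        for (int i = 1; i <= n; i++) {
            /* Also stops at the NUL terminator, so no overread. */
            if ((p[i] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (p[i] & 0x3F);
        }

        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;                   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                   /* outside Unicode range */

        p += n + 1;
    }
    return 1;
}
```

A check like this lets the layer reject or recode names that arrive in a legacy 8-bit encoding (for example a Latin-1 "\xE9") instead of silently passing malformed bytes further down.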

-- 
Andrey V. Kiselev
ICQ# 26871517
