[gdal-dev] RFC 30: Unicode Filenames - call for discussion

Mark Overmeer mark at overmeer.net
Wed Sep 15 07:23:22 EDT 2010


* Ari Jolma (ari.jolma at gmail.com) [100915 10:49]:
> On 09/15/2010 06:22 AM, Frank Warmerdam wrote:
> >A client has asked me to support unicode filenames on windows.  To that
> >end I have constructed an RFC for migration to treating all filenames in
> >the GDAL API as utf-8.
> >
> >  http://trac.osgeo.org/gdal/wiki/rfc30_utf8_filenames
> >
> >I'd appreciate review and comment.  If all is well I hope to call
> >for a vote on this RFC late this week.
> 
> My observation is that I can open data sources with non-ascii
> filenames in Linux but not in Windows using the Perl bindings.
> There's a bug in the Perl bindings: they do not tell Perl that those
> same filenames, when read back from GDAL, are (I guess) utf-8.

In UNIX/Linux, the charset used for filenames can differ per filesystem.
This means, in practice, that the charset is undefined; sometimes you
can find the charset for a filesystem in /etc/fstab, sometimes only in the
filesystem documentation. There is no system call which can tell you that.
It would be a nice addition to statfs().
What if you move a file between filesystems with different encodings?
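
To make the problem concrete, here is a minimal sketch (in Python, only
because it is compact; the filename and the helper decode_name are made up
for illustration). It shows that the same on-disk bytes give different
names depending on which charset you assume:

```python
# Hypothetical filename, stored as a latin-1 filesystem would store it.
raw_name = "caf\u00e9.tif".encode("latin-1")


def decode_name(raw: bytes, charset: str) -> str:
    """Decode raw filename bytes; undecodable bytes become U+FFFD."""
    return raw.decode(charset, errors="replace")


# Correct guess: the name comes back intact.
as_latin1 = decode_name(raw_name, "latin-1")

# Wrong guess: the byte 0xE9 is not valid utf-8 here, so it is replaced.
as_utf8 = decode_name(raw_name, "utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```

Without knowing the charset of the filesystem, a program cannot tell which
of the two results is the right one.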

> Does Windows use utf-8 for filenames? If so, then fixing the back to
> utf-8 bug, would also work for windows, I guess.

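Windows uses UTF-16 for filenames: NTFS stores a name as a sequence of
16-bit units. Characters in the Basic Multilingual Plane take two bytes;
anything outside it needs a surrogate pair (four bytes).
See http://en.wikipedia.org/wiki/NTFS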
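
A small sketch of the byte sizes involved, in case it helps (Python again;
utf16_byte_length is a name I made up, and the two sample characters are
arbitrary). One caveat it shows: a character outside the Basic
Multilingual Plane is encoded as a surrogate pair, so it takes four bytes
rather than two:

```python
def utf16_byte_length(ch: str) -> int:
    """Number of bytes one character occupies in UTF-16 (little endian, no BOM)."""
    return len(ch.encode("utf-16-le"))


# U+00E9 (e with acute accent) lies in the Basic Multilingual Plane.
bmp_size = utf16_byte_length("\u00e9")

# U+10348 (a Gothic letter) lies outside it and needs a surrogate pair.
astral_size = utf16_byte_length("\U00010348")

print(bmp_size, astral_size)
```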

Perl treats filenames as a sequence of bytes, in which [/\:] have
a special meaning. You cannot safely convert filenames into utf8,
because they may already be in utf8 (or in something other than latin1).
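
The failure mode is double encoding. A minimal sketch, assuming a made-up
filename and written in Python rather than Perl for brevity: the name is
already utf8, but the code wrongly assumes latin1 and converts it again.

```python
# Hypothetical filename that is already stored as utf8 bytes.
utf8_bytes = "caf\u00e9.tif".encode("utf-8")

# Wrong assumption: treat the bytes as latin1 and "upgrade" them to utf8.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

# The result is still valid utf8, so nothing fails, but it names a
# different file: every non-ASCII character has turned into two.
print(utf8_bytes)
print(double_encoded)
print(double_encoded.decode("utf-8"))
```

Because latin1 assigns a character to every byte value, the wrong
conversion never raises an error, which is what makes it unsafe.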
-- 
CU,
               MarkOv

