[gdal-dev] UTF-8 String Support in GDALOpen() and OGRSFDriverRegistrar::Open()
Ivan
ivan.lucena at pmldnet.com
Fri Sep 4 18:33:59 EDT 2009
Frank Warmerdam wrote:
> Even Rouault wrote:
>> Louis, Chaintanya,
>>
>> I just wanted to mention that the encoding of filenames handled by GDAL
>> or OGR is a known issue that has not been addressed yet. You can read
>> http://trac.osgeo.org/gdal/wiki/rfc5_unicode which was a proposal but
>> has not been implemented. Some infrastructure for re-encoding was
>> introduced during the implementation of
>> http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode (but RFC23 only
>> addresses the issue of encoding in OGR field values, not in filenames).
>>
>> My understanding is that:
>> * on Windows, the current API used by GDAL/OGR does not expect UTF-8 or
>>   Unicode, but ANSI.
>> * on Linux systems, UTF-8 is now assumed.
>
> Folks,
>
> I wonder if we should implement some mechanism to support UTF-8 filenames
> on Windows (and generally) before the GDAL 1.7 release?
>
> How dangerous would it be for us to always assume filenames are UTF-8 and
> act accordingly?
>
> One theoretical downside to treating filenames as UTF-8 is that we do a lot
> of filename parsing that has no concept that some bytes in the name might
> be part of a multi-byte sequence. So if there was a UTF-8 multi-byte
> character that happened to include ASCII 92 '\' or ASCII 47 '/' it would
> confuse the path parsers. Also, for subdatasets, database connections and
> other esoteric datasource names we do a lot of parsing: splitting on
> spaces, commas, quotes and other special characters. Some of this could be
> confused by unfortunate UTF-8 characters. I suppose we really ought to
> be migrating to doing these manipulations on wchar_t's or perhaps UTF-32
> arrays.
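
For what it is worth, UTF-8 is designed so that every byte of a multi-byte
sequence has its high bit set (0x80-0xFF), so ASCII 92 and ASCII 47 cannot
occur inside one. The concern does hold for some legacy multi-byte code pages
such as Shift-JIS, where 0x5C can appear as a trail byte. A minimal sketch in
C of byte-wise separator scanning on a UTF-8 path; utf8_basename() is a
hypothetical helper for illustration, not a GDAL function:

```c
#include <stddef.h>

/* Hypothetical helper (not part of GDAL): return a pointer to the last
 * path component of a UTF-8 encoded path.
 *
 * The scan is purely byte-wise. This is safe for UTF-8 because lead and
 * continuation bytes of multi-byte sequences are always >= 0x80, so a
 * byte equal to '/' (0x2F) or '\\' (0x5C) is always a real separator. */
const char *utf8_basename(const char *path)
{
    const char *base = path;
    const char *p;

    for (p = path; *p != '\0'; ++p) {
        if (*p == '/' || *p == '\\')
            base = p + 1;   /* component starts after the separator */
    }
    return base;
}
```

Calling utf8_basename() on a path whose directory and file names contain
accented or CJK characters returns the final component unchanged, without
the function needing any knowledge of the encoding. The same reasoning
applies to splitting on spaces, commas and quotes, since those are ASCII too.
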
>
> Hmm, this is getting rather complicated to address fully.
>
> But at least as a hack we could provide a build or runtime mechanism to
> tell the cpl_vsil_win32.cpp code that the passed-in filename should be
> handled as UTF-8 instead of local code page characters on Windows. Would
> that be worth implementing?
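
A rough sketch of what such a mechanism could look like, assuming the Win32
MultiByteToWideChar() and _wfopen() calls. fopen_utf8() is a hypothetical
name used only for illustration; it is not what cpl_vsil_win32.cpp contains.
On other platforms the sketch simply passes the bytes through to fopen():

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of GDAL): open a file whose name is
 * given as a UTF-8 string. */
FILE *fopen_utf8(const char *utf8_name, const char *mode)
{
#ifdef _WIN32
    FILE *fp = NULL;
    wchar_t *wname = NULL;
    wchar_t *wmode = NULL;

    /* With an output size of 0 the call returns the required length in
     * wide characters, including the terminator (input length is -1). */
    int name_len = MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, NULL, 0);
    int mode_len = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (name_len <= 0 || mode_len <= 0)
        return NULL;

    wname = malloc((size_t)name_len * sizeof(wchar_t));
    wmode = malloc((size_t)mode_len * sizeof(wchar_t));

    if (wname != NULL && wmode != NULL &&
        MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, wname, name_len) > 0 &&
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, mode_len) > 0) {
        /* Wide-character open bypasses the local ANSI code page. */
        fp = _wfopen(wname, wmode);
    }

    free(wname);
    free(wmode);
    return fp;
#else
    /* Assumption: the platform file API accepts UTF-8 bytes as-is. */
    return fopen(utf8_name, mode);
#endif
}
```

Whether this would be enabled at build time or by a runtime switch is left
open here, as in the question above.
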
>
> Best regards,
Frank,
I would be in favor of a broader solution, even if it needs to wait until 1.8. IMHO.
Ivan