[gdal-dev] UTF-8 String Support in GDALOpen() and OGRSFDriverRegistrar::Open()

Ivan ivan.lucena at pmldnet.com
Fri Sep 4 18:33:59 EDT 2009


Frank Warmerdam wrote:
> Even Rouault wrote:
>> Louis, Chaintanya,
>>
>> I just wanted to mention that the topic of encoding for filenames
>> dealt with by GDAL or OGR is a known issue that has not been
>> addressed yet. You can read
>> http://trac.osgeo.org/gdal/wiki/rfc5_unicode which was a proposal
>> but has not been implemented. Some infrastructure for re-encoding
>> was introduced during the implementation of
>> http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode (but RFC23 only
>> addresses the issue of encoding in OGR field values, not for
>> filenames).
>>
>> My understanding is that:
>> * on Windows the current API used by GDAL/OGR does not expect UTF-8
>>   or Unicode but ANSI.
>> * on Linux systems, UTF-8 is now assumed
> 
> Folks,
> 
> I wonder if we should implement some mechanism to support UTF-8 filenames
> on windows (and generally) before GDAL 1.7 release?
> 
> How dangerous would it be for us to always assume filenames are UTF-8 and
> act accordingly?
> 
> One theoretical downside to treating filenames as UTF8 is that we do a lot
> of filename parsing that has no concept that some bytes in the name might
> be part of a multi-byte sequence.  So if there was a UTF8 multibyte
> character that happened to include ASCII 92 '\' or ASCII 47 '/' it would
> confuse the path parsers.  Also for subdatasets, database connections and
> other esoteric datasource names we do a lot of parsing - splitting on
> spaces, commas, quotes and other special characters.  Some of this could be
> confused by unfortunate UTF-8 characters.  I suppose we really ought to
> be migrating to doing these manipulations on wchar_t's or perhaps UCS-4
> arrays.
> 
> Hmm, this is getting rather complicated to address fully.
> 
> But at least as a hack we could provide a build or runtime mechanism to
> tell cpl_vsil_win32.cpp code that the passed in filename should be
> handled as UTF-8 instead of local code page characters on windows.  Would
> that be worth implementing?
> 
> Best regards,

Frank,

I would be in favor of a broader solution, even if it needs to wait until 1.8. IMHO.
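
On the path-parsing worry quoted above: in well-formed UTF-8 every byte of a multi-byte sequence has its high bit set (0x80 or above), so ASCII 92 '\' and ASCII 47 '/' cannot occur inside one. A byte-wise separator search should therefore stay correct on UTF-8 names. Below is a minimal sketch of that idea; last_separator() is a hypothetical helper written for this mail, not a GDAL function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GDAL): return the byte offset of the
 * last '/' or '\\' in path, or -1 if there is none. It scans bytes and
 * knows nothing about character encodings. */
static long last_separator(const char *path)
{
    long pos = -1;
    size_t i;

    assert(path != NULL);
    for (i = 0; path[i] != '\0'; i++) {
        /* In UTF-8, lead bytes are 0xC2..0xF4 and continuation bytes are
         * 0x80..0xBF, so neither 0x2F ('/') nor 0x5C ('\\') can be part
         * of a multi-byte character. */
        if (path[i] == '/' || path[i] == '\\')
            pos = (long)i;
    }
    return pos;
}
```

For the UTF-8 bytes of "dépôt/日本.tif" this finds the '/' at byte offset 7, which is the real separator. The same scan is not safe for some legacy double-byte code pages: in Shift-JIS, for instance, the two-byte character 0x83 0x5C ends in the backslash byte, and the scan reports a separator that is not there. So the parsing risk looks like an argument for UTF-8 rather than against it.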

Ivan
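
P.S. For reference, the narrower Windows-only hack could look roughly like the sketch below. It converts a UTF-8 name to UTF-16 with MultiByteToWideChar() and opens it with _wfopen(). The function name fopen_utf8() and the name_is_utf8 switch are assumptions made for illustration; this is not what cpl_vsil_win32.cpp does today, and the real code would have to fit the VSI layer rather than use FILE pointers.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>
#endif

/* Hypothetical helper (not a GDAL API). If name_is_utf8 is non-zero the
 * name is treated as UTF-8; otherwise it is passed through unchanged,
 * which on Windows means the local ANSI code page. */
static FILE *fopen_utf8(const char *name, const char *mode, int name_is_utf8)
{
    assert(name != NULL && mode != NULL);
#ifdef _WIN32
    if (name_is_utf8) {
        wchar_t wmode[16];
        wchar_t *wname;
        FILE *fp;
        /* First call: ask for the required length, including the NUL. */
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    name, -1, NULL, 0);
        if (n <= 0)
            return NULL;            /* invalid UTF-8 in the name */
        wname = (wchar_t *)malloc((size_t)n * sizeof(wchar_t));
        if (wname == NULL)
            return NULL;
        /* Second call: do the actual UTF-8 to UTF-16 conversion. */
        if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                name, -1, wname, n) == 0 ||
            MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                mode, -1, wmode, 16) == 0) {
            free(wname);
            return NULL;
        }
        fp = _wfopen(wname, wmode);
        free(wname);
        return fp;
    }
#else
    (void)name_is_utf8;             /* no conversion needed elsewhere */
#endif
    return fopen(name, mode);
}
```

The switch is a parameter here only to keep the sketch short; a build-time define or a runtime configuration option would serve the same purpose. On non-Windows systems the function simply falls through to fopen().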

