[gdal-dev] Usage of GDAL_FILENAME_IS_UTF8 config option

Ray Gardener rayg at daylongraphics.com
Fri Jan 13 15:08:54 PST 2017


Just to add a small note regarding OS X (and iOS) and UTF-8 filenames: 
The HFS+ filesystem stores accented characters in decomposed form, which 
can differ from the filename given to an API that creates the file such 
as fopen(..., "wb").

Applications that store a filename (e.g. in a preference or MRU file) 
might store it in precomposed form, which won't match the name the 
filesystem uses and risks a file-not-found error. For example, on iOS, 
text input gives strings in precomposed form.
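
As a rough illustration of the mismatch, here is a minimal Python sketch 
using only the standard unicodedata module. The helper same_filename() 
and the sample name are made up for this example, and standard NFC/NFD 
is only an approximation of what HFS+ does, since HFS+ uses its own 
variant of Unicode decomposition.

```python
import unicodedata


def same_filename(a: str, b: str) -> bool:
    """Compare two filenames after normalizing both to NFC.

    Hypothetical helper, not part of GDAL or any OS API.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible name in two Unicode forms.
precomposed = "caf\u00e9.tif"   # 'e with acute' as one code point (NFC)
decomposed = "cafe\u0301.tif"   # 'e' followed by a combining acute accent (NFD)

print(precomposed == decomposed)               # False: the code points differ
print(same_filename(precomposed, decomposed))  # True: equal once normalized
print(precomposed.encode("utf-8"))             # b'caf\xc3\xa9.tif'
print(decomposed.encode("utf-8"))              # b'cafe\xcc\x81.tif'
```

Normalizing both the stored name and the name read back from the 
filesystem to one form before comparing avoids the false mismatch.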

More info is available at
http://stackoverflow.com/questions/6153345/different-utf8-encoding-in-filenames-os-x

Ray



On 1/9/2017, Monday 2:03 AM, Damian Dixon wrote:
> Hi Victor,
>
> If you set GDAL_FILENAME_IS_UTF8 to YES then you need to pass in 
> filenames and paths encoded as UTF8.
>
> This means that on Windows you will need to do additional work to 
> convert from MBCS or UTF16/UCS2 to UTF8.
>
> If your application is built as MBCS then what you essentially have is 
> a multi-byte string encoding which is the Windows local code page.
>
> If your application is built for Unicode then you have UTF16/UCS2 so 
> you have to convert the filenames to UTF8 for GDAL/OGR to work.
>
> If you save the filenames as part of an application-specific 
> configuration then you need to consider how you will read that 
> data back in if the Windows code page changes. This is not an easy 
> task unless you also save the code page. It also becomes a bit 
> of a mess to support on non-Windows platforms.
>
> The approach we took was to convert our Windows applications to 
> Unicode and store/use all paths/filenames as UTF8 for portability to 
> Linux/Android/Solaris.
>
> Regards
> Damian
>
> PS. Microsoft has deprecated the MBCS build of MFC.
>
>
>
> On 9 January 2017 at 09:31, Poughon Victor <Victor.Poughon at cnes.fr> wrote:
>
>     Hi,
>
>     We are using GDAL in OTB and recently we had a bug report about
>     opening non-ASCII filenames on Windows 10 [0]. They suggest a fix
>     using:
>
>     > CPLSetConfigOption("GDAL_FILENAME_IS_UTF8","NO");
>
>     The test case is GDALOpen() on a file named 你好.tif, which I
>     confirmed works fine on Linux, but not on Windows 7 or 10.
>
>     So my question is to have some clarification on this option, to
>     know if it's potentially the correct fix for this problem. The doc
>     says:
>
>     > This effectively restores the pre-GDAL1.8 behavior for handling
>     > filenames on Windows and might be appropriate for applications
>     > that treat filenames as being in the local encoding.
>
>     What does it mean exactly to consider filenames to be in the local
>     encoding? And how do I know if my application [1] does that?
>
>     Cheers,
>
>     [0] https://github.com/orfeotoolbox/OTB/pull/14
>     [1] https://github.com/janestar/OTB/blob/f6ffdc17ab3d7aa91726f03ed619fee806eb508a/Modules/IO/IOGDAL/src/otbGDALDriverManagerWrapper.cxx#L55
>
>     Victor Poughon
>
>
>
>
>     _______________________________________________
>     gdal-dev mailing list
>     gdal-dev at lists.osgeo.org <mailto:gdal-dev at lists.osgeo.org>
>     http://lists.osgeo.org/mailman/listinfo/gdal-dev
>     <http://lists.osgeo.org/mailman/listinfo/gdal-dev>
>
>
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
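
The conversion step described in the quoted message (MBCS or UTF-16 to 
UTF-8 before handing the name to GDAL) can be sketched in Python with 
the standard codecs. This is only an illustration under assumptions: 
to_utf8() is a made-up helper, code page 936 is chosen because the test 
file in the thread has a Chinese name, and a C++ application on Windows 
would do the equivalent with something like WideCharToMultiByte() and 
CP_UTF8.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a filename from a known source encoding to UTF-8.

    Hypothetical helper for illustration; not a GDAL function.
    """
    return raw.decode(source_encoding).encode("utf-8")


name = "\u4f60\u597d.tif"  # the test filename from the thread

# The name as an MBCS build would hold it under code page 936 (assumed).
mbcs_bytes = name.encode("cp936")
# The name as a Unicode build would hold it (UTF-16, little-endian).
utf16_bytes = name.encode("utf-16-le")

# With the correct source encoding, both give the same UTF-8 bytes.
utf8_from_mbcs = to_utf8(mbcs_bytes, "cp936")
utf8_from_utf16 = to_utf8(utf16_bytes, "utf-16-le")
print(utf8_from_mbcs == utf8_from_utf16)

# Decoding with the wrong code page raises no error here, but the
# result is a different filename. This is why the code page has to be
# saved along with any stored MBCS path.
wrong = to_utf8(mbcs_bytes, "cp1252")
print(wrong == utf8_from_mbcs)
```

Storing paths as UTF-8 in the first place, as suggested in the quoted 
message, avoids having to remember the code page at all.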

