[gdal-dev] UTF-8 String Support in GDALOpen() and OGRSFDriverRegistrar::Open()

Even Rouault even.rouault at mines-paris.org
Tue Sep 1 14:50:22 EDT 2009


Quoting Lodewijk Pool <louis.pool at gmail.com>:

> Even,
>
> I think I've figured this out, in my particular example I had a filename
> which contained the character 'é' (U+00E9), which in UTF-8 encoding is the
> two byte sequence 0xC3 0xA9. However, this character is also in the ANSI
> character set (233 decimal), which explains why passing a "normal" ANSI
> encoded C String to GDALOpen will open the file. If we instead try a
> filename with a character that is not in the ANSI character set, for example
> 'ə' (U+0259), then the function will not work (even) with a normal C String.
>
> So the current Win32 C/C++ API does not support UTF-8 encoded strings. Are
> you aware of any workarounds that may be available?

No, I'm afraid there's no workaround (apart from renaming the file to ANSI)
until someone implements RFC5 or something equivalent.
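
The byte values Louis quotes above can be checked with a few lines of
portable C++. This is only an illustrative sketch: the names EncodeUtf8 and
SharedWithLatin1 are made up for this example and are not part of GDAL, and
"ANSI" is assumed to mean Windows-1252, the usual code page for a UK English
locale.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a GDAL function): encode one Unicode code point
// as a UTF-8 byte sequence.
std::string EncodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: true when the code point lies in U+00A0..U+00FF,
// the range where Windows-1252 (assumed here to be the "ANSI" code page)
// uses a single byte equal to the code point value.
bool SharedWithLatin1(std::uint32_t cp) {
    return cp >= 0xA0 && cp <= 0xFF;
}
```

With these, EncodeUtf8(0x00E9) gives the two bytes 0xC3 0xA9 and
SharedWithLatin1(0x00E9) is true (the single ANSI byte 233), whereas
EncodeUtf8(0x0259) gives 0xC9 0x99 and SharedWithLatin1(0x0259) is false.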

>
> Best Regards,
> Louis.
>
>
> On Mon, Aug 31, 2009 at 7:19 PM, Even Rouault
> <even.rouault at mines-paris.org> wrote:
>
> > Louis, Chaitanya,
> >
> > I just wanted to mention that the encoding of filenames handled by GDAL
> > or OGR is a known issue that has not been addressed yet. You can read
> > http://trac.osgeo.org/gdal/wiki/rfc5_unicode which was a proposal but has
> > not been implemented. Some infrastructure for re-encoding was introduced
> > during the implementation of
> > http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode (but RFC23 only
> > addresses the encoding of OGR field values, not of filenames).
> >
> > My understanding is that:
> > * on Windows the current API used by GDAL/OGR does not expect UTF-8 or
> >   Unicode but ANSI.
> > * on Linux systems, UTF-8 is now assumed
> >
> > Best regards,
> >
> > Even
> >
> > Quoting Lodewijk Pool <louis.pool at gmail.com>:
> >
> > > Hi Chaitanya,
> > >
> > > I appreciate you taking the time to check. The TAB extension is MapInfo's
> > > vector file format. The odd thing is that I did exactly the same test as
> > > you did: I renamed a GeoTiff file to the offending filename, tried the
> > > normal Raster Driver and got the same problem. Still, as far as you are
> > > aware, these functions should support UTF-8 encoded strings? There could
> > > possibly be a peculiarity in the way I pack UTF-8 strings, though I am
> > > reasonably certain that they are encoded correctly.
> > >
> > > Could you perhaps send me the code snippet you used to test the
> > > functionality (the part where you pass the string to GDALOpen)? Do you
> > > think there is a chance that my compiled version may differ from your
> > > own, i.e. is it possible that I compiled a version of GDAL without UTF
> > > support?
> > >
> > > Best Regards,
> > > Louis.
> > >
> > > On Mon, Aug 31, 2009 at 6:35 PM, Chaitanya kumar CH
> > > <chaitanya.ch at gmail.com> wrote:
> > >
> > > > Louis,
> > > >
> > > > I couldn't reproduce the problem on my WinXP-32 system with vc8 with
> > > > locale set to uk english. However, I used the filename on a GeoTiff
> > > > file. I couldn't identify the .TAB extension. I am not sure that is a
> > > > problem.
> > > >
> > > > Some of the drivers may not handle non-ascii data but file names should
> > > > not be a problem.
> > > >
> > > > If you don't find any problem at your application side, submit a bug
> > > > report at http://trac.osgeo.org/gdal/
> > > >
> > > >
> > > > On Mon, Aug 31, 2009 at 8:02 PM, Lodewijk Pool
> > > > <louis.pool at gmail.com> wrote:
> > > >
> > > >> Hi Chaitanya,
> > > >>
> > > > >> Yes, this is using the C/C++ API, the functions I am using are
> > > > >> declared in *gdal.h* and *ogrsf_frmts.h* respectively. I am using
> > > > >> WinXP 32bit (UK English locale) and a version of GDAL 1.6.2 that I
> > > > >> compiled for Win32 using the supplied nmake script files for VC8. The
> > > > >> specific filename that is causing me problems is this one:
> > > > >> *"découpage_geographique.TAB"*. If I remove the 'é' character in that
> > > > >> string and replace it with a normal 'e' the file opens without any
> > > > >> problems.
> > > >>
> > > >> Any help would be appreciated.
> > > >>
> > > >> Best Regards,
> > > >> Louis.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> On Mon, Aug 31, 2009 at 4:10 PM, Chaitanya kumar CH
> > > > >> <chaitanya.ch at gmail.com> wrote:
> > > >>
> > > >>> Louis,
> > > >>>
> > > > >>> GDAL/OGR usually supports utf-8 encoding. I just don't know where it
> > > > >>> isn't supported.
> > > > >>> Can you provide the details of the OS you are working on? Also, some
> > > > >>> sample file names that caused you problems would come in handy.
> > > > >>> I presume you are working in C/C++.
> > > >>>
> > > > >>> On Mon, Aug 31, 2009 at 6:37 PM, Lodewijk Pool
> > > > >>> <louis.pool at gmail.com> wrote:
> > > >>>
> > > >>>> Hi All,
> > > >>>>
> > > > >>>> I'm having problems opening Raster and Vector Datasources that have
> > > > >>>> filenames and paths with special characters. I'm using GDALOpen for
> > > > >>>> Raster sources and OGRSFDriverRegistrar::Open() for Vector sources;
> > > > >>>> the strings I pass for the filenames are UTF-8 encoded. Does anyone
> > > > >>>> know whether these functions support UTF-8 encoding, and if not,
> > > > >>>> whether there are any other API entry points that do support UTF-8
> > > > >>>> and/or UTF-16?
> > > >>>>
> > > >>>> Thank you in advance,
> > > >>>> Louis.
> > > >>>>
> > > >>>>  _______________________________________________
> > > >>>> gdal-dev mailing list
> > > >>>> gdal-dev at lists.osgeo.org
> > > >>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> Best regards,
> > > >>> --
> > > >>> Chaitanya kumar CH.
> > > >>>
> > > >>
> > > >>
> > > >
> > > >
> > > > Best regards,
> > > > --
> > > > Chaitanya kumar CH.
> > > >
> > >
> >
> >
> >
>
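
As a footnote to the thread: Louis observed that an ANSI-encoded string does
open the file as long as every character exists in the ANSI code page. A
caller holding UTF-8 could therefore try to re-encode the name before passing
it on, and give up when a character does not fit. The following is a rough
sketch of that idea, not something GDAL provides. Utf8ToAnsiSubset is a
hypothetical name, and the conversion is deliberately limited to ASCII plus
U+00A0..U+00FF, the range where Windows-1252 (assumed to be the ANSI code
page) and ISO-8859-1 agree. On Windows itself, the usual route would be
MultiByteToWideChar followed by WideCharToMultiByte with CP_ACP.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GDAL): re-encode a UTF-8 string into
// single bytes, accepting only ASCII and U+00A0..U+00FF. Returns false if
// the input is malformed or contains any other character; in that case the
// contents of 'out' are unspecified.
bool Utf8ToAnsiSubset(const std::string& utf8, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {
            out += static_cast<char>(b0);  // ASCII is copied unchanged
        } else if (b0 == 0xC2 || b0 == 0xC3) {
            // Lead bytes 0xC2/0xC3 start the sequences for U+0080..U+00FF.
            if (i + 1 >= utf8.size()) return false;
            const unsigned char b1 = static_cast<unsigned char>(utf8[++i]);
            if ((b1 & 0xC0) != 0x80) return false;  // not a continuation byte
            const unsigned char cp =
                static_cast<unsigned char>(((b0 & 0x03) << 6) | (b1 & 0x3F));
            if (cp < 0xA0) return false;  // U+0080..U+009F differ in cp1252
            out += static_cast<char>(cp);
        } else {
            return false;  // character outside the supported range
        }
    }
    return true;
}
```

For "découpage_geographique.TAB" the function succeeds and the 'é' becomes
the single byte 0xE9; for a name containing 'ə' (U+0259) it returns false,
which matches the behaviour described at the top of this message.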



