[gdal-dev] UTF-8 String Support in GDALOpen() and OGRSFDriverRegistrar::Open()

Lodewijk Pool louis.pool at gmail.com
Mon Aug 31 19:05:37 EDT 2009


Even,

I think I've figured this out. In my particular example the filename
contained the character 'é' (U+00E9), which UTF-8 encodes as the two-byte
sequence 0xC3 0xA9. The same character also exists in the ANSI character
set (as the single byte 233 decimal, 0xE9), which explains why passing a
"normal" ANSI-encoded C string to GDALOpen opens the file. If we instead try
a filename with a character that is not in the ANSI character set, for
example 'ə' (U+0259), then the function fails even with a normal C string,
because ANSI has no byte for that character.
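
To make the byte-level point concrete, here is a minimal C++ sketch. The
helper Utf8ToLatin1 is hypothetical (it is not part of GDAL), and it targets
Latin-1, which agrees with Windows code page 1252 only for U+00A0..U+00FF, so
treat it as an illustration under that assumption rather than a general
conversion. On Windows itself one would more likely go through
MultiByteToWideChar (CP_UTF8) and WideCharToMultiByte (CP_ACP).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a GDAL function: re-encode a UTF-8 string as
// Latin-1. Returns false if the input is malformed or contains a character
// above U+00FF, which Latin-1 cannot represent.
bool Utf8ToLatin1(const std::string& utf8, std::string* out) {
    out->clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {
            // Plain ASCII byte: identical in UTF-8 and Latin-1.
            out->push_back(static_cast<char>(b0));
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            // Two-byte sequence: 110xxxxx 10xxxxxx.
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            if ((b1 & 0xC0) != 0x80) return false;  // bad continuation byte
            const unsigned cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            if (cp < 0x80 || cp > 0xFF) return false;  // overlong, or not in Latin-1
            out->push_back(static_cast<char>(cp));
            ++i;  // skip the continuation byte we just consumed
        } else {
            // Three- and four-byte sequences encode U+0800 and above, which
            // never fit in one Latin-1 byte; anything else is malformed.
            return false;
        }
    }
    return true;
}
```

With this sketch, the UTF-8 bytes 0xC3 0xA9 ('é') convert to the single byte
0xE9, while 0xC9 0x99 ('ə', U+0259) makes the function return false, which
mirrors the behaviour described above.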

So it appears that the current C/C++ API on Win32 does not support UTF-8
encoded filenames. Are you aware of any workarounds?

Best Regards,
Louis.


On Mon, Aug 31, 2009 at 7:19 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:

> Louis, Chaitanya,
>
> I just wanted to mention that the encoding of filenames handled by GDAL or
> OGR is a known issue that has not been addressed yet. You can read
> http://trac.osgeo.org/gdal/wiki/rfc5_unicode which was a proposal but has
> not been implemented. Some infrastructure for re-encoding was introduced
> during the implementation of
> http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode (but RFC23 only addresses
> the encoding of OGR field values, not of filenames).
>
> My understanding is that:
> * on Windows, the current API used by GDAL/OGR expects ANSI, not UTF-8 or
>   Unicode
> * on Linux systems, UTF-8 is now assumed
>
> Best regards,
>
> Even
>
> Selon Lodewijk Pool <louis.pool at gmail.com>:
>
> > Hi Chaitanya,
> >
> > I appreciate you taking the time to check. The TAB extension is MapInfo's
> > vector file format. The odd thing is that I did exactly the same test as
> > you did: I renamed a GeoTiff file to the offending filename, tried the
> > normal raster driver, and got the same problem. Still, as far as you are
> > aware, these functions should support UTF-8 encoded strings? There could
> > possibly be a peculiarity in the way I pack UTF-8 strings, though I am
> > reasonably certain that they are encoded correctly.
> >
> > Could you perhaps send me the code snippet you used to test the
> > functionality (the part where you pass the string to GDALOpen)? Do you
> > think there is a chance that my compiled version differs from your own,
> > i.e. is it possible that I compiled a version of GDAL without UTF support?
> >
> > Best Regards,
> > Louis.
> >
> > On Mon, Aug 31, 2009 at 6:35 PM, Chaitanya kumar CH
> > <chaitanya.ch at gmail.com> wrote:
> >
> > > Louis,
> > >
> > > I couldn't reproduce the problem on my WinXP-32 system with vc8 with the
> > > locale set to UK English. However, I used the filename on a GeoTiff
> > > file; I couldn't identify the .TAB extension. I am not sure that is a
> > > problem.
> > >
> > > Some of the drivers may not handle non-ASCII data, but file names should
> > > not be a problem.
> > >
> > > If you don't find any problem on your application side, submit a bug
> > > report at http://trac.osgeo.org/gdal/
> > >
> > >
> > > On Mon, Aug 31, 2009 at 8:02 PM, Lodewijk Pool
> > > <louis.pool at gmail.com> wrote:
> > >
> > >> Hi Chaitanya,
> > >>
> > >> Yes, this is using the C/C++ API; the functions I am using are declared
> > >> in *gdal.h* and *ogrsf_frmts.h* respectively. I am using WinXP 32-bit
> > >> (UK English locale) and a version of GDAL 1.6.2 that I compiled for
> > >> Win32 using the supplied nmake script files for VC8. The specific
> > >> filename that is causing me problems is this one:
> > >> *"découpage_geographique.TAB"*. If I remove the 'é' character in that
> > >> string and replace it with a normal 'e', the file opens without any
> > >> problems.
> > >>
> > >> Any help would be appreciated.
> > >>
> > >> Best Regards,
> > >> Louis.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Aug 31, 2009 at 4:10 PM, Chaitanya kumar CH
> > >> <chaitanya.ch at gmail.com> wrote:
> > >>
> > >>> Louis,
> > >>>
> > >>> GDAL/OGR usually supports UTF-8 encoding; I just don't know where it
> > >>> doesn't.
> > >>> Can you provide the details of the OS you are working on? Also, some
> > >>> sample file names that caused you problems would come in handy.
> > >>> I presume you are working in C/C++.
> > >>>
> > >>> On Mon, Aug 31, 2009 at 6:37 PM, Lodewijk Pool
> > >>> <louis.pool at gmail.com> wrote:
> > >>>
> > >>>> Hi All,
> > >>>>
> > >>>> I'm having problems opening raster and vector data sources that have
> > >>>> filenames and paths with special characters. I'm using GDALOpen for
> > >>>> raster sources and OGRSFDriverRegistrar::Open() for vector sources;
> > >>>> the strings I pass for the filenames are UTF-8 encoded. Does anyone
> > >>>> know whether these functions support UTF-8 encoding, and if not,
> > >>>> whether there are any other API entry points that do support UTF-8
> > >>>> and/or UTF-16?
> > >>>>
> > >>>> Thank you in advance,
> > >>>> Louis.
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> Best regards,
> > >>> --
> > >>> Chaitanya kumar CH.
> > >>>
> > >>
> > >>
> > >
> > >
> > > Best regards,
> > > --
> > > Chaitanya kumar CH.
> > >
> >
>
>
>