[gdal-dev] Unicode support for GDAL/OGR

Even Rouault even.rouault at mines-paris.org
Wed Feb 11 16:43:02 EST 2009


Konstantin,

My understanding of the current state of GDAL 1.6.0 regarding those two RFCs 
is:

- RFC #5 has never been implemented, nor even put to a vote. The meaning of 
the terms 'development', 'adopted', etc. is given at the bottom of the RfcList 
page.

- RFC #23 has been implemented, at least partially. I can see that the 
CPLRecode API is implemented, and the OLCStringsAsUTF8 capability as well. The 
GML driver uses the CPLRecode API. The GML and PostgreSQL drivers both 
advertise OLCStringsAsUTF8.
However, the shapefile driver hasn't yet received any updates regarding 
encoding issues.
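
To make the recoding step concrete, here is a rough sketch in plain Python of 
the kind of conversion a recoding API such as CPLRecode performs. It is not 
GDAL code: the recode() helper and the sample place name are assumptions made 
up for illustration, and only the standard library is used.

```python
def recode(data: bytes, src_encoding: str, dst_encoding: str = "utf-8") -> bytes:
    """Re-encode a byte string from src_encoding to dst_encoding.

    Hypothetical helper: it only mimics the idea of a recoding call,
    it is not the GDAL CPLRecode function.
    """
    # Decode with the source encoding, then encode with the target one.
    return data.decode(src_encoding).encode(dst_encoding)


# "Saint-Étienne" as it could be stored in an ISO-8859-1 attribute table.
latin1_name = b"Saint-\xc9tienne"

# The same value once recoded to UTF-8.
utf8_name = recode(latin1_name, "iso-8859-1")
```

The point is only that the source encoding has to be known (or assumed) 
before an attribute value can be handed out as UTF-8.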

To answer your questions, RFC #23 only addresses OGR attribute values, not 
field names. Nor does it address the encoding of filenames (but RFC #5 
does). This is explained in the "Main concept" paragraph of RFC #23.

To sum up what RFC #23 says, I would say: "drivers should make all possible 
efforts to return string attributes as UTF-8. When they are sure they do so, 
they can advertise OLCStringsAsUTF8. All drivers can assume that the string 
attributes they receive will be UTF-8". So, the core has been implemented, 
but not all drivers are necessarily ready.
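
From the application side, that contract could be handled along these lines. 
This is a minimal sketch in plain Python, not GDAL code: attribute_as_text() 
is a hypothetical helper, the strings_as_utf8 flag stands for the result of 
testing the OLCStringsAsUTF8 capability on a layer, and the ISO-8859-1 
fallback is purely an assumption about legacy data.

```python
def attribute_as_text(raw: bytes, strings_as_utf8: bool,
                      fallback_encoding: str = "iso-8859-1") -> str:
    """Turn a raw string attribute into text.

    strings_as_utf8   -- whether the layer advertises OLCStringsAsUTF8
    fallback_encoding -- assumed encoding for drivers that do not
                         (an assumption, not something OGR tells you)
    """
    try:
        # Every driver is supposed to try to return UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if strings_as_utf8:
            # The driver promised UTF-8, so invalid bytes are a real error.
            raise
        # No promise was made: fall back to the assumed legacy encoding.
        return raw.decode(fallback_encoding)


# A driver that advertises the capability returns UTF-8 bytes.
from_gml = attribute_as_text(b"Saint-\xc3\x89tienne", strings_as_utf8=True)

# A driver that does not may still hand back ISO-8859-1 bytes.
from_shp = attribute_as_text(b"Saint-\xc9tienne", strings_as_utf8=False)
```

Both calls yield the same text; only the amount of guessing differs.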

Best regards,

Even

On Wednesday 11 February 2009 10:54:57, Konstantin Baumann wrote:
> Hi!
>
>
>
> There are two RFCs regarding the Unicode issues:
>
>                 RFC #5: http://trac.osgeo.org/gdal/wiki/rfc5_unicode
>
>                 RFC #23: http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode
>
>
>
> RFC #23 seems to be adopted for GDAL 1.6. But to me it is not that clear,
> which functions/methods do:
>
> 1.       always return/accept UTF-8 strings (e.g. OGRFeature::SetField(),
> OGRFeature::GetFieldAsString() (or is this also based on the capability
> flag), any other functions?, what about GetFieldName()?)
>
> 2.       can return/accept UTF-8 and also Ansi-encoded strings, based on
> the capabilities of the underlying driver
>
> 3.       always return/accept Ansi-encoded strings (e.g. GDALOpen()?)
>
>
>
> The state of RFC #5 is "in development". What does this mean? Has it been
> partially implemented, or is it currently being implemented in a
> separate branch, with the changes to be merged back into the main
> development branch after a successful implementation?
>
>
>
> Thanks for clarification,
>
>     Kosta



