[gdal-dev] encodings, locales ?

Frank Warmerdam warmerdam at pobox.com
Tue Aug 19 16:31:18 EDT 2008

Attila Csipa wrote:
> On Tuesday 19 August 2008 01.50:18 Mateusz Loskot wrote:
>> Some bits of the RFC-23 implementation have already been
>> submitted to the SVN trunk. Try these queries to find detailed changesets:
>> http://trac.osgeo.org/gdal/search?q=RFC23&noquickjump=1&changeset=on
>> http://trac.osgeo.org/gdal/search?q=RFC+23&noquickjump=1&changeset=on
> OK, so no driver specific changes (yet), but I don't expect those until RFC23 
> is actually finished.


It would appear that the above queries miss the changes.  The
GML, PG (Postgres), ODBC and PGEO drivers have been updated as I
recall.  I'm not sure if I touched one or two others.

>>> Also, is there plan for
>>> doing proper localization or is it just internationalization for now ?
>> Could you explain what you mean as "proper localization" and what is
>> missing in the RFC-23 that would make it "proper"?
> From what I see, RFC-23 is internationalization, which means it makes it 
> possible to use it with languages other than the original (e.g. English) as 
> it would get mostly language-agnostic with utf8. This is a good first step 
> (with the arguably acceptable loss of information in some of the more obscure 
> encodings), but it would be really good to have user-definable input/output 
> encoding, especially as not all formats define the encoding themselves (see 
> dbf, csv). The output encoding problem can especially escalate if your 
> original encoding was a non-latin one byte encoding. For example, if you have 
> a source WIN1251 Cyrillic encoding, the utf8 version would take roughly twice 
> the number of bytes, so your data would either get truncated or the field 
> length must be changed to accommodate.

I would be happy to have some drivers offer datasource or layer creation
options to specify the desired encoding in created files.  This would
be done on a driver-by-driver basis, based on demand and interest.

I'd also like to see some sort of mechanism added to ogr2ogr that would
let us apply a transcoding - essentially this means we are assuming the input
dataset is not in UTF-8 but that this was not identifiable in the driver so
it has to be "fixed up" via user direction.  But I don't have any timeline to
do this.
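
To make the size concern concrete, here is a minimal Python sketch of
transcoding a WIN1251 value to UTF-8 and checking it against a fixed field
width.  The sample value, the width of 8 bytes and the helper names are
illustrative assumptions; none of this is OGR or ogr2ogr code.

```python
# Hypothetical sample value and field width, for illustration only.
SAMPLE_TEXT = "Москва"
FIELD_WIDTH = 8  # assumed fixed byte width of a DBF-style text field


def transcode(raw, source_encoding):
    """Re-encode bytes from a user-specified source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def fits(raw, width):
    """Return True if the encoded value fits in a field of `width` bytes."""
    return len(raw) <= width


cp1251_bytes = SAMPLE_TEXT.encode("cp1251")   # one byte per Cyrillic letter
utf8_bytes = transcode(cp1251_bytes, "cp1251")  # two bytes per Cyrillic letter

print(len(cp1251_bytes), fits(cp1251_bytes, FIELD_WIDTH))  # 6 True
print(len(utf8_bytes), fits(utf8_bytes, FIELD_WIDTH))      # 12 False
```

The same six-letter value fits the field in WIN1251 but not in UTF-8, which
is why a transcoding step would have to either widen the field or truncate.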

> Localization is different as it also means functional difference in terms that 
> the app actually acts differently based on your locale/encoding. In the GIS 
> field this often relates to date/time format representation, but also the 
> textual representation of numbers (see thousand separator, comma, as csv/xml 
> imports can be especially pesky). 

To some degree OGR already honours normal locale mechanisms.  For instance,
I *think* ogrinfo will report numeric values with local conventions by
virtue of GetFieldAsString() using sprintf() or analogs without overriding
the inherited locale.  For the most part this is left up to applications.
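
As a rough illustration of what "inheriting the locale" means for number
formatting, the following Python sketch formats a value under a chosen
LC_NUMERIC setting.  It uses Python's locale module as a stand-in for
sprintf(); format_value is a hypothetical helper, and whether a locale such
as de_DE.UTF-8 is installed depends on the system.

```python
import locale


def format_value(value, locale_name=None):
    """Format a float as sprintf("%.3f") would under the given LC_NUMERIC.

    With locale_name=None the current (inherited) setting is used.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query current setting
    try:
        if locale_name is not None:
            locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.format_string("%.3f", value)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # always restore


print(format_value(1234.5, "C"))  # 1234.500

try:
    # Uses a decimal comma if this locale is installed on the system.
    print(format_value(1234.5, "de_DE.UTF-8"))
except locale.Error:
    print("de_DE.UTF-8 is not available on this system")
```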

> Arguably projections could/should be part
> of this, too, but I understand that it could cause as many problems (even if 
> for the right reasons) for many existing installs.

There has been a deliberate effort (though not necessarily a comprehensive
one) to support reading coordinate system definitions with numeric values
written in unusual locales (i.e. comma for decimal point).  However, I would
prefer not to write these values in WKT using the local locale settings if
I can avoid it, as I think this causes problems.
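
The "lenient on input, fixed on output" approach could be sketched in Python
as below.  Both helpers are hypothetical and are not how OGR implements it;
the sketch also assumes the numeric token has already been isolated, since a
comma is also the field separator in WKT.

```python
def parse_decimal(token):
    """Parse a number that may use ',' instead of '.' as decimal separator."""
    cleaned = token.strip()
    # Treat a lone comma as a decimal point; leave normal input untouched.
    if "," in cleaned and "." not in cleaned:
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)


def format_for_wkt(value):
    """Format with '.' as the decimal point regardless of the process locale.

    Python's % operator ignores LC_NUMERIC, so the output never changes
    with the user's locale settings.
    """
    return "%.15g" % value


print(format_for_wkt(parse_decimal("6378137,5")))  # 6378137.5
print(format_for_wkt(parse_decimal("6378137.5")))  # 6378137.5
```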

> In the broadest sense this would mean the interface of the application itself 
> (so, if you have a locale set, the application communicates error and help 
> messages in the language and format appropriate for the given locale), but 
> this is really hard to coordinate and I guess not really expected from a 
> specialist tool like GDAL. 

I'm not particularly interested in providing error text in different languages,
though if there was sufficient interest I'd be ok with having the English
hardcoded into the library but processed via a mechanism that would allow
translation into other languages using translation files.

But realistically, it would be hard to do, and many of the errors barely
make sense in context in English.  Translation is likely to degrade that
and be challenging to maintain.

Best regards,
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent
