[Gdal-dev] RFC DRAFT: Unicode support in GDAL

Mateusz Loskot mateusz at loskot.net
Thu Sep 21 14:46:11 EDT 2006


Andrey Kiselev wrote:
> On Thu, Sep 21, 2006 at 05:44:28PM +0200, Mateusz Loskot wrote:
>> Is my understanding correct that we won't reimplement GDAL drivers,
>> for example Shape to accept UTF-8?
>> So, strings will be converted to/from ASCII when reading/writing strings
>> into GDAL internal buffers, to UTF-8 ?
> 
> ...
> 
>>> For file format drivers the string representation should be worked out on
>>> a per-driver basis. If a driver needs to parse ASCII text there is no need to
>>> convert strings to UTF-8 until they are passed to GDAL functions.
>> I see, now my questions from above have been answered.
>> Though, I still think drivers should also support Unicode, at least
>> OGR drivers, to be able to deal with i18n'ized
>> strings in feature attributes.
> 
> You are absolutely right. Moreover, it is my primary goal to add Unicode
> support to PG driver (I want localized table column names).

Andrey,

Now it's clear to me. Thanks!
Yes, I also think it would be a good idea to support localized
attributes stored in Shapefile and MapInfo files
and, quite obviously, in GML, KML and CSV.
There are also ESRI Personal Geodatabase (here I'm not sure about UTF-8),
Oracle and MySQL databases, which can also store Unicode strings, so
we could use UTF-8 encoding on the client side.

I have used ISO-8859-2 (an 8-bit superset of ASCII) in Shapefiles to be able
to store Polish strings, but I believe it should also be
possible to store UTF-8. What do you think? Frank?
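
To make the Shapefile case concrete, here is a minimal sketch in Python
(not GDAL code) of re-encoding an attribute value from ISO-8859-2 to UTF-8.
The helper name latin2_to_utf8 and the sample value are hypothetical and
only serve as an illustration.

```python
def latin2_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-2 bytes as UTF-8 (hypothetical helper)."""
    return raw.decode("iso-8859-2").encode("utf-8")


# Hypothetical attribute value: a Polish place name with diacritics,
# written with escapes so the sketch does not depend on the file encoding.
name = "\u0141\u00f3d\u017a"

# The value as it would be stored in an ISO-8859-2 encoded file.
name_latin2 = name.encode("iso-8859-2")

# The same value re-encoded for a UTF-8 based API.
name_utf8 = latin2_to_utf8(name_latin2)

# Each accented letter takes one byte in ISO-8859-2 and two in UTF-8.
print(len(name_latin2), len(name_utf8))
```

Note that the UTF-8 form needs more bytes than the ISO-8859-2 form, which
would matter for fixed-width attribute fields such as those in DBF.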

> But not all file formats support non-ASCII characters.

Yes, I suppose so.
Maybe we could try to compile a list of formats that are
Unicode-friendly, and under what conditions.
What do you think?

> For example, various .HDR
> labeled rasters are just 7-bit ASCII text files and it is not a good
> idea to write 8-bit strings in such files.

Absolutely right.

> When you need to pass strings extracted from such a file outside the driver
> (e.g., in a SetMetadata() call), you should convert them to UTF-8. If you just want
> to use extracted strings internally in the driver, there is no need for any
> conversion.

Yes, that's how I imagine it should work.
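
A rough sketch of that boundary rule, in Python for brevity rather than
against the real GDAL API. The class HdrDriverSketch, its methods and the
header contents are all hypothetical; the only point is that parsing works
on the file's native bytes and conversion to UTF-8 happens once, where
strings leave the driver.

```python
class HdrDriverSketch:
    """Hypothetical driver for a text header; not a real GDAL class."""

    def __init__(self, header: bytes, encoding: str = "ascii"):
        self._header = header      # kept in the file's native encoding
        self._encoding = encoding  # assumed to be known per driver

    def _parse(self) -> dict:
        # Internal use only: work on raw bytes, no conversion needed.
        fields = {}
        for line in self._header.splitlines():
            key, sep, value = line.partition(b"=")
            if sep:
                fields[key.strip()] = value.strip()
        return fields

    def metadata_utf8(self) -> dict:
        # Boundary: strings leave the driver, so convert them to UTF-8 here.
        def conv(raw: bytes) -> bytes:
            return raw.decode(self._encoding).encode("utf-8")

        return {conv(k): conv(v) for k, v in self._parse().items()}


# A 7-bit ASCII header is already valid UTF-8, so the bytes pass unchanged.
ascii_driver = HdrDriverSketch(b"NROWS = 512\nDESCRIPTION = test\n")
ascii_meta = ascii_driver.metadata_utf8()

# A header in an 8-bit encoding really does need the conversion step.
latin2_header = "NAME = \u0141\u00f3d\u017a\n".encode("iso-8859-2")
latin2_meta = HdrDriverSketch(latin2_header, "iso-8859-2").metadata_utf8()
```

With pure ASCII input the conversion is a no-op, so such drivers would pay
nothing extra, while drivers reading 8-bit encodings convert in one place.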

Cheers
-- 
Mateusz Loskot
http://mateusz.loskot.net


