[fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Xumeng Chen Xumeng.Chen at autodesk.com
Tue May 5 20:34:20 EDT 2009

Yes. When I create a new SHP dataset through FDO in an EN locale, a default .CPG file is generated containing codepage 1252. For a dataset with Chinese characters in its attribute names, I need the UTF-8 codepage (65001) rather than the system locale's codepage (1252), which will likely convert the characters incorrectly. But how can I use the UTF-8 codepage in the SHP provider? Do I have to change the source code? It seems that the SHP provider has a globalization (Unicode) defect: it cannot write multibyte characters into the DBF file when the system locale's codepage does not cover that language.

Currently, when a user creates a new SHP dataset, the provider uses the codepage of the system locale (obtained through setlocale(LC_ALL, "")). Most of the time this is correct, but in a special case such as writing Chinese characters in an English locale, the user gets only "????"...
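To illustrate the "????" result, here is a minimal Python sketch. It is not the provider's code: Python's codecs stand in for the wide-to-multibyte conversion, and the sample name and the helper to_dbf_bytes are hypothetical. The replace error handler is an assumption that mimics a conversion which substitutes '?' for characters the codepage cannot represent.

```python
# Illustration only: Python codecs stand in for the provider's
# wide-string to multibyte conversion.
name = "\u540d\u79f0"  # a two-character Chinese attribute name (hypothetical)


def to_dbf_bytes(text: str, codepage: str) -> bytes:
    """Encode text with the given codepage, replacing unmappable characters with '?'."""
    return text.encode(codepage, errors="replace")


as_cp1252 = to_dbf_bytes(name, "cp1252")  # codepage of an EN locale
as_cp936 = to_dbf_bytes(name, "cp936")    # Simplified Chinese codepage
as_utf8 = to_dbf_bytes(name, "utf-8")     # codepage 65001

print(as_cp1252)                          # every character was replaced
print(as_cp936.decode("cp936") == name)   # round-trips intact
print(as_utf8.decode("utf-8") == name)    # round-trips intact
```

Codepage 1252 has no mapping for the Chinese characters, so the information is lost at write time and cannot be recovered when the DBF is read back.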


From: Dan Stoica
Sent: Tuesday, May 05, 2009 10:16 PM
To: FDO Internals Mail List
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Does your SHP dataset include a .cpg file? If not, you should create one. If it does, it should contain the Chinese codepage; otherwise the multibyte conversion will default to the machine's locale.
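A minimal Python sketch of creating such a .cpg file next to an existing .shp follows. The helper write_cpg and the file name in the usage note are hypothetical, and the exact token the SHP provider expects inside the file (for example "936" for Simplified Chinese) is an assumption, not something stated in this thread.

```python
from pathlib import Path


def write_cpg(shp_path: str, codepage: str) -> Path:
    """Write a .cpg sidecar file next to the given .shp path and return its path."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    # Assumption: the file holds only the codepage identifier as plain text.
    cpg_path.write_text(codepage, encoding="ascii")
    return cpg_path
```

For example, write_cpg("parcels.shp", "936") would create parcels.cpg in the same directory.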

As for the 11-character limit, there is nothing you can do, since it comes from the DBF specification.

From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Xumeng Chen
Sent: Tuesday, May 05, 2009 5:03 AM
To: fdo-internals at lists.osgeo.org
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Hi All,
Recently I have been developing against the SHP provider to export SHP files. While testing my code, I found that feature attributes containing Chinese characters sometimes cannot be written to the file correctly. My system locale is English (US), and the export failed because the provider could not store the correct attribute names in the DBF file.

After looking into the SHP provider code, I found that when the provider writes attributes to the DBF file, wide strings are converted to multibyte strings using the codepage of the current locale. On my machine, for example, the Chinese strings were converted with codepage 1252, which is not correct.

After finding this, I modified the source code to hardcode the codepage to UTF-8. The characters are then written and recognized correctly, and everything seems fine, but there is a limitation. The maximum length of an attribute name in DBF is 11, so if we convert all strings to UTF-8, only 5 letters are supported for users in FR/DE locales...
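The trade-off can be checked with a small Python sketch. The helper chars_that_fit is hypothetical, the limit of 11 is taken from this message and passed as a parameter, and the sample strings are made up. It counts how many leading characters of a name fit in the byte budget under a given codepage.

```python
def chars_that_fit(text: str, codepage: str, max_bytes: int = 11) -> int:
    """Count how many leading characters of text fit in max_bytes once encoded."""
    used = 0
    for count, ch in enumerate(text):
        used += len(ch.encode(codepage))
        if used > max_bytes:
            return count
    return len(text)


accented = "\u00e9" * 11  # eleven accented letters (e with acute)
chinese = "\u540d" * 11   # eleven Chinese characters

print(chars_that_fit(accented, "cp1252"))  # one byte each in codepage 1252
print(chars_that_fit(accented, "utf-8"))   # two bytes each in UTF-8
print(chars_that_fit(chinese, "cp936"))    # two bytes each in codepage 936
print(chars_that_fit(chinese, "utf-8"))    # three bytes each in UTF-8
```

Under these assumptions an all-accented name drops from 11 characters to 5 when UTF-8 is forced, and a Chinese name drops from 5 characters in codepage 936 to 3 in UTF-8.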

Does anyone have a suggestion for my situation, such as a workaround or feedback on my fix?
Any suggestions are highly appreciated.
