[fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Xumeng Chen Xumeng.Chen at autodesk.com
Wed May 6 05:51:28 EDT 2009


Adding an extra codepage property to the connection string is a good idea.

Thanks, 
Jimmy 

-----Original Message-----
From: Traian Stanev 
Sent: Wednesday, May 06, 2009 11:35 AM
To: Xumeng Chen
Cc: Kenny Jian; John Jiang; Hunter Chen; fdo-internals at lists.osgeo.org
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.


Hmmm, I guess you are right. So in your specific case you are looking to create an SHP file using FDO with a user-overridden value for the codepage. I don't know the details of how the SHP provider implements codepage handling (someone else will probably speak to that), but one possible solution would be to allow an optional connection property that specifies the override codepage to use when creating new SHP files. I don't know if that's the best way though.
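
A minimal Python sketch of how such an optional override might be resolved. This is only an illustration, not the SHP provider's code: the property name "CodePage", the helper names, and the "cp1252" fallback are all assumptions made for the example.

```python
import codecs


def parse_connection_string(conn: str) -> dict:
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    props = {}
    for part in conn.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def resolve_codec(props: dict, system_codepage: str = "cp1252") -> str:
    """Return the codec for new files, honoring an optional override."""
    override = props.get("CodePage")  # hypothetical property name
    if not override:
        # No override given: keep the locale-derived codepage.
        return system_codepage
    # Windows codepage 65001 is UTF-8; other numbers map to "cpNNN" codecs.
    name = "utf-8" if override == "65001" else "cp" + override
    return codecs.lookup(name).name  # raises LookupError if unknown


props = parse_connection_string("DefaultFileLocation=C:/data;CodePage=65001")
print(resolve_codec(props))  # utf-8
```

Because the property is optional, existing connection strings that omit it keep the current locale-based behavior.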

Traian

________________________________________
From: Xumeng Chen
Sent: Tuesday, May 05, 2009 10:09 PM
To: Traian Stanev
Cc: Kenny Jian; John Jiang; Hunter Chen; fdo-internals at lists.osgeo.org
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

I found a relevant point on the ESRI support site.

http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106

"Shapefile can now be stored in UTF-8. However, Shapefile encoded in UTF-8 is only recognized in ArcGIS Desktop."

So can we do this?

Jimmy

-----Original Message-----
From: Traian Stanev
Sent: Wednesday, May 06, 2009 9:56 AM
To: Dan Stoica
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.


Hey guys,

The SHP format is not really supposed to be Unicode-capable (it's from the 1980s). The CPG file is essentially a hack that ESRI added after the fact for ArcPad. Even if you change the source code to be able to write strings in UTF-8 format (effectively breaking the SHP standard), the resulting SHP file will not work in ESRI applications...

Traian

________________________________________
From: fdo-internals-bounces at lists.osgeo.org [fdo-internals-bounces at lists.osgeo.org] On Behalf Of Xumeng Chen
Sent: Tuesday, May 05, 2009 8:34 PM
To: fdo-internals at lists.osgeo.org; Dan Stoica
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Yes. When I create a new SHP dataset through FDO in an EN locale, a default .CPG file is generated containing codepage 1252. When I generate a dataset whose attribute names contain Chinese characters, I have to use the UTF-8 codepage (65001) rather than the system locale's codepage (1252), which may convert the strings incorrectly. But how can I use the UTF-8 codepage in the SHP provider? By changing the source code? The SHP provider seems to have a globalization (Unicode) defect: it cannot write multibyte characters into the DBF file when the system locale does not support the language (that is, does not have the matching codepage).

Currently, when a user creates a new SHP dataset, the provider uses the codepage of the system locale (obtained by calling setlocale(LC_ALL, "")). Most of the time this is correct, but in a special case such as writing Chinese characters in an English locale, the user gets only "????"...

Thanks,
Jimmy

From: Dan Stoica
Sent: Tuesday, May 05, 2009 10:16 PM
To: FDO Internals Mail List
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Does your SHP dataset include a .cpg file? If not, you should create one. If it does, it should contain the Chinese codepage; otherwise the multibyte conversion will default to the machine's locale.
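
As a rough illustration of creating such a file, here is a small Python sketch that writes a .cpg sidecar next to a .shp file. The helper name, the file name "parcels.shp", and the value "936" (a Simplified Chinese codepage) are assumptions for the example; the thread only confirms that the provider writes "1252" by default, so check which identifiers your reader accepts.

```python
import tempfile
from pathlib import Path


def write_cpg(shp_path: str, codepage: str) -> Path:
    """Write a .cpg file beside the .shp file, holding the codepage id."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    cpg_path.write_text(codepage, encoding="ascii")
    return cpg_path


# Usage: a temporary folder stands in for the real dataset location.
with tempfile.TemporaryDirectory() as folder:
    cpg = write_cpg(str(Path(folder) / "parcels.shp"), "936")
    cpg_name = cpg.name
    cpg_text = cpg.read_text(encoding="ascii")

print(cpg_name, cpg_text)  # parcels.cpg 936
```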

As for the 11-character limit, there is nothing you can do, since it comes from the DBF specification.


From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Xumeng Chen
Sent: Tuesday, May 05, 2009 5:03 AM
To: fdo-internals at lists.osgeo.org
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Hi All,
Recently I have been using the SHP Provider to export SHP files. While testing my code, I found that feature attributes containing Chinese characters sometimes cannot be written to the file correctly. My system locale is English (US), and the export fails because the provider cannot store the correct attribute name in the DBF file.

After looking into the SHP provider code, I found that when the provider writes an attribute into the DBF file, the wide strings are converted to multibyte strings using the codepage of the current locale. For example, on my machine the Chinese strings were converted with codepage 1252, which is not correct.
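
The effect can be reproduced outside the provider. The Python sketch below is only an analogy for the wide-to-multibyte conversion described above (the provider's own conversion routine may differ in detail): characters with no mapping in codepage 1252 are replaced with "?", while a Chinese codepage or UTF-8 keeps them intact.

```python
# "\u4e2d\u6587" is the two-character string "Chinese" written in Chinese.
name = "\u4e2d\u6587"

# Codepage 1252 has no mapping for these characters, so each becomes "?".
lossy = name.encode("cp1252", errors="replace")

# UTF-8 (65001) and codepage 936 can both represent them without loss.
utf8 = name.encode("utf-8")
gbk = name.encode("cp936")

print(lossy)                # b'??'
print(len(utf8), len(gbk))  # 6 4
```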

After finding this, I tried modifying the source code to hardcode the codepage to UTF-8. The characters are then written and recognized correctly, and everything seems all right, but there is a limitation. The maximum length of an attribute name in DBF is 11, so if we convert all strings to UTF-8, only 5 letters are supported for users in FR/DE locales...
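
The arithmetic behind that limitation can be sketched in Python. The limit applies to bytes, and in UTF-8 an accented Latin letter takes 2 bytes while a Chinese character takes 3. The helper name is hypothetical, and the 11-byte budget is the figure quoted in this thread (some descriptions of the DBF format reserve one of those bytes as a terminator, so the usable budget may be 10), which is why it is left as a parameter.

```python
def fit_field_name(name: str, encoding: str = "utf-8", max_bytes: int = 11) -> bytes:
    """Encode name, dropping trailing characters until it fits max_bytes.

    Whole characters are removed, so a multibyte sequence is never split.
    """
    encoded = name.encode(encoding)
    while len(encoded) > max_bytes:
        name = name[:-1]
        encoded = name.encode(encoding)
    return encoded


accented = "\u00e9" * 8  # eight "e with acute" letters
print(len(fit_field_name(accented).decode("utf-8")))  # 5
print(len(fit_field_name(accented, "cp1252")))        # 8

chinese = "\u4e2d" * 6   # six Chinese characters
print(len(fit_field_name(chinese).decode("utf-8")))   # 3
```

With a single-byte codepage such as 1252 the same accented name fits unchanged, which is the trade-off of forcing UTF-8 for every locale.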

Does anyone have a suggestion for my situation, such as a workaround or comments on my fix?
Any suggestion is highly appreciated.

Thanks,
Jimmy

