[Gdal-dev] RFC DRAFT: Unicode support in GDAL
Mateusz Loskot
mateusz at loskot.net
Thu Sep 21 16:41:50 EDT 2006
Frank Warmerdam wrote:
> I presume ArcGIS is using some custom flag to keep track of this. If
> we can figure out what they did, we could also honour it.
OK, I think I'm close to knowing how ArcGIS stores code page
information in a Shapefile.
AFAIK, there are two variants:
1. A Language Driver ID (LDID) stored in the header of the DBF file.
It is a single byte at offset 29 (counting from 0, i.e. the 30th byte).
2. An associated file with the same base name as the other Shapefile
files, but with a .CPG extension, e.g. countries.shp and countries.cpg.
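Both indicators are easy to read. A minimal Python sketch, assuming a standard dBASE III-style header whose fixed part is 32 bytes with the LDID at zero-based offset 29, and a .cpg file holding one identifier as plain text. The function names read_ldid and read_cpg are hypothetical, not GDAL or ArcGIS API:

```python
import os


def read_ldid(dbf_path):
    """Return the Language Driver ID byte from a DBF header (0 means unset)."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)  # fixed part of the DBF header is 32 bytes
    if len(header) < 32:
        raise ValueError("truncated DBF header")
    return header[29]  # zero-based offset 29, i.e. the 30th byte


def read_cpg(shp_path):
    """Return the identifier stored in the sibling .cpg file, or None."""
    base, _ = os.path.splitext(shp_path)
    # Try both spellings, since the extension case varies between writers.
    for ext in (".cpg", ".CPG"):
        path = base + ext
        if os.path.exists(path):
            with open(path, "r", encoding="ascii", errors="replace") as f:
                return f.read().strip()
    return None
```

For countries.shp this would look at countries.dbf for the LDID and at countries.cpg for the identifier.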
Here is a sample Shapefile that includes a .cpg file:
http://www.unc.edu/courses/2006spring/geog/070/001/mkjohnso/Lab%209/?C=M;O=A
It contains Shapefile.prj, Shapefile.shp, Shapefile.cpg, etc.
The .cpg file simply stores the code page identifier.
Here is a list of possible (perhaps all?) code page identifiers, with
some helpful explanation:
http://www.forumsig.org/archive/index.php/t-439.html
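Whichever indicator is found, its value has to be translated into an encoding name that a converter understands (here, Python's standard codecs module). The LDID table below covers only a handful of common dBASE code page marks and should be checked against the list linked above; the rule that a purely numeric .cpg value names a Windows/DOS code page (e.g. "1252" becomes "cp1252") is likewise an assumption, not documented ArcGIS behaviour:

```python
import codecs

# Partial, illustrative LDID -> codec table (common dBASE code page marks).
# Assumption: verify against a complete list before relying on it.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0xC8: "cp1250",  # Windows Eastern European
    0xC9: "cp1251",  # Windows Russian
}


def codec_from_ldid(ldid):
    """Map an LDID byte to a Python codec name, or None if unknown."""
    return LDID_TO_CODEC.get(ldid)


def codec_from_cpg(text):
    """Map .cpg contents (e.g. 'UTF-8', '1252') to a codec name, or None."""
    text = text.strip()
    if not text:
        return None
    # Assumption: a bare number is a Windows/DOS code page number.
    name = "cp" + text if text.isdigit() else text
    try:
        return codecs.lookup(name).name  # normalized codec name
    except LookupError:
        return None
```

Unknown identifiers come back as None, so the caller can decide on a fallback instead of guessing silently.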
How does ArcGIS handle the code page using these two indicators?
<http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106>
"When opening a shapefile and dBASE file in ArcGIS Desktop, the Desktop
programs look at the Language Driver ID (LDID) in the header of a dBASE
file, or an associated *.CPG file, which are both used to define the
code page, in order to determine the code page of the file that is read."
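The knowledge base article quoted above does not say which indicator wins when both are present. A sketch of one plausible policy, assuming the .cpg file takes precedence over the LDID and that cp1252 is used when neither is set (both the precedence and the default are assumptions, not documented ArcGIS behaviour). The inputs are codec names already derived from each indicator, and the function names are hypothetical:

```python
def choose_encoding(cpg_codec, ldid_codec, default="cp1252"):
    """Pick the code page for a DBF file from two optional indicators.

    Assumed precedence: .cpg first, then the LDID, then `default`.
    """
    for candidate in (cpg_codec, ldid_codec):
        if candidate:
            return candidate
    return default


def decode_field(raw, encoding):
    """Decode a space-padded DBF character field into a str."""
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode(encoding, errors="replace").rstrip(" ")
```

For example, decode_field(b"Z\xfcrich   ", choose_encoding(None, "cp1252")) yields "Zürich".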
I'm not 100% sure, but it seems this information makes it possible to
implement code page support for Shapefiles, at least ;-)
So UTF-8 may be supported as well.
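Reading a file whose .cpg says UTF-8 is just decoding with that codec, but writing needs some care: DBF field widths are counted in bytes, so a UTF-8 string that is too long must be cut on a character boundary rather than in the middle of a multibyte sequence. A minimal Python sketch (encode_utf8_field is a hypothetical name):

```python
def encode_utf8_field(value, width):
    """Encode `value` as UTF-8 for a DBF character field `width` bytes wide.

    The result is truncated without splitting a character, then padded
    with spaces to exactly `width` bytes.
    """
    data = value.encode("utf-8")
    if len(data) > width:
        # Cut at the byte width, then discard any incomplete multibyte
        # sequence left at the end of the cut.
        data = data[:width].decode("utf-8", errors="ignore").encode("utf-8")
    return data.ljust(width, b" ")
```

With a 2-byte field, "Zürich" becomes b"Z " rather than b"Z\xc3", which would be invalid UTF-8.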
Cheers
--
Mateusz Loskot
http://mateusz.loskot.net