[gdal-dev] Unicode support in OGR Shape/DBF

Even Rouault even.rouault at mines-paris.org
Tue Sep 6 17:08:05 EDT 2011


On Tuesday 06 September 2011 19:42:18, Hilda Villegas wrote:
> Hi,
> 
> 
> 
> I'm trying to use the preliminary encoding support for shapefile/dbf. As
> you said in Ticket #882, the SHAPE_ENCODING configuration variable can
> be used to override the interpretation, but I cannot find the valid
> values for SHAPE_ENCODING anywhere. What value should I use if I
> want to write Unicode characters (UTF-8) in the DBF?
> 

To override the encoding when *writing*, you should use the ENCODING layer 
creation option:

ogr2ogr out.shp indatasource -lco ENCODING=UTF-8

Note: in that case, the input datasource must already be encoded in UTF-8, 
which is the pivot encoding for OGR. The effect of -lco ENCODING=UTF-8 is 
essentially to write a .cpg file with UTF-8 as its content. Apart from using 
a value of the form LDID/a_numeric_value, where a_numeric_value is a value in 
the first column of table 9 of 
http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM (from 1 to 204), it is 
not entirely clear which other values of the ENCODING parameter are valid for 
interoperability with other systems.
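
To check what actually ended up on disk after such a conversion, something 
like the following rough Python sketch could be used. It only uses the 
standard library, not OGR. The helper names read_cpg and read_ldid are made up 
for this example, and it assumes the usual DBF layout where the language 
driver ID (the LDID number mentioned above) is the single byte at offset 29 of 
the header.

```python
import tempfile
from pathlib import Path


def read_cpg(shp_path):
    """Return the text of the .cpg sidecar next to shp_path, or None if absent."""
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.exists():
        return None
    return cpg.read_text(encoding="ascii", errors="replace").strip()


def read_ldid(dbf_path):
    """Return the language driver ID byte (offset 29) of a .dbf header."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file too short to hold a DBF header")
    return header[29]


if __name__ == "__main__":
    # Build stand-in files in a temporary directory so the sketch runs
    # without a real shapefile; point the helpers at out.shp / out.dbf instead.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "out"
        base.with_suffix(".cpg").write_text("UTF-8\n", encoding="ascii")
        fake_header = bytearray(32)   # 32 zero bytes, header-sized
        fake_header[29] = 87          # arbitrary LDID value for the demo
        base.with_suffix(".dbf").write_bytes(bytes(fake_header))

        print("cpg :", read_cpg(base.with_suffix(".shp")))
        print("ldid:", read_ldid(base.with_suffix(".dbf")))
```

An LDID of 0 together with a .cpg file would suggest the encoding is carried 
by the .cpg only, but how other software treats that combination is exactly 
the interoperability question raised above.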

The SHAPE_ENCODING configuration option/environment variable is to be used when 
you want to override the encoding indicated in the .dbf/.cpg file when reading 
a shapefile. It can be set to any valid value recognized by the iconv library, 
whose list you can get with iconv -l on a system with iconv binaries 
installed. In that case, OGR will recode from SHAPE_ENCODING to UTF-8. If you 
want no recoding to happen, you can set SHAPE_ENCODING="".
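
To illustrate what "recode from SHAPE_ENCODING to UTF-8" means for the bytes 
of a single attribute value, here is a small Python sketch. It is not OGR's 
implementation: recode_to_utf8 is a hypothetical helper, and it relies on 
Python's own codec names, which overlap with but are not identical to the 
names listed by iconv -l.

```python
def recode_to_utf8(raw, source_encoding):
    """Recode raw field bytes from source_encoding to UTF-8.

    An empty source_encoding mimics SHAPE_ENCODING="": the bytes are
    returned unchanged, i.e. no recoding happens.
    """
    if source_encoding == "":
        return raw
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    latin1_value = b"caf\xe9"  # "café" as stored in an ISO-8859-1 .dbf
    print(recode_to_utf8(latin1_value, "ISO-8859-1"))  # b'caf\xc3\xa9'
    print(recode_to_utf8(latin1_value, ""))            # b'caf\xe9'
```

With the GDAL command line utilities, the configuration option itself can be 
passed as --config SHAPE_ENCODING followed by the value, or set as an 
environment variable before running the tool.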

