[gdal-dev] LDID and .CPG in OGR shapefile driver

Francis Markham fmarkham at gmail.com
Wed Jul 14 01:38:41 EDT 2010


As discussed in
http://lists.osgeo.org/pipermail/gdal-dev/2010-May/024619.html and
http://lists.osgeo.org/pipermail/gdal-dev/2010-July/025192.html OGR's
shapefile driver does not allow the shapefile's codepage to be set or
retrieved using the DBF LDID byte or an *.cpg file.
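For reference, the two mechanisms work differently. The LDID is a single byte in the DBF header (offset 29 in the dBase III header layout). The .cpg file is a small sidecar text file that holds a codepage name. Here is a rough Python sketch of how a reader could combine them. It is not OGR or shapelib code. It assumes that a non-empty .cpg file takes precedence over the LDID byte and that an LDID of 0 means "not specified". The "LDID/n" result string mimics shapelib's codepage naming. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

LDID_OFFSET = 29  # language driver ID byte in the 32-byte dBase III header


def codepage_from_sources(cpg_text: Optional[str], dbf_header: bytes) -> Optional[str]:
    """Return a codepage string, preferring .cpg content over the LDID byte.

    Assumptions (not taken from OGR or shapelib): a non-empty .cpg wins,
    and an LDID byte of 0 is treated as "unspecified".
    """
    if cpg_text is not None and cpg_text.strip():
        return cpg_text.strip()
    if len(dbf_header) > LDID_OFFSET and dbf_header[LDID_OFFSET] != 0:
        return f"LDID/{dbf_header[LDID_OFFSET]}"
    return None


def read_codepage(dbf_path: str) -> Optional[str]:
    """File-based wrapper: looks for a sibling .cpg next to the .dbf.

    Only the lowercase ".cpg" name is checked, which matters on
    case-sensitive file systems.
    """
    dbf = Path(dbf_path)
    cpg = dbf.with_suffix(".cpg")
    cpg_text = cpg.read_text(encoding="ascii", errors="replace") if cpg.exists() else None
    with dbf.open("rb") as f:
        header = f.read(32)
    return codepage_from_sources(cpg_text, header)
```

With a DBF whose LDID byte is 87 and no .cpg file, read_codepage() would return "LDID/87". A .cpg containing "UTF-8" would return "UTF-8" instead.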

Recent shapelib releases implement this functionality when creating a
new shapefile.

Ticket #882 (http://trac.osgeo.org/gdal/ticket/882) addresses this
problem, but the discussion there largely predates RFCs 5 and 23
(http://trac.osgeo.org/gdal/wiki/rfc5_unicode and
http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode).

I would be interested in exposing this shapelib feature in OGR.
However, there are a number of design decisions to make:

1) Should encoding retrieval and setting be an OGR wide feature, or
one specific to the shapefile driver?

2) Should encodings be specified as a string or as an enumeration of
well-known encodings?  If encoding retrieval and setting occur only
at the shapefile driver level, then a string that mimics shapelib's
API might be sensible (if the codepage is set to "LDID/n" with
0 <= n <= 255, the LDID byte of the DBF is set to n; otherwise the
whole codepage string is written to the .CPG file).  Otherwise, common
sense would suggest that a standardised enum of encodings might be the
way to go.
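The shapelib string convention in 2) can be modelled as a small decision function. This is an illustrative Python model, not shapelib's C code, and the name plan_codepage_write is hypothetical. It assumes that the valid LDID range is the single-byte range 0-255. It also assumes that any other value, including an "LDID/" prefix followed by something other than plain decimal digits, is written verbatim to the .cpg file. shapelib's own parsing of malformed suffixes may differ.

```python
from typing import Tuple, Union


def plan_codepage_write(codepage: str) -> Tuple[str, Union[int, str]]:
    """Decide where a codepage string would be stored when creating a DBF.

    Returns ("ldid", n) if the string is "LDID/n" with 0 <= n <= 255,
    otherwise ("cpg", codepage), meaning the whole string goes to the .cpg file.
    """
    prefix = "LDID/"
    if codepage.startswith(prefix):
        suffix = codepage[len(prefix):]
        # Assumption: only plain ASCII decimal digits count as an LDID number.
        if suffix.isascii() and suffix.isdigit():
            n = int(suffix)
            if 0 <= n <= 255:
                return ("ldid", n)
    return ("cpg", codepage)
```

For example, "LDID/87" maps to the LDID byte, while "UTF-8" or "LDID/256" would end up in the .cpg file. Keeping a driver-level string this close to shapelib's would make pass-through trivial. The cost is that callers need to know LDID numbers, which is part of the case for an enum.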

3) What should the API be?  A patch attached to ticket #882 creates two
new OGRLayer member functions, GetEncoding() and SetEncoding(), and a
GetEncoding() implementation for shapefiles (although, as far as I can
see, it does not allow the encoding to be set).  This approach has some
potential problems:
   a) It exposes these functions for all layers regardless of driver,
which may or may not be desirable.
   b) It assumes that the encoding can be set through the layer.  With
shapelib, the only way to set the encoding is when the DBF is created.
An alternative to the SetEncoding() function might be to use a dataset
or layer creation option.  However, given that AFAIK OGR doesn't support
metadata in the same way GDAL does, this would need to be paired with a
means of retrieving the encoding.
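Here is a Python model of option b) to make it concrete. In this model, the encoding is fixed at creation time and is read-only afterwards. All names are hypothetical: Layer, ShapefileLayer, get_encoding(), create_layer() and the "ENCODING" creation-option key stand in for whatever OGR might actually use. None of this is existing OGR API. The base class returns None ("unknown") so that drivers without a notion of encoding are unaffected. That is one possible answer to concern a).

```python
from typing import Dict, Optional


class Layer:
    """Hypothetical base layer: encoding is unknown unless a driver says otherwise."""

    def __init__(self, name: str) -> None:
        self.name = name

    def get_encoding(self) -> Optional[str]:
        return None


class ShapefileLayer(Layer):
    """Hypothetical shapefile layer whose encoding can only be chosen at creation."""

    def __init__(self, name: str, options: Optional[Dict[str, str]] = None) -> None:
        super().__init__(name)
        # Hypothetical "ENCODING" creation option: read once, never changed later.
        self._encoding = (options or {}).get("ENCODING")

    def get_encoding(self) -> Optional[str]:
        return self._encoding

    # Deliberately no set_encoding(): shapelib only sets it when the DBF is created.


def create_layer(driver: str, name: str, options: Optional[Dict[str, str]] = None) -> Layer:
    """Hypothetical factory mirroring a 'create layer with options' call."""
    if driver == "shapefile":
        return ShapefileLayer(name, options)
    return Layer(name)  # other drivers ignore the option in this model
```

Because there is no setter, the only way to change a layer's encoding in this model is to recreate the layer. That matches the shapelib constraint noted in b).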


Is this the appropriate place to have this discussion?  I would be
happy to provide a patch implementing this feature however it is
deemed most appropriate.


Kind Regards,

Francis Markham

