[gdal-dev] Shapefiles encoded in UTF-16 ?

Mikael Rittri Mikael.Rittri at carmenta.com
Thu Feb 21 04:39:29 PST 2019


Hello, list.

Recently, I noticed that ArcGIS software (at least since version 10.3) can produce shapefiles where the DBF file is encoded with UTF-16.
https://desktop.arcgis.com/en/arcmap/latest/extensions/production-mapping/converting-a-geodatabase-to-shapefiles.htm

But it is difficult to do so, since you need the "Production Mapping" license. Without that, shapefiles are written
in UTF-8 by default; one can select some other code page by modifying the system registry setting dbfDefault, but there
doesn't seem to be any value of that setting that will produce UTF-16.

I have never encountered a shapefile in UTF-16, but I am beginning to wonder if we ought to support them. I guess they would be
more space-efficient for languages like Chinese and Japanese, where most characters need three UTF-8 bytes but only two UTF-16
bytes. This could be important since DBF reserves only 10 bytes for field names.
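
To make the size argument concrete, here is a small Python sketch. The sample field name is only my own illustration (not taken from a real shapefile), and the helper names are made up for this mail; it simply compares encoded lengths against the 10-byte limit mentioned above.

```python
# Compare how many bytes a CJK field name needs in UTF-8 vs UTF-16.
DBF_FIELD_NAME_BYTES = 10  # usable bytes for a DBF field name, as stated above


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def fits(text, encoding):
    """True if text, encoded with the given codec, fits in a DBF field name."""
    return len(text.encode(encoding)) <= DBF_FIELD_NAME_BYTES


name = "都道府県名"  # hypothetical 5-character Japanese field name
utf8_len, utf16_len = encoded_sizes(name)
print(utf8_len, utf16_len)                           # 15 10
print(fits(name, "utf-8"), fits(name, "utf-16-le"))  # False True
```

So a five-character name of this kind would be truncated in UTF-8 but would fit exactly in UTF-16. For plain ASCII names the relation is reversed, since UTF-16 needs two bytes per character there.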

Some questions:

Can the OGR Shape driver handle UTF-16?
More generally, are there many GIS systems that can handle UTF-16 in shapefiles?
Or perhaps I should just ask: has anyone ever seen a shapefile in UTF-16?
If so, would the CPG file always contain UTF-16LE, or always UTF-16BE, or just UTF-16?
I suppose the only things encoded in UTF-16 would be the field values of type String, plus the field names?
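
The reason I ask about the CPG content is that the three labels give different bytes for the same string. A minimal Python sketch of the difference follows; the function names are hypothetical, and I do not know whether ArcGIS writes a byte order mark into DBF fields, so the BOM check is only an assumption about what a reader might try.

```python
import codecs


def utf16_variants(text):
    """Encode text under the three labels a CPG file might plausibly contain."""
    return {
        "UTF-16LE": text.encode("utf-16-le"),  # little-endian, no BOM
        "UTF-16BE": text.encode("utf-16-be"),  # big-endian, no BOM
        "UTF-16": text.encode("utf-16"),       # Python prepends a BOM here
    }


def guess_byte_order(raw):
    """Guess the byte order from a leading BOM; return None if there is none."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None  # without a BOM, the CPG label is the only hint


for label, raw in utf16_variants("A").items():
    print(label, raw, guess_byte_order(raw))
```

If the CPG file says just UTF-16 and the field values carry no BOM, a reader would have to guess the byte order, which is why the exact label matters.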

(I also wonder whether shapefiles in UTF-16 are a good idea, or whether the GIS community ought to just forget about them,
but I guess there is no definite answer to that!)

Kind regards,

Mikael Rittri
Carmenta Geospatial Technologies


More information about the gdal-dev mailing list