[gdal-dev] Re: Writing non-ASCII characters to shapefile

Francis Markham fmarkham at gmail.com
Tue Jul 6 22:56:49 EDT 2010


Obviously it is easy enough to write the .CPG and overwrite a byte in
the DBF in application code.  However, I would rather use the OGR API
for this.  Would a patch providing this functionality be welcomed?
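
For reference, here is a minimal sketch of the application-code workaround I mean. It assumes the LDID is the single byte at offset 29 of the dBASE header (the usual dBASE layout, not something I have checked against every writer), and the helper names set_dbf_ldid and write_cpg are my own, not part of OGR or shapelib.

```python
import os
import struct

# Assumption: the language driver ID (LDID) is the byte at offset 29
# of the 32-byte dBASE file header.
LDID_OFFSET = 29
DBF_HEADER_SIZE = 32


def set_dbf_ldid(dbf_path, ldid):
    """Overwrite the LDID byte of an existing .dbf file in place."""
    if not 0 <= ldid <= 0xFF:
        raise ValueError("LDID must fit in one byte")
    if os.path.getsize(dbf_path) < DBF_HEADER_SIZE:
        raise ValueError("file is too short to contain a dBASE header")
    with open(dbf_path, "r+b") as f:
        f.seek(LDID_OFFSET)
        f.write(struct.pack("B", ldid))


def write_cpg(shapefile_path, encoding_name):
    """Write a .cpg sidecar next to the shapefile and return its path."""
    base, _ = os.path.splitext(shapefile_path)
    cpg_path = base + ".cpg"
    with open(cpg_path, "w") as f:
        f.write(encoding_name)
    return cpg_path
```

After OGR has closed the datasource, something like write_cpg("places.shp", "UTF-8") followed by set_dbf_ldid("places.dbf", 0x00) would match what was reported in this thread: ArcGIS writes the string UTF-8 into the .cpg, and ESRI uses an LDID of zero for "unknown". The file name places.shp is only a placeholder.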

If I were to implement this, would a "dataset creation option" in the
shapefile driver be the appropriate API?  All it would need to do is
expose the existing functionality in shapelib.

Regards,

Francis

On 7 July 2010 02:44, Hermann Peifer <peifer at gmx.eu> wrote:
> In the mentioned mail from earlier this year, William Kyngesburye wrote:
>
>> Maybe GDAL needs a creation option in the shapefile driver
>> to set the LDID or to instead add a cpg file with an encoding value.
>
> As there was no reply to his mail, I assume that there is no GDAL/OGR
> creation option for the LDID and it is unlikely that there will be one in
> the near future.
>
> Hermann
>
>
> On 06/07/2010 13:04, Francis Markham wrote:
>>
>> It appears from
>> http://lists.osgeo.org/pipermail/gdal-dev/2010-May/024619.html and
>> http://resources.arcgis.com/content/kbase?fa=articleShow&d=26015 that
>> ESRI sets the LDID to zero to indicate an unknown LDID.  From my
>> reading of the OGR source code, it seems that OGR uses the shapelib
>> default LDID of 0x57 (previously it was set to 0x3, or Windows-1252).
>> Is there any way to specify this value using the OGR API, or do I need
>> to use shapelib directly to create my shapefiles in order to do this?
>>
>> Cheers,
>>
>> Francis
>>
>> On 6 July 2010 18:31, Hermann Peifer <peifer at gmx.eu> wrote:
>>>
>>> The .cpg files we generate through ArcGIS desktop contain the string:
>>> UTF-8
>>>
>>> Some time ago, there was a mail on this list about problems in case of
>>> conflicting information in the .cpg file compared to the Language
>>> Driver ID (LDID) in the header of a dBASE file, see:
>>> http://lists.osgeo.org/pipermail/gdal-dev/2010-May/024619.html
>>>
>>> Hermann
>>>
>>> On 06/07/2010 04:13, Francis Markham wrote:
>>>
>>> Okay, I will take that approach then.  Thank you all for your help.
>>>
>>> What specific value should I write into the .cpg? The string '65001'
>>> or the string 'utf-8' or something else?
>>>
>>> -Francis
>>>
>>> On 5 July 2010 21:53, Peter Hopfgartner <peter.hopfgartner at r3-gis.com> wrote:
>>>
>>>
>>> Hi Francis,
>>>
>>> what does not portable mean? ArcMap handles UTF-8 fine, if the correct
>>> encoding is written into the .cpg file. Recent shapelib should handle
>>> this fine, too. If there is any problem with a specific GIS program, a
>>> bug report for that GIS program might be the right thing to do.
>>>
>>> Regards,
>>>
>>> Peter
>>>
>>> On Mon, 2010-07-05 at 19:18 +1000, Francis Markham wrote:
>>>
>>>
>>> At the bottom of this page, for one:
>>> http://resources.arcgis.com/content/kbase?fa=articleShow&d=21106
>>>
>>> But honestly I've found it hard to find information about this.  I'd be
>>> very happy to be corrected if this is not the case!
>>>
>>> Cheers,
>>>
>>> Francis
>>>
>>> On 5 July 2010 19:09, Hermann Peifer <peifer at gmx.eu> wrote:
>>>
>>>
>>> Francis, you wrote:
>>>
>>>
>>>
>>> I have heard that the use of UTF-8 in shapefiles is not portable.
>>>
>>>
>>> Where did you hear this?
>>>
>>> Regards, Hermann
>>>
>>>
>>> On 03/07/2010 04:40, Francis Markham wrote:
>>>
>>>
>>> Hi there,
>>>
>>> I'm trying to write data from a Microsoft Excel .xls file into a
>>> shapefile, using OGR's Python bindings in Python 2.6.  This is going
>>> well, but I am having some problems when I try to write values that
>>> contain so-called "smart quotes".  Smart quotes are special
>>> characters, defined as characters 0x91 through 0x94 in Windows-1252
>>> (see http://msdn.microsoft.com/en-au/goglobal/cc305145.aspx).
>>>
>>> What is the best way to save this data to a shapefile using OGR?  I
>>> need the shapefile to be interoperable with other programs, including
>>> but not limited to ESRI products. While I assume I could simply
>>> translate these characters to standard ASCII, I would prefer not to if
>>> possible.  I also haven't tested the shapefiles with data from other
>>> character encodings.
>>>
>>> I have heard that the use of UTF-8 in shapefiles is not portable.  I
>>> am also aware that shapefile.cpg can store a shapefile's codepage.  I
>>> don't know how to put these pieces together to create a portable
>>> solution, however.
>>>
>>> Apologies if this is a newbie question, but I can't find answers on
>>> the web.
>>>
>>> Thanks,
>>>
>>> Francis Markham
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>
>>>
>>> --
>>> Dott. Peter Hopfgartner
>>>
>>> R3 GIS Srl - GmbH
>>> Via Johann Kravogl-Str. 2
>>> I-39012 Meran/Merano (BZ)
>>> Email: peter.hopfgartner at r3-gis.com
>>> Tel. : +39 0473 494949
>>> Fax  : +39 0473 069902
>>> www  : http://www.r3-gis.com
>>>
>>> XING : http://www.xing.com/go/invita/8917535
>>>
>>>
>>>
>
>

