[gdal-dev] Re: Handling CPG (encoding) file

Peter J Halls P.Halls at york.ac.uk
Wed May 26 02:22:17 EDT 2010


   Re use case 1) below: one instance where the raw data format might be 
required is where the source encoding contains characters that are not present 
in the output encoding, and specific action must be taken to render such 
characters in a transmittable form in the output character set.
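
To make that case concrete, here is a minimal Python sketch (not GDAL code) of
a value that cannot be written to ISO-8859-1 as-is. The sample string and the
choice of XML character references as the "transmittable form" are assumptions
for illustration only; an application could pick any other escaping scheme.

```python
# Hypothetical field value "Lodz cafe" with diacritics: U+0141 and U+017A
# do not exist in ISO-8859-1, while U+00F3 and U+00E9 do.
text = "\u0141\u00f3d\u017a caf\u00e9"

# A strict conversion fails outright on the unrepresentable characters.
try:
    text.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With access to the original characters, the application can choose how to
# render them, for example as numeric character references.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# A blind conversion can only substitute a placeholder, losing information.
lossy = text.encode("iso-8859-1", errors="replace")

print(strict_ok, escaped, lossy)
```

The point is only that the choice between the escaped and the lossy result has
to be made by something that still sees the source characters.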

Best wishes,


Even Rouault wrote:
> Alexander,
> I'm cc'ing Gaige Paulsen as he proposed in 
> http://trac.osgeo.org/gdal/ticket/3403 a patch with a similar approach to 
> yours, that is to say provide a method at the OGRLayer level to return the 
> encoding.
> The more I think about this issue, the more I recognize that "UTF-8 everywhere 
> internally" is probably not practical in all situations, or at least doesn't 
> leave enough control to the user. UTF-8 as a pivot is - conceptually - OK 
> for the read part, but it doesn't help for the write part when a driver 
> doesn't support UTF-8 (or if, for compatibility reasons with other 
> software, we must write data in a certain encoding).
> My main remark about your patch is that I don't believe the enum approach to 
> listing the encodings is the best one. I'd rather be in favor of using a string, 
> and possibly sticking to the names returned by 'iconv -l' so that we can 
> easily feed the return value of GetEncoding() into the converter through 
> CPLRecode(). I experimented with this some time ago and have some changes 
> ready in cpl_recode_stub.cpp & configure to plug iconv support into it, in 
> order to extend its scope beyond the current hardcoded support for UTF-8 and 
> ISO-8859-1.
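
As a rough illustration of why a string name is convenient, here is a Python
sketch of a converter keyed by encoding names. The recode() helper and the
"CP1252" value are hypothetical stand-ins, not the GDAL API, and Python's codec
names only partly overlap with the names listed by 'iconv -l'.

```python
import codecs


def recode(data: bytes, src_encoding: str, dst_encoding: str) -> bytes:
    """Hypothetical helper: convert bytes between two named encodings."""
    # Fail early with LookupError if either name is unknown to the converter.
    codecs.lookup(src_encoding)
    codecs.lookup(dst_encoding)
    return data.decode(src_encoding).encode(dst_encoding)


# Assumed value that a layer-level "get encoding" call might return.
layer_encoding = "CP1252"
raw = b"caf\xe9"

# The string is passed straight to the converter, with no enum mapping table.
as_utf8 = recode(raw, layer_encoding, "UTF-8")
round_trip = recode(as_utf8, "UTF-8", "ISO-8859-1")
print(as_utf8, round_trip)
```

An enum would need a new value and a new release for every encoding added,
whereas a string only needs the underlying converter to recognise the name.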
> We could imagine -s_encoding, -t_encoding and -a_encoding switches for 
> ogr2ogr to let the user define the transcoding or encoding assignment. One of 
> the difficulties raised by Gaige in #3403 is the meaning of the width attribute 
> of an OGRFieldDefn object (number of bytes or number of characters in a given 
> encoding), and how/if it will be affected by an encoding change.
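
The width ambiguity is easy to see with a short Python sketch (again not GDAL
code; the sample value, the width of 4 and the truncate_to_bytes() helper are
made up for illustration):

```python
value = "\u00e9t\u00e9"  # three characters, two of them accented

char_count = len(value)                         # width counted in characters
utf8_bytes = len(value.encode("utf-8"))         # width counted in UTF-8 bytes
latin1_bytes = len(value.encode("iso-8859-1"))  # width in ISO-8859-1 bytes


def truncate_to_bytes(text: str, width: int, encoding: str = "utf-8") -> str:
    """Hypothetical helper: cut to a byte width without splitting a character."""
    # errors="ignore" drops an incomplete multi-byte sequence left at the cut.
    return text.encode(encoding)[:width].decode(encoding, errors="ignore")


# A field of width 4 holds the whole value in ISO-8859-1 but not in UTF-8.
fits_latin1 = latin1_bytes <= 4
fits_utf8 = utf8_bytes <= 4
truncated = truncate_to_bytes(value, 4)
print(char_count, utf8_bytes, latin1_bytes, truncated)
```

So the same declared width can be sufficient before a transcoding and too small
after it, which is why its unit has to be settled.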
> The other issues raised by Gaige in his last comment are still worth 
> considering. For the read part, what do we want?
> 1) that the driver returns the data in its "raw" encoding and mentions the 
> encoding --> matches the approach of your proposal
> 2) that we ask it to return the data converted to UTF-8 when we don't care 
> about the data in its source encoding
> 3) that we can override its encoding when the source encoding is believed to 
> be incorrect so that 2) can work properly
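
The three read behaviours listed above can be sketched in Python as follows.
The read_value() function, its parameters and the sample bytes are all
hypothetical and only model the idea; they are not part of OGR.

```python
from typing import Optional, Tuple


def read_value(raw: bytes, declared: str, as_utf8: bool = True,
               override: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (bytes, encoding name) according to the chosen read behaviour."""
    # 3) an explicit override wins over the encoding the source declares
    source = override or declared
    if not as_utf8:
        # 1) hand back the raw bytes and report their encoding
        return raw, source
    # 2) convert to UTF-8 on behalf of the caller
    return raw.decode(source).encode("utf-8"), "UTF-8"


# Assumed scenario: bytes are really CP1250 but the source claims ISO-8859-1.
raw = b"\xa3\xf3d\x9f"
expected = "\u0141\u00f3d\u017a"

raw_mode = read_value(raw, "ISO-8859-1", as_utf8=False)
wrong = read_value(raw, "ISO-8859-1")[0].decode("utf-8")
fixed = read_value(raw, "ISO-8859-1", override="CP1250")[0].decode("utf-8")
print(raw_mode, wrong == expected, fixed == expected)
```

In this toy model 1) and 2) are selected by a single option, and 3) is what
makes 2) usable when the declared encoding is wrong.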
> Approaches 1) and 2) clearly follow two different tracks. One way to 
> reconcile both would be to provide some configuration/opening option to 
> choose which behaviour is preferred. RFC23 currently chooses 2) as it mandates 
> that "Any driver which knows it's encoding should convert to UTF-8." Well, 
> that is probably not a big deal, since any change related to how we deal with 
> encoding is likely to cause RFC23 to be amended anyway.
> Personally, I'm not sure which one is best. I'm wondering what the 
> use cases for 1) are: when do we really want the data to be returned in its 
> source encoding? Will it not be converted later to UTF-8 at the 
> application level, after the user has potentially selected/overridden the 
> source encoding? In that case 3) would solve the problem. Just thinking 
> out loud...
> For the write part, an OGRSFDriver::GetSupportedEncodings() and 
> OGRLayer::SetEncoding() could make sense (for the latter, whether it must be 
> exposed at the datasource or layer level is an open point and a slight 
> difference between your approach and Gaige's).
> Best regards 
> Even

Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806     Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
