[gdal-dev] GDAL/OGR C# wrapper and UTF8

Dennis Gocke dengo at gmx.net
Tue Apr 2 02:43:41 PDT 2013


Hi,

I have a few questions regarding the C# wrapper and UTF8 encoded strings.


This is mostly an issue when working with OGR. If I understand correctly, strings are supposed to be UTF-8 encoded internally. Some drivers might not follow that rule, but most do. Is that correct?


The problem I have is that the C# wrapper almost always assumes strings to be ANSI encoded.


Only in some cases, when the wrapper handles file paths, does it use the correct UTF-8 encoding, and even then the workaround is a bit strange. For instance, when ‘converting’ a string for OGR it uses Encoding.Default.GetString(Encoding.UTF8.GetBytes(utf8_path)). So it essentially converts the path to a ‘wrongly’ encoded string, so that when it gets marshaled to unmanaged code, which uses ANSI encoding by default, it again results in the correct bytes of a UTF-8 encoded string. The reason for the strange workaround is probably that .NET does not support marshaling strings to unmanaged UTF-8 encoded strings. Still, I think it would be better not to use automatic string marshaling at all, and instead to convert the string to a correct byte array in .NET and then pass the byte array/pointer to unmanaged code. The same applies in the reverse direction: get the pointer from unmanaged code and convert it to a managed string using the correct UTF-8 encoding.
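
A minimal sketch of the byte-array approach described above, using only System.Text.Encoding and System.Runtime.InteropServices.Marshal. The class and method names (Utf8Marshal, StringToUtf8Ptr, Utf8PtrToString) are hypothetical and not part of the GDAL/OGR bindings; it is assumed here that the native side expects and returns NUL-terminated UTF-8 strings.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, not part of the GDAL/OGR C# bindings.
public static class Utf8Marshal
{
    // Encode a managed string as a NUL-terminated UTF-8 buffer in
    // unmanaged memory. The caller must release the buffer with
    // Marshal.FreeHGlobal once the native call has returned.
    public static IntPtr StringToUtf8Ptr(string s)
    {
        if (s == null) return IntPtr.Zero;
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        IntPtr ptr = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, ptr, bytes.Length);
        Marshal.WriteByte(ptr, bytes.Length, 0); // terminating NUL
        return ptr;
    }

    // Decode a NUL-terminated UTF-8 buffer owned by unmanaged code
    // into a managed string. The buffer is not freed here.
    public static string Utf8PtrToString(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero) return null;
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0) length++;
        byte[] bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

A wrapper built this way would declare its native string parameters and return values as IntPtr instead of string, so that the default ANSI marshaling never touches the bytes. This also avoids relying on Encoding.Default, which, as far as I can tell, may not round-trip arbitrary bytes on every ANSI code page (multi-byte code pages in particular).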


But as I said above, the main problem is that in most cases the default ANSI encoding is used instead of UTF-8. Some examples are Layer.SetAttributeFilter, Layer.ExecuteSQL, FieldDefn.SetName, FieldDefn.GetName…


Because I don’t really understand how SWIG works, I have created my own .NET C# wrappers for these methods using the C API of OGR. That works fine, but I thought that others might have the same problem, so I decided to post this.


Also, some methods still seem to expect ASCII/ANSI encoded strings. Is there an easy way to determine which encoding a method expects?
For instance, CPLSetConfigOption and everything that has to do with SpatialReference don’t work correctly with UTF-8 encoded strings. (The WKT of a SpatialReference shouldn’t normally contain non-ASCII characters, though, and in that case it of course makes no difference.)
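
The reason it makes no difference for such strings is that UTF-8 encodes the characters U+0000 to U+007F as the same single bytes as ASCII, and the common Windows ANSI code pages share that range. A small sketch of a check for this case; EncodingCheck and IsAscii are hypothetical names, not part of the bindings:

```csharp
using System;

// Hypothetical helper, not part of the GDAL/OGR C# bindings.
public static class EncodingCheck
{
    // Returns true when every character is in the 7-bit ASCII range,
    // in which case the UTF-8 bytes and the bytes produced by an
    // ASCII-compatible ANSI code page are identical.
    public static bool IsAscii(string s)
    {
        if (s == null) throw new ArgumentNullException("s");
        foreach (char c in s)
        {
            if (c > 0x7F) return false;
        }
        return true;
    }
}
```

When IsAscii returns true, the default ANSI marshaling and a UTF-8 conversion should hand the same bytes to the native code; only strings for which it returns false are affected by the encoding mismatch.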


Kind regards,
Dennis

