[gdal-dev] GDAL/OGR C# wrapper and UTF8

Tamas Szekeres szekerest at gmail.com
Tue Apr 2 03:52:53 PDT 2013


2013/4/2 Dennis Gocke <dengo at gmx.net>

> Hi,
>
> I have a few questions regarding the C# wrapper and UTF8 encoded strings.
>
>
> This is mostly an issue when working with OGR. If I understand correctly,
> strings are supposed to be UTF8 encoded internally. Some drivers might
> not follow that rule, but most do; is that correct?
>
>
> The problem I have is that the C# wrapper almost always assumes
> strings are ANSI encoded.
>
>
> Only in some cases, when the wrapper handles file paths, does it use the
> correct UTF8 encoding, and even then the workaround is a bit strange. For
> instance, when ‘converting’ a string for OGR it uses
> Encoding.Default.GetString(Encoding.UTF8.GetBytes(utf8_path)). So it
> essentially converts the path to a ‘wrongly’ encoded string, so that when
> it is marshaled to unmanaged code, which uses ANSI encoding by default, it
> again yields the correct bytes of a UTF8 encoded string. The reason for
> this strange workaround is probably that .NET does not support marshaling
> strings to unmanaged UTF8 encoded strings, but I think it would still be
> better not to use automatic string marshaling at all, and instead to
> convert the string to the correct byte array in .NET and then pass the
> byte array/pointer to unmanaged code. The same applies in the reverse
> direction: get the pointer from unmanaged code and convert it to a managed
> string using the correct UTF8 encoding.
>
>
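The round trip described above can be sketched in C# as follows. This is an illustration, not the wrapper's actual code: the class and method names are hypothetical, and ISO-8859-1 (code page 28591) stands in for the system ANSI code page (Encoding.Default on .NET Framework) because it maps every byte value to exactly one character, which keeps the result deterministic.

```csharp
using System;
using System.Linq;
using System.Text;

static class AnsiRoundTripDemo
{
    // What the wrapper does in managed code: reinterpret the UTF8 bytes of the
    // path as characters of the ANSI code page (a deliberately "wrong" string).
    public static string DisguiseAsAnsi(string path, Encoding ansi) =>
        ansi.GetString(Encoding.UTF8.GetBytes(path));

    // What the default marshaler then does implicitly for an ANSI string
    // parameter: encode the managed string with the ANSI code page.
    public static byte[] MarshalAsAnsi(string s, Encoding ansi) => ansi.GetBytes(s);

    static void Main()
    {
        // ISO-8859-1 stands in for the system ANSI code page in this sketch.
        Encoding ansi = Encoding.GetEncoding(28591);
        string path = "C:\\data\\Müller.shp";

        string disguised = DisguiseAsAnsi(path, ansi);
        byte[] sentToC = MarshalAsAnsi(disguised, ansi);

        // The bytes that reach C are exactly the UTF8 encoding of the path.
        Console.WriteLine(sentToC.SequenceEqual(Encoding.UTF8.GetBytes(path))); // True
        // "ü" (one char) became two chars in the disguised managed string.
        Console.WriteLine(disguised.Length - path.Length); // 1
    }
}
```

The two steps only cancel out when the same code page is used on both sides and it can represent every byte sequence. With a multi-byte ANSI code page some UTF8 sequences may not survive the decode, and on .NET Core and later Encoding.Default always returns UTF8, so the trick as written relies on .NET Framework behaviour.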
Hi Dennis,

I don't quite follow your statements; it would help to see some examples
to make sure we understand the issue. The utf8_path parameter is a Unicode
string on the managed side, and the default .NET marshaler doesn't seem to
be able to convert it to UTF8 by default. It's true that we could pass the
byte array returned by Encoding.UTF8.GetBytes(utf8_path) to GDAL directly,
but I guess that would require extra work, either by using Marshal.Copy or
by pinning the array with a handle to make sure the garbage collector won't
move or collect it until the unmanaged function call completes.


>
> But as I said above, the main problem is that in most cases the default
> ANSI encoding is used instead of UTF8. Some examples are
> Layer.SetAttributeFilter, DataSource.ExecuteSQL, FieldDefn.SetName,
> FieldDefn.GetName…
>
>
As far as I remember, only pathnames are treated as UTF8 in GDAL, but that
might have changed since I last reviewed the code. The encoding of field
names is somewhat driver-specific and should probably be handled at the
application level.
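If an application knows that a particular driver returns UTF8 field names while the wrapper decodes them as ANSI, one application-level repair is to re-encode the string with the same ANSI code page and decode the resulting bytes as UTF8. A sketch, with a hypothetical helper name and ISO-8859-1 again standing in for the ANSI code page:

```csharp
using System;
using System.Text;

static class FieldNameRepair
{
    // Hypothetical helper: undo an ANSI decode of bytes that were really UTF8.
    // 'ansi' must be the same code page the marshaler used to build 'misdecoded'.
    public static string RedecodeAsUtf8(string misdecoded, Encoding ansi) =>
        Encoding.UTF8.GetString(ansi.GetBytes(misdecoded));

    static void Main()
    {
        Encoding ansi = Encoding.GetEncoding(28591); // stand-in for the ANSI code page

        // Simulate what an ANSI marshaler returns for the native UTF8 name "Straße".
        string fromWrapper = ansi.GetString(Encoding.UTF8.GetBytes("Straße"));

        Console.WriteLine(fromWrapper.Length);                            // 7
        Console.WriteLine(RedecodeAsUtf8(fromWrapper, ansi) == "Straße"); // True
    }
}
```

This is only safe when the application knows the driver really produced UTF8; applied to a name that was genuinely ANSI encoded, it typically turns the non-ASCII characters into U+FFFD replacement characters.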


>
> Because I don’t really understand how SWIG works, I have created my own
> .NET C# wrappers for these methods using the OGR C API. That works fine,
> but I thought that others might have the same problems, so I decided to
> post this.
>
>
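A hand-written binding like the one described above might look roughly like this: the OGR C functions take and return const char *, so the P/Invoke declarations use byte[] and IntPtr instead of string, and the UTF8 conversion happens in managed code. OGR_L_SetAttributeFilter and OGR_Fld_GetNameRef are real OGR C API functions, but the library name, the calling convention and the wrapper names are assumptions to check against your GDAL build.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class OgrUtf8
{
    // Assumption: native library name; it differs per platform and GDAL build.
    const string GdalLib = "gdal";

    // C: OGRErr OGR_L_SetAttributeFilter(OGRLayerH, const char *); OGRErr is an int.
    // A byte[] passed by value is normally pinned by the marshaler for the duration
    // of the call, and a null array is passed as NULL (which clears the filter).
    [DllImport(GdalLib, CallingConvention = CallingConvention.Cdecl)] // assumed cdecl
    static extern int OGR_L_SetAttributeFilter(IntPtr hLayer, byte[] utf8Query);

    // C: const char *OGR_Fld_GetNameRef(OGRFieldDefnH); GDAL owns the buffer, do not free it.
    [DllImport(GdalLib, CallingConvention = CallingConvention.Cdecl)] // assumed cdecl
    static extern IntPtr OGR_Fld_GetNameRef(IntPtr hFieldDefn);

    // Encodes s as a NUL-terminated UTF8 byte array for a const char * parameter.
    public static byte[] ToUtf8Z(string s)
    {
        if (s == null) return null;
        byte[] z = new byte[Encoding.UTF8.GetByteCount(s) + 1]; // last byte stays 0
        Encoding.UTF8.GetBytes(s, 0, s.Length, z, 0);
        return z;
    }

    // Hypothetical managed wrappers with UTF8-correct marshaling.
    public static int SetAttributeFilter(IntPtr hLayer, string query) =>
        OGR_L_SetAttributeFilter(hLayer, ToUtf8Z(query));

    // Marshal.PtrToStringUTF8 exists on .NET Core and later; on .NET Framework,
    // scan for the NUL byte and decode the copied bytes with Encoding.UTF8 instead.
    public static string GetFieldName(IntPtr hFieldDefn) =>
        Marshal.PtrToStringUTF8(OGR_Fld_GetNameRef(hFieldDefn));
}
```

Calling SetAttributeFilter or GetFieldName requires the GDAL native library at run time; ToUtf8Z can be exercised on its own.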
Do you have an example which demonstrates the incorrect behaviour of the
current code?

Best regards,

Tamas

