<br><br><div class="gmail_quote">2013/4/2 Dennis Gocke <span dir="ltr"><<a href="mailto:dengo@gmx.net" target="_blank">dengo@gmx.net</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
I have a few questions regarding the C# wrapper and UTF8 encoded strings.<br>
<br>
<br>
This is mostly an issue when working with OGR. If I understand correctly, strings are supposed to be encoded in UTF8 internally. Some drivers might not follow that rule, but most do; is that correct?<br>
<br>
<br>
The problem I have is that the C# wrapper almost always assumes strings to be ANSI encoded.<br>
<br>
<br>
Only in some cases, when the wrapper handles file paths, does it use the correct UTF8 encoding, and even then the workaround is a bit strange. For instance, when ‘converting’ a string for OGR it uses Encoding.Default.GetString(Encoding.UTF8.GetBytes(utf8_path)). So it essentially converts the path to a ‘wrongly’ encoded string, so that when it gets marshaled to unmanaged code, which uses ANSI encoding by default, the result is again the correct bytes of a UTF8 encoded string. The reason for the strange workaround is probably that .NET does not support marshaling strings to unmanaged UTF8 encoded strings, but I still think it would be better not to use automatic string marshaling at all and instead convert the string to a correct byte array in .NET and then pass the byte array/pointer to unmanaged code. The same goes for the reverse direction: get a pointer from unmanaged code and convert it to a managed string using the correct UTF8 encoding.<br>
<br></blockquote><div><br></div><div>Hi Dennis,</div><div><br></div><div>I don't entirely follow your statements; some examples would help to clarify the issue. The utf8_path parameter is essentially a Unicode string on the managed side, and the .NET default marshaler doesn't seem to be capable of converting it to UTF8 on its own. It's true that we could probably pass the byte array returned by Encoding.UTF8.GetBytes(utf8_path) to GDAL directly, but I guess it would require extra work, either copying it with Marshal.Copy or pinning the array with a handle to make sure the garbage collector won't move or collect it until the unmanaged function call completes.</div>
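<div><br></div><div>To make the byte-array idea concrete, here is a rough sketch of what explicit UTF8 conversion with a pinned buffer could look like. The class and method names (Utf8Interop, ToUtf8Z, FromUtf8Z, WithPinnedUtf8) are made up for illustration and are not part of the GDAL bindings; the sketch also assumes the native side expects and returns NUL-terminated UTF8 strings.</div>
<pre>
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, not part of the GDAL/OGR C# bindings.
public static class Utf8Interop
{
    // Encode a managed string as a NUL-terminated UTF8 byte array.
    public static byte[] ToUtf8Z(string s)
    {
        int n = Encoding.UTF8.GetByteCount(s);
        byte[] buf = new byte[n + 1];   // the extra last byte stays 0
        Encoding.UTF8.GetBytes(s, 0, s.Length, buf, 0);
        return buf;
    }

    // Decode a NUL-terminated UTF8 string located at an unmanaged address.
    public static string FromUtf8Z(IntPtr p)
    {
        if (p == IntPtr.Zero) return null;
        int len = 0;
        while (Marshal.ReadByte(p, len) != 0) len++;   // find the terminator
        byte[] buf = new byte[len];
        Marshal.Copy(p, buf, 0, len);
        return Encoding.UTF8.GetString(buf);
    }

    // Pin the encoded buffer for the duration of the callback, so the
    // garbage collector cannot move it while native code reads from it.
    public static T WithPinnedUtf8&lt;T&gt;(string s, Func&lt;IntPtr, T&gt; call)
    {
        GCHandle h = GCHandle.Alloc(ToUtf8Z(s), GCHandleType.Pinned);
        try { return call(h.AddrOfPinnedObject()); }
        finally { h.Free(); }
    }
}
</pre>
<div>A managed-only round trip such as Utf8Interop.WithPinnedUtf8(name, p =&gt; Utf8Interop.FromUtf8Z(p)) should give back the original string. In real use the callback would instead invoke a P/Invoke declaration that takes an IntPtr, which would have to be written separately.</div>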
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
But as I said above, the main problem is that in most cases the default ANSI encoding is used instead of UTF8. Some examples are Layer.SetAttributeFilter, Layer.ExecuteSQL, FieldDefn.SetName, FieldDefn.GetName…<br>
<br></blockquote><div><br></div><div>As far as I remember, only the pathnames are treated as UTF8 in GDAL, but this might have changed since I last reviewed the code. The encoding of field names is somewhat driver-specific and should probably be handled at the application level.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Because I don’t really understand how SWIG works, I have created my own .NET C# wrappers for these methods using the C API of OGR. That works fine, but I thought that others might have the same problems, so I decided to post this.<br>
<br></blockquote><div><br></div><div>Do you have an example which demonstrates the incorrect behaviour of the current code?</div><div><br></div><div>Best regards,</div><div><br></div><div>Tamas</div><div><br></div><div><br>
</div><div> </div></div>