[PROJ] PROJ and Unicode on Windows

Peter Townsend peter.townsend at maplarge.com
Fri Apr 7 15:00:21 PDT 2023


I got it to work finally. Here's what I did for posterity.

What I've created is a C# managed wrapper, and I added a bunch of C
functions to PROJ with some extra functionality. Part of the extra
functionality is to CoTaskMemAlloc all the strings that PROJ can return so
that the .NET world will be happy.
For example, something like this:

const char* proj_errno_string_proxy(int err) {
    return convertResult(proj_errno_string(err));
}
...
const char* convertResult(const char* result) {
    if (!result)
        return result;

    // Going out, copy the string into CoTaskMemAlloc'd memory so the .NET
    // marshaller can take ownership of it.
    // MarshalAs LPStr will free it automatically (CoTaskMemFree) in .NET world.
    std::string str(result);
    auto len = str.length() + 1;
    char* buff = (char*)CoTaskMemAlloc(len);
    if (buff) {
        ml_strcpy(buff, len, str);
    }
    return buff;
}

The PInvoke signature is:
public const CharSet CHARSET = CharSet.Ansi;
public const CallingConvention CALLING_CONVENTION = CallingConvention.Cdecl;
public const UnmanagedType STRINGTYPE = UnmanagedType.LPStr;
...
[DllImport(PROJ_PROXY_DLL, EntryPoint = "proj_errno_string_proxy",
CallingConvention = CALLING_CONVENTION)]
[return: MarshalAs(STRINGTYPE)]
public extern static string proj_errno_string_proxy(int err);

(The built-in marshalling will take care of freeing what I've sent it.)

Because PROJ returns UTF-8 strings and LPStr marshals through the system
ANSI code page on Windows, my strings weren't coming or going in the right
encoding. In practice it worked fine for plain ASCII, but non-ASCII
characters would come out garbled here and there.

Here's what I did to fix it. I was originally targeting .NET Standard 2.0
and 2.1. 2.1 exposes UnmanagedType.LPUTF8Str (which first appeared in .NET
Framework 4.7), and it "just works". Changing STRINGTYPE to LPUTF8Str makes
those parameters (and struct members) encode correctly. I had to drop
Standard 2.0 though.

Except in the case of string arrays, like those const char** options
parameters. My PInvoke signature was this:
public const UnmanagedType STRINGTYPE = UnmanagedType.LPStr;
public const UnmanagedType ARRAYTYPE = UnmanagedType.LPArray;
...
public extern static IntPtr proj_context_set_search_paths(
    ProjContextHandle ctx, int count_paths,
    [MarshalAs(ARRAYTYPE, ArraySubType = STRINGTYPE, SizeParamIndex = 1)] string[] paths);

Alas, LPUTF8Str is NOT supported with ArraySubType! So in order to conquer
that problem, I ended up using a custom marshaller.
public extern static IntPtr proj_context_set_search_paths(
    ProjContextHandle ctx, int count_paths,
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(Utf8StringArrayMarshaler))] string[] paths);

internal class Utf8StringMarshaler : ICustomMarshaler {
    private static readonly Utf8StringMarshaler _instance = new Utf8StringMarshaler();

    public unsafe IntPtr MarshalManagedToNative(object strObj) {
        if (strObj == null)
            return IntPtr.Zero;
        if (!(strObj is string str))
            throw new ArgumentException("Value must be string", nameof(strObj));

        return MarshalManagedValue(str);
    }
    public unsafe static IntPtr MarshalManagedValue(string str) {
#if NETSTANDARD2_1_OR_GREATER
        //From core runtime's UTF8 string marshaller.
        int exactByteCount = checked(Encoding.UTF8.GetByteCount(str) + 1); // + 1 for null terminator
        byte* mem = (byte*)Marshal.AllocCoTaskMem(exactByteCount);
        Span<byte> buffer = new(mem, exactByteCount);

        int byteCount = Encoding.UTF8.GetBytes(str, buffer);
        buffer[byteCount] = 0; // null-terminate
        return (IntPtr)mem;
#else
        var bytes = Encoding.UTF8.GetBytes(str);
        var ptr = Marshal.AllocCoTaskMem(bytes.Length + 1);
        Marshal.Copy(bytes, 0, ptr, bytes.Length);
        Marshal.WriteByte(ptr, bytes.Length, 0);
        return ptr;
#endif
    }

    public object MarshalNativeToManaged(IntPtr pNativeData) {
        return MarshalUnmanagedValue(pNativeData);
    }
    public static string MarshalUnmanagedValue(IntPtr pNativeData) {
        if (pNativeData == IntPtr.Zero)
            return null;

#if NETSTANDARD2_1_OR_GREATER
        return Marshal.PtrToStringUTF8(pNativeData);
#else
        var bytes = new List<byte>(4096);
        int offset = 0;
        byte b;
        do {
            b = Marshal.ReadByte(pNativeData, offset);
            if (b != 0) {
                bytes.Add(b);
                offset++;
            }
        } while (b != 0);

        return bytes.Count > 0 ? Encoding.UTF8.GetString(bytes.ToArray(), 0, bytes.Count) : "";
#endif
    }

    public void CleanUpManagedData(object ManagedObj) {
    }

    public void CleanUpNativeData(IntPtr pNativeData) {
        Marshal.FreeCoTaskMem(pNativeData);
    }

    public int GetNativeDataSize() {
        return -1;
    }

    public static ICustomMarshaler GetInstance(string pstrCookie) {
        return _instance;
    }
}

internal class Utf8StringArrayMarshaler : ICustomMarshaler {

    private static readonly Utf8StringArrayMarshaler _instance = new Utf8StringArrayMarshaler();

    public unsafe IntPtr MarshalManagedToNative(object strObj) {
        if (strObj == null)
            return IntPtr.Zero;
        if (!(strObj is string[] str))
            throw new ArgumentException("Value must be string array", nameof(strObj));

        //Write UTF-8 arrays for each entry in the string array
        //Then end it with a nullptr.
        var len = IntPtr.Size * str.Length;
        var basePtr = Marshal.AllocHGlobal(len + IntPtr.Size);
        var ptr = basePtr;
        for (var i = 0; i < str.Length; i++) {
            var addr = Utf8StringMarshaler.MarshalManagedValue(str[i]);
            Marshal.WriteIntPtr(ptr, addr);
            ptr += IntPtr.Size;
        }
        Marshal.WriteIntPtr(ptr, IntPtr.Zero);
        return basePtr;
    }

    public object MarshalNativeToManaged(IntPtr pNativeData) {
        if (pNativeData == IntPtr.Zero)
            return null;

        //We don't have any context on how long the string array will be.
        var values = new List<string>();

        //Read UTF8 strings until we hit nullptr.
        var ptr = pNativeData;
        var currValue = Marshal.ReadIntPtr(ptr);
        while (currValue != IntPtr.Zero) {
            var str = Utf8StringMarshaler.MarshalUnmanagedValue(currValue);
            values.Add(str);

            ptr += IntPtr.Size;
            currValue = Marshal.ReadIntPtr(ptr);
        }
        return values.ToArray();
    }

    public void CleanUpManagedData(object ManagedObj) {
    }

    public void CleanUpNativeData(IntPtr pNativeData) {
        if (pNativeData == IntPtr.Zero) {
            return;
        }

        //Free the individual strings until we hit a nullptr.
        var ptr = pNativeData;
        var value = Marshal.ReadIntPtr(ptr);
        while (value != IntPtr.Zero) {
            Marshal.FreeCoTaskMem(value);
            ptr += IntPtr.Size;
            value = Marshal.ReadIntPtr(ptr);
        }

        //Free the array.
        Marshal.FreeHGlobal(pNativeData);
    }

    public int GetNativeDataSize() {
        return -1;
    }

    public static ICustomMarshaler GetInstance(string pstrCookie) {
        return _instance;
    }
}

I couldn't use the custom marshaller as a complete replacement though. You
can't use them on struct fields. So I have to use LPUTF8Str on those.
public const UnmanagedType STRINGTYPE = UnmanagedType.LPUTF8Str;
...
[StructLayout(LayoutKind.Sequential, CharSet = CHARSET)]
public struct PROJ_UNIT_INFO {

    [MarshalAs(STRINGTYPE)]
    public string auth_name;

    [MarshalAs(STRINGTYPE)]
    public string code;

    [MarshalAs(STRINGTYPE)]
    public string name;

    [MarshalAs(STRINGTYPE)]
    public string category;

    public double conv_factor;

    [MarshalAs(STRINGTYPE)]
    public string proj_short_name;

    public int deprecated;

}


On Thu, Apr 6, 2023 at 3:31 PM Peter Townsend <peter.townsend at maplarge.com>
wrote:

> Thanks, but I can't really use the SharpProj way. It's kinda using the
> .NET string as an intermediary. Plus I need to support a Linux build so I
> can't use C++/CLI anyway. The utf8_string method that takes in the .NET
> string works kinda similar to doing this w/o it:
> std::string utf8_string(String^ v)
> {
>     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
>     pin_ptr<const wchar_t> pPath = PtrToStringChars(v);
>     std::wstring vstr(pPath);
>     std::string sstr(conv.to_bytes(vstr));
>     return sstr;
> }
> const char* convertResult4(const char* result) {
>     if (!result)
>         return result;
>
>     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
>     std::wstring str = conv.from_bytes(result);
>     std::string sstr(conv.to_bytes(str));
>     ...
> }
>
> The std::wstring str contains the correctly encoded string, but turning it
> back to a const char* using std::string sstr.c_str() just garbles it back
> again.
>
> I might have to just proxy everything over the managed/unmanaged pinvoke
> wall as a wchar_t* or just make everything IntPtrs and Marshal them that
> way.
>
>
>
> On Thu, Apr 6, 2023 at 10:32 AM Bert Huijben <bert at qqmail.nl> wrote:
>
>> Hi Peter,
>>
>> When I needed PROJ for my work at my previous day job, I spent a bit of
>> extra time and created a complete wrapping C# library that is still used
>> there and in a few other places. The wrapping is specifically targeted
>> towards Windows, but works there with .NET Framework and .NET Core. See
>> https://github.com/ampscm/sharpproj/ (or just use SharpProj from NuGet)
>>
>> The sample code I have on that page shows roughly what you are trying
>> here, so you should be able to use that to try your use cases around
>> encoding.
>>
>> [[
>> using SharpProj;
>>
>> using var rd = CoordinateReferenceSystem.CreateFromEpsg(28992);
>> using var wgs84 = CoordinateReferenceSystem.CreateFromEpsg(4326);
>>
>> var area = rd.UsageArea;
>> Assert.AreEqual("Netherlands - onshore, including Waddenzee, Dutch Wadden Islands and 12-mile offshore coastal zone.", area.Name);
>>
>> using (var t = CoordinateTransform.Create(rd, wgs84))
>> {
>>     var r = t.Apply(new PPoint(155000, 463000));
>>     Assert.AreEqual(new PPoint(52.155, 5.387), r.ToXY(3)); // Round to 3 decimals for easy testing
>> }
>> ]]
>>
>> If you pick EPSG 23031, you will see that the encodings work there.
>>
>> You can check all the source code too, if you just want to check how to
>> get the en-/decoding to work. (It is all Apache licensed, so feel free to
>> copy&paste… or provide pull requests if you want something added to the
>> library)
>>
>> Bert
>>
>> *From:* PROJ <proj-bounces at lists.osgeo.org> *On Behalf Of *Even Rouault
>> *Sent:* Wednesday, April 5, 2023 11:53 PM
>> *To:* Peter Townsend <peter.townsend at maplarge.com>; proj <
>> proj at lists.osgeo.org>
>> *Subject:* Re: [PROJ] PROJ and Unicode on Windows
>>
>>
>>
>> Peter,
>>
>> there isn't any issue in your build. It is just that PROJ returns UTF-8
>> encoded strings and that the typical Windows console isn't configured to
>> display UTF-8. Cf
>> https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window
>> or similar issues
>>
>> Even
>>
>> On 05/04/2023 at 23:44, Peter Townsend via PROJ wrote:
>>
>> I've got a bit of an annoyance with my Windows PROJ build. Hopefully it's
>> not too hard to resolve as the world of char/wchar_t/etc. isn't something
>> I'm terribly familiar with.
>>
>> Take for example the area of use of EPSG:23031. On Linux it's fine, but
>> on Windows there's a Unicode issue.
>>
>>
>>
>> PJ* crs = proj_create(m_ctxt, "EPSG:23031");
>> ASSERT_NE(crs, nullptr);
>> ObjectKeeper keeper_crsH(crs);
>>
>> double w, s, e, n;
>> const char* a;
>> proj_get_area_of_use(m_ctxt, crs, &w, &s, &e, &n, &a);
>>
>>
>>
>> Contents of a:
>>
>> "Europe - between 0°E and 6°E - Andorra; Denmark (North Sea); Germany
>> offshore; Netherlands offshore; Norway including Svalbard - onshore and
>> offshore; Spain - onshore (mainland and Balearic Islands); United Kingdom
>> (UKCS) offshore."
>>
>>
>>
>> Is there a simple thing I'm overlooking in the build process that might
>> clear up the encoding goof? Or do I need to do some bending over backwards
>> with character manipulation?
>>
>>
>>
>> This is the command line I'm using to build this example:
>>
>> cmake -DBUILD_SHARED_LIBS=ON
>> -DCMAKE_TOOLCHAIN_FILE=C:\dev\vcpkg\scripts\buildsystems\vcpkg.cmake ..
>> cmake --build . --config Debug -j 8
>>
>>
>>
>> Thanks!
>>
>> --
>>
>> Peter Townsend
>>
>> Senior Software Developer
>>
>>
>>
>> _______________________________________________
>>
>> PROJ mailing list
>>
>> PROJ at lists.osgeo.org
>>
>> https://lists.osgeo.org/mailman/listinfo/proj
>>
>> --
>>
>> http://www.spatialys.com
>>
>> My software is free, but my time generally not.
>>
>>
>
> --
> Peter Townsend
> Senior Software Developer
>


-- 
Peter Townsend
Senior Software Developer