[PROJ] PROJ and Unicode on Windows

Wed Apr 5 15:38:28 PDT 2023

Well, it's not the console I'm worried about; that output is coming straight
from the VS debugger. Knowing that strings always come out of PROJ as
UTF-8 is good.

Ultimately I'm sending the output to a C# DLL, so I need to CoTaskMemAlloc
my string. If I do something like this:

std::wstring s2ws(const char* utf8Bytes)
{
    // Convert a null-terminated UTF-8 string to UTF-16 (Windows wchar_t).
    const std::string str(utf8Bytes);
    if (str.empty())
        return std::wstring();
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(),
                                          NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), &wstrTo[0],
                        size_needed);
    return wstrTo;
}

Then I see the correctly decoded text in the wstring. As mentioned, this
isn't something I'm terribly familiar with, and I'd like to avoid writing
terrible C code and exploding buffers.
CoTaskMemAlloc needs the actual number of bytes, and we'll need an extra
slot for the null terminator.

const wchar_t* u_convertResult(const char* result) {
    if (!result)
        return nullptr;

    std::wstring wstr = s2ws(result);
    auto wlen = wstr.length() + 1;
    auto len = wlen * sizeof(wchar_t);
    wchar_t* buff = (wchar_t*)CoTaskMemAlloc(len);
    if (buff) {
        wcscpy_s(buff, wlen, wstr.c_str());
    }
    return buff;
}

Does this sound reasonable for Windows?
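
The size arithmetic (character count plus terminator, times sizeof(wchar_t))
can be kept in one place by passing the allocator in as a parameter. This is
only a minimal sketch: dup_wstring is a hypothetical name, not a PROJ or
Windows API, and it assumes the caller later frees the buffer with the
matching deallocator (CoTaskMemFree for CoTaskMemAlloc, free for malloc).

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: copy a wide string, terminator included, into a
// buffer obtained from `alloc`. `alloc` is any callable that takes a byte
// count and returns void* (CoTaskMemAlloc on Windows, std::malloc elsewhere).
template <typename Alloc>
wchar_t* dup_wstring(const std::wstring& ws, Alloc alloc)
{
    const std::size_t count = ws.size() + 1;            // characters, incl. L'\0'
    const std::size_t bytes = count * sizeof(wchar_t);  // allocators want bytes
    wchar_t* buf = static_cast<wchar_t*>(alloc(bytes));
    if (buf)
        std::memcpy(buf, ws.c_str(), bytes);  // c_str() is null-terminated
    return buf;                               // nullptr if allocation failed
}
```

With this, the Windows path would be dup_wstring(s2ws(result), CoTaskMemAlloc)
and the non-Windows path dup_wstring(wstr, std::malloc).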

And as for Linux and maintaining multi-platform compatibility, I'd define
an alias function like this instead:
const wchar_t* u_convertResult(const char* result) {
    std::string str(result);
    std::wstring wstr = std::wstring(str.begin(), str.end());

    auto wlen = wstr.length() + 1;
    auto len = wlen * sizeof(wchar_t);
    wchar_t* buff = (wchar_t*)malloc(len);
    if (buff) {
        wcscpy(buff, wstr.c_str());
    }
    return buff;
}

Since it's already happily working as UTF-8 on Linux, I should be able to
pass the original string straight into the wstring, with malloc standing in
for CoTaskMemAlloc. Does this sound okay too?
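
One caveat with the byte-wise std::wstring(str.begin(), str.end())
construction: it widens each UTF-8 byte into its own wchar_t, so it is only
faithful for pure ASCII. A two-byte sequence such as the degree sign (C2 B0)
ends up as two characters instead of one. Below is a minimal sketch of an
actual decode, assuming wchar_t is 32 bits wide (as with glibc on Linux, not
on Windows). utf8_to_wstring is a hypothetical helper, not a PROJ function,
and it does not reject overlong encodings or surrogate code points.

```cpp
#include <string>

// Hypothetical helper: decode a null-terminated UTF-8 string into a
// std::wstring, one code point per wchar_t. Assumes 32-bit wchar_t.
// Malformed or truncated sequences are replaced with U+FFFD.
std::wstring utf8_to_wstring(const char* utf8)
{
    std::wstring out;
    if (!utf8)
        return out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while (*p) {
        unsigned long cp;
        int extra;  // number of continuation bytes expected after the lead byte
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; }  // stray byte
        ++p;
        for (; extra > 0; --extra) {
            if ((*p & 0xC0) != 0x80) {  // not a continuation byte (or end of input)
                cp = 0xFFFD;
                break;
            }
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

For the input "6°E" (bytes 36 C2 B0 45) this yields three wide characters,
whereas the byte-wise construction yields four.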

Thanks!

On Wed, Apr 5, 2023 at 4:52 PM Even Rouault <even.rouault at spatialys.com>
wrote:

> Peter,
>
> there isn't any issue in your build. It is just that PROJ returns UTF-8
> encoded strings and that the typical Windows console isn't configured to
> display UTF-8. Cf
> https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window
> or similar issues
>
> Even
> On 05/04/2023 at 23:44, Peter Townsend via PROJ wrote:
>
> I've got a bit of an annoyance with my Windows PROJ build. Hopefully it's
> not too hard to resolve, as the world of char/wchar_t/etc. isn't something
> I'm terribly familiar with.
>
> Take for example the area of use of EPSG:23031. On Linux it's fine, but on
> Windows there's a Unicode issue.
>
> PJ* crs = proj_create(m_ctxt, "EPSG:23031");
> ASSERT_NE(crs, nullptr);
> ObjectKeeper keeper_crsH(crs);
>
> double w, s, e, n;
> const char* a;
> proj_get_area_of_use(m_ctxt, crs, &w, &s, &e, &n, &a);
>
> Contents of a:
> "Europe - between 0Â°E and 6Â°E - Andorra; Denmark (North Sea); Germany
> offshore; Netherlands offshore; Norway including Svalbard - onshore and
> offshore; Spain - onshore (mainland and Balearic Islands); United Kingdom
> (UKCS) offshore."
>
> Is there a simple thing I'm overlooking in the build process that might
> clear up the encoding goof? Or do I need to do some bending over backwards
> with character manipulation?
>
> This is the command line I'm using to build this example:
> cmake -DBUILD_SHARED_LIBS=ON
> -DCMAKE_TOOLCHAIN_FILE=C:\dev\vcpkg\scripts\buildsystems\vcpkg.cmake ..
> cmake --build . --config Debug -j 8
>
> Thanks!
> --
> Peter Townsend
> Senior Software Developer
>
> _______________________________________________
> PROJ mailing list
> PROJ at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/proj
>
> -- http://www.spatialys.com
> My software is free, but my time generally not.
>
>

-- 
Peter Townsend
Senior Software Developer