[PROJ] PROJ and Unicode on Windows

Even Rouault even.rouault at spatialys.com
Wed Apr 5 16:30:20 PDT 2023


On 06/04/2023 at 01:09, Dan Crosby wrote:
>
> How does this work on Linux? Is char defined as wchar there?
>
No. char is a single byte. wchar_t is generally a 32-bit integer on Unix.
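To check this for a given compiler, something like the following can be used. It is only a sketch, and wchar_bits is a made-up helper name for illustration, not anything from PROJ:

```cpp
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition in both C and C++.
static_assert(sizeof(char) == 1, "char is always one byte");

// Width of wchar_t, in bits, on the platform this is compiled for.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

With glibc on Linux this typically evaluates to 32; with MSVC on Windows it is 16, because wchar_t holds UTF-16 code units there.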
>
> If Proj is returning UTF8 strings, shouldn’t the functions be using 
> wchar, or TCHAR at the least?
>
I guess this is just a matter of taste/habit. Lots of open source 
libraries that return Unicode content just return it as UTF-8 in a char* 
(or a std::string in C++; this is typically the case for the 
nlohmann/json library we use for JSON parsing). If you need to access 
the string by Unicode character, you can use iconv or 
https://en.cppreference.com/w/cpp/locale/codecvt_utf8 in C++ (although 
the latter has been deprecated since C++17).
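If neither iconv nor codecvt_utf8 is an option, per-character access can also be had with a small hand-written decoder. The following is only a rough sketch under stated assumptions: utf8_to_utf32 is a made-up name (not a PROJ or standard library function), the input is assumed to be NUL-terminated, and overlong encodings and surrogate values are not rejected.

```cpp
#include <cassert>
#include <string>

// Decode a NUL-terminated UTF-8 string into one char32_t per code point.
// Malformed sequences are replaced with U+FFFD (sketch: overlong forms
// and surrogate code points are not checked).
std::u32string utf8_to_utf32(const char* utf8) {
    assert(utf8 != nullptr);
    std::u32string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while (*p) {
        char32_t cp;
        int extra;  // number of continuation bytes expected after the lead byte
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else {  // stray continuation byte or invalid lead byte
            out.push_back(0xFFFD);
            ++p;
            continue;
        }
        ++p;
        bool ok = true;
        for (int i = 0; i < extra; ++i) {
            // A continuation byte looks like 10xxxxxx. This test also
            // fails on the NUL terminator, so we never read past the end.
            if ((*p & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

The result has one element per code point regardless of platform, so it behaves the same on Linux and Windows. Note that building a std::wstring from str.begin(), str.end(), as in the Linux variant quoted below, copies each byte into its own wchar_t, so it only gives the right result for pure ASCII input.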
>
> Is there a compatibility reason to use char **?
>
That's the whole reason UTF-8 was designed: to be able to deal 
with it mostly as if it were an old-school ASCII string.
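To illustrate what "mostly" means here, a small sketch (before_colon and count_code_points are made-up helper names, not PROJ functions). Every byte of a multi-byte UTF-8 sequence has its high bit set, so byte-oriented code that looks for ASCII characters keeps working unchanged; only counting or indexing by character needs extra care.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Searching for an ASCII character is safe on UTF-8 data: bytes that
// belong to a multi-byte sequence are all >= 0x80, so ':' can never
// match in the middle of an encoded character.
std::string before_colon(const std::string& utf8) {
    return utf8.substr(0, utf8.find(':'));
}

// Counting characters is where UTF-8 differs from ASCII: the number of
// code points is the number of bytes that are not continuation bytes
// (continuation bytes have the bit pattern 10xxxxxx).
std::size_t count_code_points(const char* utf8) {
    assert(utf8 != nullptr);
    std::size_t n = 0;
    for (; *utf8; ++utf8) {
        if ((static_cast<unsigned char>(*utf8) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For "café" encoded as UTF-8, strlen reports 5 bytes while count_code_points reports 4 characters; copying, concatenating and splitting on ASCII delimiters behave exactly as they would for an ASCII string.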


> From: PROJ <proj-bounces at lists.osgeo.org> On Behalf Of Peter 
> Townsend via PROJ
> Sent: Thursday, 6 April 2023 10:38
> To: Even Rouault <even.rouault at spatialys.com>
> Cc: proj <proj at lists.osgeo.org>
> Subject: Re: [PROJ] PROJ and Unicode on Windows
>
> Well it's not the console I'm worried about, that's coming straight 
> from the VS debugger. Knowing that strings are always coming out of 
> PROJ in UTF-8 is good.
>
> Ultimately I'm sending the output to a C# DLL, so I need to 
> CoTaskMemAlloc my string. If I do something like this:
>
> std::wstring s2ws(const char* utf8Bytes)
> {
>     const std::string& str(utf8Bytes);
>     int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0],
>                                           (int)str.size(), NULL, 0);
>     std::wstring wstrTo(size_needed, 0);
>     MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(),
>                         &wstrTo[0], size_needed);
>     return wstrTo;
> }
>
> Then I see the corrected UTF-8 text in the wstring. As mentioned this 
> isn't something I'm terribly familiar with, and I'd like to avoid 
> writing terrible C code and exploding buffers.
>
> CoTaskMemAlloc needs the actual number of bytes, and we'll need an 
> extra spot for the null terminator.
>
> const wchar_t* u_convertResult(const char* result) {
>     if (!result)
>         return nullptr;
>
>     std::wstring wstr = s2ws(result);
>     auto wlen = wstr.length() + 1;
>     auto len = wlen * sizeof(wchar_t);
>     wchar_t* buff = (wchar_t*)CoTaskMemAlloc(len);
>     if (buff) {
>         wcscpy_s(buff, wlen, wstr.c_str());
>     }
>     return buff;
> }
>
> Does this sound reasonable for Windows?
>
> And as for Linux and maintaining a multi-platform compatibility, I'd 
> define an alias function like this instead:
>
> const wchar_t* u_convertResult(const char* result) {
>     std::string str(result);
>     std::wstring wstr = std::wstring(str.begin(), str.end());
>
>     auto wlen = wstr.length() + 1;
>     auto len = wlen * sizeof(wchar_t);
>     wchar_t* buff = (wchar_t*)malloc(len);
>     if (buff) {
>         wcscpy(buff, wstr.c_str());
>     }
>     return buff;
> }
>
> Since it's already happily working as UTF-8 on Linux, I should be able 
> to pass in the original string to the wstring. CoTaskMemAlloc is just 
> malloc. Does this sound okay too?
>
> Thanks!

-- 
http://www.spatialys.com
My software is free, but my time generally not.