<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 06/04/2023 at 01:09, Dan Crosby
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cdca058a-7f03-454f-83c0-8f0851555e14@lincolnagritech.co.nz">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">How
does this work on Linux? Is char defined as wchar there?</span></p>
</div>
</blockquote>
No. char is a single byte on Linux too. wchar_t is generally a 32-bit
integer on Unix, whereas it is 16-bit on Windows.<br>
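<p>As a rough illustration of why buffer sizes for wide strings differ
between platforms, here is a minimal sketch. The helper names
wide_char_bytes and wide_buffer_bytes are hypothetical and only serve
the example.</p>

```cpp
#include <cstddef>
#include <string>

// char is exactly one byte on every platform, by definition.
static_assert(sizeof(char) == 1, "char is one byte");

// Size of one wchar_t on the platform this is compiled for:
// typically 4 on Linux (UTF-32) and 2 on Windows (UTF-16).
constexpr std::size_t wide_char_bytes() { return sizeof(wchar_t); }

// Hypothetical helper: number of bytes needed to store a wide string
// together with its terminating L'\0'.
inline std::size_t wide_buffer_bytes(const std::wstring& s) {
    return (s.size() + 1) * sizeof(wchar_t);
}
```

<p>With this, wide_buffer_bytes(L"abc") is 16 where wchar_t is 32-bit and
8 where it is 16-bit.</p>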
<blockquote type="cite"
cite="mid:cdca058a-7f03-454f-83c0-8f0851555e14@lincolnagritech.co.nz">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">If
PROJ is returning UTF-8 strings, shouldn’t the functions be
using wchar, or TCHAR at the least?</span></p>
</div>
</blockquote>
I guess this is just a matter of taste/habit. Lots of open source
libraries that return Unicode content just return it as UTF-8 in a
char* (or a std::string in C++; this is typically the case with the
nlohmann/json library we use for JSON parsing). If you need to
access the string one Unicode character at a time, you can use iconv or
<a class="moz-txt-link-freetext" href="https://en.cppreference.com/w/cpp/locale/codecvt_utf8">https://en.cppreference.com/w/cpp/locale/codecvt_utf8</a> in C++
(although the latter has been deprecated since C++17).<br>
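<p>If pulling in iconv or the deprecated codecvt facet is not desirable,
per-character access can also be done with a small hand-written decoder.
The following is only a sketch: utf8_to_code_points is a hypothetical
helper, not something PROJ provides, and it does not reject overlong
encodings or encoded surrogates.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of PROJ): decodes UTF-8 bytes into
// Unicode code points. Invalid or truncated sequences yield U+FFFD.
inline std::u32string utf8_to_code_points(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        if (i + len > utf8.size()) {
            // Sequence cut off by the end of the string.
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

<p>Indexing the returned std::u32string then gives one element per code
point, regardless of how many bytes each one took in the UTF-8 input.</p>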
<blockquote type="cite"
cite="mid:cdca058a-7f03-454f-83c0-8f0851555e14@lincolnagritech.co.nz">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Is
there a compatibility reason to use char **?</span></p>
</div>
</blockquote>
<p>That is precisely what UTF-8 was designed for: being able to
handle it mostly as if it were an old-school ASCII string.<br>
</p>
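<p>To make that concrete, here is a small sketch of byte-oriented
operations that keep working on UTF-8 data held in char* or std::string.
The function names are hypothetical and chosen for the example only.</p>

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Byte length of a UTF-8 C string. strlen() works unchanged because
// UTF-8 never uses a zero byte inside a multi-byte character.
inline std::size_t utf8_byte_length(const char* s) { return std::strlen(s); }

// Split on an ASCII delimiter. This is safe on UTF-8 because every byte
// of a multi-byte sequence has its high bit set, so it can never be
// mistaken for an ASCII character such as ','.
inline std::vector<std::string> split_ascii(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Number of code points: count every byte that is not a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
inline std::size_t utf8_code_point_count(const char* s) {
    std::size_t n = 0;
    for (; *s; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

<p>The place where the ASCII habit breaks down is length: the byte count
and the number of characters differ as soon as the text contains
non-ASCII characters.</p>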
<p><br>
</p>
<blockquote type="cite"
cite="mid:cdca058a-7f03-454f-83c0-8f0851555e14@lincolnagritech.co.nz">
<div class="WordSection1">
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm
0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US"> PROJ
<a class="moz-txt-link-rfc2396E" href="mailto:proj-bounces@lists.osgeo.org"><proj-bounces@lists.osgeo.org></a> <b>On Behalf Of
</b>Peter Townsend via PROJ<br>
<b>Sent:</b> Thursday, 6 April 2023 10:38<br>
<b>To:</b> Even Rouault
<a class="moz-txt-link-rfc2396E" href="mailto:even.rouault@spatialys.com"><even.rouault@spatialys.com></a><br>
<b>Cc:</b> proj <a class="moz-txt-link-rfc2396E" href="mailto:proj@lists.osgeo.org"><proj@lists.osgeo.org></a><br>
<b>Subject:</b> Re: [PROJ] PROJ and Unicode on Windows<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Well it's not the console I'm worried
about, that's coming straight from the VS debugger.
Knowing that strings are always coming out of PROJ in
UTF-8 is good. <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Ultimately I'm sending the output to
a C# DLL, so I need to CoTaskMemAlloc my string. If I do
something like this:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">std::wstring s2ws(const char*
utf8Bytes)<br>
{<br>
const std::string& str(utf8Bytes);<br>
int size_needed = MultiByteToWideChar(CP_UTF8, 0,
&str[0], (int)str.size(), NULL, 0);<br>
std::wstring wstrTo(size_needed, 0);<br>
MultiByteToWideChar(CP_UTF8, 0, &str[0],
(int)str.size(), &wstrTo[0], size_needed);<br>
return wstrTo;<br>
}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Then I see the correctly decoded text
in the wstring. As mentioned, this isn't something I'm
terribly familiar with, and I'd like to avoid writing
terrible C code and exploding buffers.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">CoTaskMemAlloc needs the actual
number of bytes, and we'll need an extra spot for the
null terminator. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">const wchar_t* u_convertResult(const
char* result) {<br>
if (!result)<br>
return nullptr;<br>
<br>
std::wstring wstr = s2ws(result);<br>
auto wlen = wstr.length() + 1;<br>
auto len = wlen * sizeof(wchar_t);<br>
wchar_t* buff = (wchar_t*)CoTaskMemAlloc(len);<br>
if (buff) {<br>
wcscpy_s(buff, wlen, wstr.c_str());<br>
}<br>
return buff;<br>
}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Does this sound reasonable for
Windows?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">And as for Linux and maintaining
multi-platform compatibility, I'd define an alias
function like this instead:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">const wchar_t* u_convertResult(const
char* result) {<br>
std::string str(result);<br>
std::wstring wstr = std::wstring(str.begin(),
str.end());<br>
<br>
auto wlen = wstr.length() + 1;<br>
auto len = wlen * sizeof(wchar_t);<br>
wchar_t* buff = (wchar_t*)malloc(len);<br>
if (buff) {<br>
wcscpy(buff, wstr.c_str());<br>
}<br>
return buff;<br>
}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Since it's already happily working as
UTF-8 on Linux, I should be able to pass in the original
string to the wstring. CoTaskMemAlloc is just malloc.
Does this sound okay too?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks!<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
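<p>One caveat on the Linux variant quoted above: constructing a
std::wstring from str.begin() and str.end() widens each byte on its own
and does not decode UTF-8, so any non-ASCII character ends up as several
wrong wide characters. It also lacks the null check of the Windows
version. A possible portable alternative is sketched below, assuming the
caller releases the buffer with free(); utf8_to_wstring and
convert_result_portable are hypothetical names, and the decoder does not
reject overlong encodings.</p>

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>

// Hypothetical helper: decodes a NUL-terminated UTF-8 string into a
// std::wstring. Produces UTF-32 where wchar_t is 4 bytes and UTF-16
// (with surrogate pairs) where it is 2 bytes. Bad input yields U+FFFD.
inline std::wstring utf8_to_wstring(const char* utf8) {
    std::wstring out;
    if (!utf8) return out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while (*p) {
        unsigned long cp = 0xFFFD;  // kept if the lead byte is invalid
        int extra = 0;              // number of continuation bytes expected
        if (*p < 0x80)                { cp = *p; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        ++p;
        for (; extra > 0; --extra) {
            // Also stops at the terminating NUL of a truncated sequence.
            if ((*p & 0xC0) != 0x80) { cp = 0xFFFD; break; }
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// malloc-based counterpart of the CoTaskMemAlloc version. Returns
// nullptr for null input or on allocation failure.
inline wchar_t* convert_result_portable(const char* result) {
    if (!result) return nullptr;
    const std::wstring wstr = utf8_to_wstring(result);
    const std::size_t wlen = wstr.size() + 1;  // include the terminator
    wchar_t* buff = static_cast<wchar_t*>(std::malloc(wlen * sizeof(wchar_t)));
    if (buff) std::wmemcpy(buff, wstr.c_str(), wlen);
    return buff;
}
```

<p>On Windows the MultiByteToWideChar route from the quoted message can
stay as it is; the sketch is only meant for platforms where that API is
not available.</p>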
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
</body>
</html>