<div dir="ltr"><div><br></div>I ran across this (<a href="https://sc18.supercomputing.org/proceedings/workshops/workshop_files/ws_llvmf106s2-file1.pdf">https://sc18.supercomputing.org/proceedings/workshops/workshop_files/ws_llvmf106s2-file1.pdf</a>), which may be of interest.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 17, 2020 at 10:01 AM Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div style="font-family:monospace;font-size:9pt;font-weight:400;font-style:normal">
<p style="margin:0px;text-indent:0px">On vendredi 17 avril 2020 09:21:39 CEST Andrew Bell wrote:</p>
<p style="margin:0px;text-indent:0px">> I know almost nothing about this, but I *thought* that compilers were</p>
<p style="margin:0px;text-indent:0px">> moving to do SIMD instructions where possible as an optimization.  It may</p>
<p style="margin:0px;text-indent:0px">> not be there yet, isn't this something that's getting attention?  Could the</p>
<p style="margin:0px;text-indent:0px">> library take advantage of this by arranging the code to allow for this</p>
<p style="margin:0px;text-indent:0px">> optimization, rather than adding explicit sleef interface?</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">That's a good point, but even the cleverest compiler could not auto-vectorize PROJ code as it is currently written: the projection code contains no explicit loop over points, and such a loop is the number 1 requirement for auto-vectorization to be triggered.</p>
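<p style="margin:0px;text-indent:0px">A minimal sketch of what "explicit loop" means here. The names scale_point(), scale_points() and xy_t are hypothetical and are not PROJ API; the arithmetic is a placeholder. The per-point function alone gives the vectorizer nothing to work on, while the batched wrapper exposes a loop over points that it can at least consider.</p>

```c
#include <stddef.h>

/* Hypothetical per-point kernel: one coordinate in, one coordinate out.
   Taken alone, there is no loop here for the compiler to vectorize. */
typedef struct { double x, y; } xy_t;

static xy_t scale_point(double lon, double lat)
{
    xy_t out = { lon * 2.0, lat * 4.0 };
    return out;
}

/* Hypothetical batched variant: the loop over points lives in the same
   translation unit as the kernel, so the compiler can inline the kernel
   and try to vectorize across iterations. */
void scale_points(const double *lon, const double *lat,
                  double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        xy_t p = scale_point(lon[i], lat[i]);
        x[i] = p.x;
        y[i] = p.y;
    }
}
```

<p style="margin:0px;text-indent:0px">Whether a given compiler actually vectorizes such a loop still depends on what the kernel contains.</p>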
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">If there were such loops, some future compilers might also be able to auto-vectorize the transcendental functions (the Intel compiler may already do so, since I presume it can use Intel's proprietary SVML library [1]).</p>
<p style="margin:0px;text-indent:0px">Actually, I found posts [2] [3] on the clang mailing list from 2016 where they considered sleef, but it doesn't seem that was pursued.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">So, if you take the following snippet</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">#include <math.h></p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">void foo(const double* lon, const double* lat, double* Xout, double* Yout, int N)</p>
<p style="margin:0px;text-indent:0px">{</p>
<p style="margin:0px;text-indent:0px">    for( int i = 0; i < N; i++ )</p>
<p style="margin:0px;text-indent:0px">    {</p>
<p style="margin:0px;text-indent:0px">        Xout[i] = lon[i] * 2;</p>
<p style="margin:0px;text-indent:0px">        Yout[i] = lat[i] * 4;</p>
<p style="margin:0px;text-indent:0px">    }</p>
<p style="margin:0px;text-indent:0px">}</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">void bar(const double* lon, const double* lat, double* Xout, double* Yout, int N)</p>
<p style="margin:0px;text-indent:0px">{</p>
<p style="margin:0px;text-indent:0px">    for( int i = 0; i < N; i++ )</p>
<p style="margin:0px;text-indent:0px">    {</p>
<p style="margin:0px;text-indent:0px">        Xout[i] = lon[i] * 2;</p>
<p style="margin:0px;text-indent:0px">        Yout[i] = sqrt(lat[i]);</p>
<p style="margin:0px;text-indent:0px">    }</p>
<p style="margin:0px;text-indent:0px">}</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">void baz(const double* lon, const double* lat, double* Xout, double* Yout, int N)</p>
<p style="margin:0px;text-indent:0px">{</p>
<p style="margin:0px;text-indent:0px">    for( int i = 0; i < N; i++ )</p>
<p style="margin:0px;text-indent:0px">    {</p>
<p style="margin:0px;text-indent:0px">        Xout[i] = sin(lon[i]);</p>
<p style="margin:0px;text-indent:0px">        Yout[i] = cos(lon[i]) * cos(lat[i]);</p>
<p style="margin:0px;text-indent:0px">    }</p>
<p style="margin:0px;text-indent:0px">}</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">With gcc and clang at -O3 (which enables auto-vectorization), foo() is compiled into a reasonable auto-vectorized version. But the generated code for bar() and baz() remains completely scalar (that's actually quite surprising for bar(), since sqrt exists as an SSE instruction). And bar() and baz() are more typical of PROJ than foo().</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">There's also the question of branches. Human intervention is probably needed to rewrite them in a form compatible with vectorization.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">If I add the following at the end of the loop body in foo():</p>
<p style="margin:0px;text-indent:0px">        if( Yout[i] >= 4 )</p>
<p style="margin:0px;text-indent:0px">        {</p>
<p style="margin:0px;text-indent:0px">            Xout[i] = 4 - Xout[i];</p>
<p style="margin:0px;text-indent:0px">            Yout[i] = 4 - Yout[i];</p>
<p style="margin:0px;text-indent:0px">        }</p>
<p style="margin:0px;text-indent:0px">gcc and clang both give up on auto-vectorization (even if the code before the branch is made significantly more complex).</p>
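<p style="margin:0px;text-indent:0px">A sketch of the kind of manual rewrite this might call for, under the assumption that the compiler handles selects better than conditional stores: compute both values in locals, then pick one with a conditional expression so every iteration performs the same unconditional stores. The name foo_select is made up for this example, and whether a particular gcc or clang version vectorizes it has to be checked in the generated assembly.</p>

```c
/* foo() plus the extra branch, rewritten without an if block.
   Results are identical to the branchy version: the values are kept
   in locals and written to memory exactly once per iteration. */
void foo_select(const double *lon, const double *lat,
                double *Xout, double *Yout, int N)
{
    for (int i = 0; i < N; i++) {
        const double x = lon[i] * 2;
        const double y = lat[i] * 4;
        const int flip = (y >= 4);   /* same condition as Yout[i] >= 4 */
        Xout[i] = flip ? 4 - x : x;
        Yout[i] = flip ? 4 - y : y;
    }
}
```

<p style="margin:0px;text-indent:0px">For lon = {0, 1, 3} and lat = {0.5, 1, 2}, this yields Xout = {0, 2, -2} and Yout = {2, 0, -4}, the same as the original loop followed by the if block.</p>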
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">Even</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">[1] <a href="https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-intrinsics-for-short-vector-math-library-svml-operations" target="_blank">https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-intrinsics-for-short-vector-math-library-svml-operations</a></p>
<p style="margin:0px;text-indent:0px">[2] <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-July/102254.html" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2016-July/102254.html</a></p>
<p style="margin:0px;text-indent:0px">[3] <a href="https://reviews.llvm.org/D24951" target="_blank">https://reviews.llvm.org/D24951</a></p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">-- </p>
<p style="margin:0px;text-indent:0px">Spatialys - Geospatial professional services</p>
<p style="margin:0px;text-indent:0px"><a href="http://www.spatialys.com" target="_blank">http://www.spatialys.com</a></p></div></blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature">Andrew Bell<br><a href="mailto:andrew.bell.ia@gmail.com" target="_blank">andrew.bell.ia@gmail.com</a></div>