[PROJ] Vector/SIMD acceleration

Fri Apr 17 07:55:58 PDT 2020

I ran across this (
https://sc18.supercomputing.org/proceedings/workshops/workshop_files/ws_llvmf106s2-file1.pdf),
which may be of interest.

On Fri, Apr 17, 2020 at 10:01 AM Even Rouault <even.rouault at spatialys.com>
wrote:

> On Friday 17 April 2020 09:21:39 CEST Andrew Bell wrote:
>
> > I know almost nothing about this, but I *thought* that compilers were
> > moving to do SIMD instructions where possible as an optimization. It may
> > not be there yet, but isn't this something that's getting attention? Could
> > the library take advantage of this by arranging the code to allow for this
> > optimization, rather than adding an explicit sleef interface?
>
>
>
> That's a good point, but no compiler, however clever, is or will be able
> to auto-vectorize PROJ code as it is currently written, since there's no
> explicit loop in the projection code, and that's the number 1 requirement
> for auto-vectorization to be triggered.
>
> If there were such loops, maybe some compilers, which do not exist yet
> (maybe except the Intel compiler which I presume can use their proprietary
> SVML library [1]), could also auto-vectorize the transcendental functions.
> Actually I found a post [2] on the LLVM mailing list and a review [3], both
> from 2016, where they considered sleef, but it doesn't seem that was pursued.
>
>
>
> So, if you take the following snippet:
>
> #include <math.h>
>
> void foo(const double* lon, const double* lat, double* Xout, double* Yout,
>          int N)
> {
>     for( int i = 0; i < N; i++ )
>     {
>         Xout[i] = lon[i] * 2;
>         Yout[i] = lat[i] * 4;
>     }
> }
>
> void bar(const double* lon, const double* lat, double* Xout, double* Yout,
>          int N)
> {
>     for( int i = 0; i < N; i++ )
>     {
>         Xout[i] = lon[i] * 2;
>         Yout[i] = sqrt(lat[i]);
>     }
> }
>
> void baz(const double* lon, const double* lat, double* Xout, double* Yout,
>          int N)
> {
>     for( int i = 0; i < N; i++ )
>     {
>         Xout[i] = sin(lon[i]);
>         Yout[i] = cos(lon[i]) * cos(lat[i]);
>     }
> }
>
>
>
> With gcc and clang in -O3 mode (to enable auto-vectorization), foo() is
> compiled into a reasonable auto-vectorized version. But the generated code
> for bar() and baz() remains completely serial (actually, that's quite
> surprising for bar() since sqrt exists as an SSE instruction). And bar() and
> baz() are actually more typical of PROJ than foo().
>
>
>
> There's also the question of branches. Human intervention is probably
> needed to rewrite them in a form compatible with auto-vectorization.
>
>
>
> If at the end of the loop body in foo(), I add the following
>
>     if( Yout[i] >= 4 )
>     {
>         Xout[i] = 4 - Xout[i];
>         Yout[i] = 4 - Yout[i];
>     }
>
> gcc and clang give up on auto-vectorization (even when the code before it
> is made significantly more complex).
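
As an illustration of the kind of manual rewrite this might involve, here is a
minimal sketch of the same computation with the condition expressed as a
select instead of a conditional store. The name foo_select is hypothetical,
and I have not verified that any particular gcc or clang version vectorizes
it; the only claim is that it produces the same values as foo() with the
extra if block.

```c
/* Hypothetical variant of foo() plus the extra condition.
 * Values are computed in locals and every element is stored exactly once,
 * so the loop body contains no conditional store. */
void foo_select(const double* lon, const double* lat, double* Xout,
                double* Yout, int N)
{
    for( int i = 0; i < N; i++ )
    {
        const double x = lon[i] * 2;
        const double y = lat[i] * 4;
        const int flip = (y >= 4);   /* same test as Yout[i] >= 4 */
        Xout[i] = flip ? 4 - x : x;
        Yout[i] = flip ? 4 - y : y;
    }
}
```

Both arms of each select are cheap arithmetic here, so evaluating them
unconditionally costs little; that would not hold for branches that guard
expensive or invalid operations.
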
>
>
>
> Even
>
>
>
>
>
> [1]
> https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-intrinsics-for-short-vector-math-library-svml-operations
>
> [2] http://lists.llvm.org/pipermail/llvm-dev/2016-July/102254.html
>
> [3] https://reviews.llvm.org/D24951
>
>
>
> --
>
> Spatialys - Geospatial professional services
>
> http://www.spatialys.com
>

-- 
Andrew Bell
andrew.bell.ia at gmail.com