[PROJ] Vector/SIMD acceleration

Even Rouault even.rouault at spatialys.com
Fri Apr 17 07:01:42 PDT 2020


On Friday 17 April 2020 09:21:39 CEST Andrew Bell wrote:
> I know almost nothing about this, but I *thought* that compilers were
> moving to do SIMD instructions where possible as an optimization.  It may
> not be there yet, isn't this something that's getting attention?  Could the
> library take advantage of this by arranging the code to allow for this
> optimization, rather than adding explicit sleef interface?

That's a good point, but even the cleverest compiler cannot auto-vectorize PROJ code as it 
is currently written: there's no explicit loop in the projection code, and a loop is the 
number 1 requirement for auto-vectorization to be triggered.
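
To make the "no explicit loop" point concrete, here is a minimal sketch contrasting the two 
styles. Everything in it (toy_params, toy_forward_point, toy_forward_batch) is hypothetical 
and made up for illustration; it is not PROJ's API, and the "projection" is a trivial scaling.

```c
#include <assert.h>

/* Hypothetical toy "projection": a plain scaling of lon/lat. Not PROJ code. */
typedef struct { double k_x; double k_y; } toy_params;

/* Point-at-a-time style: the function only ever sees one point, so there is
   no loop here for the auto-vectorizer to work on. */
void toy_forward_point(const toy_params* p, double lon, double lat,
                       double* x, double* y)
{
    *x = p->k_x * lon;
    *y = p->k_y * lat;
}

/* Batch style: the loop over points is explicit inside the function,
   which is the precondition for auto-vectorization. */
void toy_forward_batch(const toy_params* p, const double* lon, const double* lat,
                       double* x, double* y, int N)
{
    assert(N >= 0);
    const double k_x = p->k_x;
    const double k_y = p->k_y;
    for( int i = 0; i < N; i++ )
    {
        x[i] = k_x * lon[i];
        y[i] = k_y * lat[i];
    }
}
```

Both functions compute the same values; only the second one gives the compiler a loop to 
analyze.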

If there were such loops, maybe some compilers, which do not exist yet (except perhaps the 
Intel compiler, which I presume can use Intel's proprietary SVML library [1]), could also 
auto-vectorize the transcendental functions.
I actually found posts [2] [3] on the clang mailing list from 2016 where they considered 
sleef, but it doesn't seem that this was pursued.

So, if you take the following snippet

#include <math.h>

void foo(const double* lon, const double* lat, double* Xout, double* Yout, int N)
{
    for( int i = 0; i < N; i++ )
    {
        Xout[i] = lon[i] * 2;
        Yout[i] = lat[i] * 4;
    }
}

void bar(const double* lon, const double* lat, double* Xout, double* Yout, int N)
{
    for( int i = 0; i < N; i++ )
    {
        Xout[i] = lon[i] * 2;
        Yout[i] = sqrt(lat[i]);
    }
}

void baz(const double* lon, const double* lat, double* Xout, double* Yout, int N)
{
    for( int i = 0; i < N; i++ )
    {
        Xout[i] = sin(lon[i]);
        Yout[i] = cos(lon[i]) * cos(lat[i]);
    }
}

With gcc and clang at -O3 (which enables auto-vectorization), foo() is compiled into a 
reasonable auto-vectorized version. But the generated code for bar() and baz() remains 
completely serial (which is quite surprising for bar(), since sqrt exists as an SSE 
instruction). And bar() and baz() are more typical of PROJ code than foo().

There's also the question of branches. Human intervention is probably needed to rewrite 
them in a form compatible with vectorization.

If I add the following at the end of the loop body in foo():
        if( Yout[i] >= 4 )
        {
            Xout[i] = 4 - Xout[i];
            Yout[i] = 4 - Yout[i];
        }
both gcc and clang give up on auto-vectorization (even when the code before it is made 
significantly more complex).
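
As an illustration of the kind of manual rewrite meant here, the branch above can be turned 
into per-element selections, so that each output is written exactly once and 
unconditionally. This is only a sketch: foo_select is a hypothetical name, and I have not 
checked whether gcc or clang actually vectorize this form.

```c
#include <assert.h>

/* Hypothetical variant of foo() with the trailing branch expressed as
   selections instead of conditional stores; illustration only. */
void foo_select(const double* lon, const double* lat, double* Xout, double* Yout, int N)
{
    assert(N >= 0);
    for( int i = 0; i < N; i++ )
    {
        /* Same arithmetic as foo(), kept in locals instead of re-reading Yout[i]. */
        const double x = lon[i] * 2;
        const double y = lat[i] * 4;
        /* Condition computed once per element; both outputs depend on it. */
        const int flip = (y >= 4);
        Xout[i] = flip ? 4 - x : x;
        Yout[i] = flip ? 4 - y : y;
    }
}
```

The results are the same as foo() followed by the branch, provided the output arrays do not 
overlap the inputs.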

Even


[1] https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-intrinsics-for-short-vector-math-library-svml-operations
[2] http://lists.llvm.org/pipermail/llvm-dev/2016-July/102254.html
[3] https://reviews.llvm.org/D24951

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com