[PROJ] Vector/SIMD acceleration
Even Rouault
even.rouault at spatialys.com
Fri Apr 17 16:36:40 PDT 2020
> I assume the ">" operator will
> return an array of bools and then you'll have to arrange that different
> portions of your array are handled differently (like MATLAB/Octave?
> see also the old Cray vector merge intrinsics).
In what I prototyped, you'd do something like
const auto mask = (fabs (Ce) <= 2.623395162778);
const auto X = Q->Qn * Ce;
const auto Y = Q->Qn * Cn + Q->Zb;
return { ternary(mask, X, HUGE_VAL),
ternary(mask, Y, HUGE_VAL) };
With SSE/AVX "mask" will be just a SIMD register itself with bits set or unset depending on
the rest of the comparison for each member of the vector, and ternary(mask, then, else) will
expand to
or(and(mask, then), andnot(mask, else)). (In an ideal world, we would be able to overload the
? : operator, but that's not possible :-))
In cases where you really need a real branch, you can get an integer and then decide to go to
a slower code path to deal with a non nominal situation. But in my idea, only a small portion
of existing PROJ operations would be convered to a vector-enabled code: the ones that are
particularly costly and have few branches to deal with the nominal input.
> (And probably my assertion that the code can remain essentially
> the same, except for the declarations, is a pipe dream.)
Not at all. My SIMD exact_e_fwd() is exactly the same as current one, with just "double"
replaced by the vector type, and the final part with the if( fabs (Ce) <= 2.623395162778)
branch replaced by the above snippet.
> By the way, Eigen is an awesome package. It implements everything with
> header code and so the end result is often very efficient. The downside
> is that because of all the flexibility that Eigen provides with how
> arrays are laid out and allocated, you can sometimes end up in template
> hell.
To be honest, I love my own header approach because I know how it works and can customize
it quite easily, and it was fun to do. But its scope (base types & supported architectures) is
more reduced than other mentioned alternatives.
At the end of the day, everybody ends up more or less the same API, a templated type
<base_type, number_of_elements>, so switching between implementations (with similar
capabilities) could be done relatively easily. In a PROJ context, the availability and accuracy of
transcendent functions is a key element, and I'm not aware of better alternatives than Sleef
for that.
The fact that it was considered for clang and also in the paper Andrew quoted for a LLVM-
based (proprietary) Arm Compiler for HPC makes it IMHO a serious candidate.
I've also just given a quick try at https://github.com/xtensor-stack/xsimd suggested by Joris
Van den Bossche, which is also another header based approach (the only one that can work...)
One annoying limitation of it is that it refuses to instanciate for example a vector of 4-double
on SSE2-only because it doesn't map to a single SIMD register, whereas Eigen or my approach
would use 2 SSE registers for that, which is sometimes still valuable to make use of the fact
that the CPU can sometimes execute 2 additions in parallel if they use separated registers. I
could actually measure that on my prototype (even when going to a 8-double "vector" on
SSE, which maps to 4 SSE registers), so this isn't just theoretical.
I've not seen an explicit statement about the accuracy of transcendent functions in xsimd.
When randomly trying sin(150), it returned a small difference compared to libc sin() or Sleef,
but that's probably not in the range of values that PROJ would trigger. When trying between
[-PI,PI], couldn't see a difference from a quick probing.
One advantage with xsimd is that the transcendent functions are implemented in a header-
only fashion too, whereas currently Sleef requires linking against a lib, but inlining of the
code of the transcendent functions isn't really needed for performance, as the overhead of
the call is quite neglectable compared to their cost.
--
Spatialys - Geospatial professional services
http://www.spatialys.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/proj/attachments/20200418/64d89c88/attachment.html>
More information about the PROJ
mailing list