[PROJ] Vector/SIMD acceleration
Thomas Knudsen
knudsen.thomas at gmail.com
Fri Apr 17 00:09:56 PDT 2020
Bonjour Even!
I'm "kind-of" interested in this. Not so much because I need
transformation capacity at the billions/sec level, but because
I, in my spare time, am working on an improved PROJ internal
data flow.
That is, I am actually working on a proof of concept for an improved,
next-generation WKT, ironing out some of the geodetically unfortunate
elements of WKT2019.
But incidentally, this involves implementing support for the
OGC/ISO 19100 "Coordinate Set" (i.e. "set of coordinate tuples")
concept, since ISO metadata is attached at the set level, rather than
at the tuple level. And implementing support for coordinate sets is a
very good excuse for (conceptually speaking) implementing proj_trans()
in terms of proj_trans_generic(), rather than, as today, the other way
round.
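
Conceptually it amounts to something like this sketch, expressed in
terms of the current C API (just to illustrate the inversion; the PEGS
internals obviously differ):

    #include <proj.h>

    /* Sketch only: proj_trans() as a thin wrapper over the generic,
     * set-oriented path, i.e. a "coordinate set" of size one. */
    static PJ_COORD trans_via_generic(PJ *P, PJ_DIRECTION dir, PJ_COORD c) {
        proj_trans_generic(P, dir,
                           &c.xyzt.x, sizeof(PJ_COORD), 1,
                           &c.xyzt.y, sizeof(PJ_COORD), 1,
                           &c.xyzt.z, sizeof(PJ_COORD), 1,
                           &c.xyzt.t, sizeof(PJ_COORD), 1);
        return c;
    }
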
This makes it possible to introduce SIMD parallelism in a very smooth
way: provide parallel versions of the computationally most costly
operations first, and let the "parallel" API handle the difference
between parallel-native operations and old-style serial ones, until
some day everything is natively parallel.
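
In sketch form (all names hypothetical, nothing here is actual PEGS
code), the dispatch could look like:

    #include <cstddef>

    struct CoordTuple { double x, y, z, t; };

    struct Operation {
        // Scalar kernel: always present.
        CoordTuple (*fwd)(CoordTuple, const void *opaque);
        // Optional block kernel, transforming n tuples per call
        // (SIMD inside); null for old-style serial operations.
        void (*fwd_block)(CoordTuple *, std::size_t n, const void *opaque);
    };

    void fwd_set(const Operation &op, CoordTuple *c, std::size_t n,
                 const void *opaque) {
        if (op.fwd_block) {
            op.fwd_block(c, n, opaque);          // parallel-native path
        } else {
            for (std::size_t i = 0; i < n; ++i)  // serial fallback
                c[i] = op.fwd(c[i], opaque);
        }
    }
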
It also introduces a much simpler "4D-all-the-way" internal
plumbing of the PROJ data flow and all in all cleans up a
lot of the extreme mess that lurks under the pretty surface of proj_trans().
And the current 4D API can be reimplemented easily as a thin wrapper
over the new parallel API (which currently goes under the name PEGS:
"Platform for Experiments with Geodetic Software").
PEGS is beginning to shape up, but I probably need another few weeks
to get it a bit more ready for other eyes and to provide some initial,
rudimentary documentation. I will announce it here on the list when I
make the repo public.
But until then I'll be very interested in discussing the form and
contents of a parallel coordinate data structure (CoordinateSet
class), so we can keep things compatible.
My first thought was to make the CoordinateSet class simply a
container for the material currently given as arguments to
proj_trans_generic(), i.e. something compatible with more or less any
data structure a user program may implement. But this may be too
general to fit with the parallel templates you mention?
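
That is, something along these lines (a hypothetical layout, just to
make the discussion concrete):

    #include <cstddef>

    // Four base pointers with per-axis byte strides and counts, so it
    // can describe arrays of structs, structs of arrays, and most
    // layouts in between - mirroring the proj_trans_generic() arguments.
    struct CoordinateSet {
        double *x, *y, *z, *t;       // base pointer per axis (may be null)
        std::size_t sx, sy, sz, st;  // byte stride between values
        std::size_t nx, ny, nz, nt;  // number of values per axis

        double &X(std::size_t i) {   // strided element access
            return *reinterpret_cast<double *>(
                reinterpret_cast<char *>(x) + i * sx);
        }
    };
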
/thomas
On Thu, Apr 16, 2020 at 17:18 Even Rouault <
even.rouault at spatialys.com> wrote:
> Hi,
>
> I've lately worked (again (*)) on a proof of concept of the Transverse
> Mercator forward transformation using Intel SIMD instructions to transform
> several coordinate pairs simultaneously, potentially for use by the
> proj_trans_array() / proj_trans_generic() functions. Transverse Mercator is
> a very good candidate for that, as it is quite expensive and has few
> branches.
>
> The impact on the projection code is minimal, and the conversion of the
> original code was mostly straightforward, using C++ templates and
> operator overloading: you mostly replace occurrences of "double" with a
> templated type and, depending on how it is instantiated, it can expand to
> one, 2, 4, 8, etc. doubles, in either a single SIMD register or several.
> Optimizers do a good job of generating good assembly from that.
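>
> In sketch form (illustrative code only, not the actual prototype), the
> technique looks like this:
>
>     // Write the projection math once, against a templated scalar type.
>     #include <cstddef>
>
>     template <std::size_t N> struct VDouble {
>         double v[N];  // N = 1, 2, 4, 8 ... doubles, i.e. SIMD lanes
>     };
>
>     template <std::size_t N>
>     VDouble<N> operator*(VDouble<N> a, VDouble<N> b) {
>         VDouble<N> r;
>         // With e.g. -O2 -mavx2 this loop compiles to a single vmulpd.
>         for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
>         return r;
>     }
>
>     template <std::size_t N>
>     VDouble<N> operator+(VDouble<N> a, VDouble<N> b) {
>         VDouble<N> r;
>         for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
>         return r;
>     }
>
>     // Projection-style code, unchanged apart from the type parameter:
>     template <typename T> T horner(T x, T c0, T c1, T c2) {
>         return c0 + x * (c1 + x * c2);  // T = double or VDouble<4>
>     }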
>
> SIMD intrinsics are available for basic arithmetic operations and
> comparisons, but not for the trigonometric (sin, cos, etc.) and other
> transcendental (exp, log, ...) functions that are often needed to
> implement projections and are usually the computational bottlenecks.
>
> The SLEEF Vectorized Math Library (https://sleef.org/), available under
> the Boost license (~ MIT), provides such operations, with very good
> accuracy (1 ULP for double precision). It is portable across operating
> systems and supports different architectures.
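>
> For illustration (assuming an AVX2 build with SLEEF installed; compile
> with something like -mavx2 -mfma and link with -lsleef), a packed sine
> is a single call:
>
>     #include <immintrin.h>
>     #include <sleef.h>
>
>     // sin() over 4 packed doubles, 1-ULP accuracy (the "u10" variant).
>     static __m256d sin4(__m256d a) {
>         return Sleef_sind4_u10avx2(a);
>     }
>
>     // Basic arithmetic already has intrinsics; SLEEF fills in
>     // sin/cos/exp/log/... for the same vector types.
>     static __m256d scaled_sin(__m256d k, __m256d lam) {
>         return _mm256_mul_pd(k, Sleef_sind4_u10avx2(lam));
>     }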
>
> On my standalone prototype (outside of the PROJ infrastructure, with just
> the forward TMerc code extracted), using SLEEF I get a 3.8x speedup with
> the AVX2 + FMA instruction sets, compared to a build of the original
> non-vector implementation, also with AVX2 + FMA enabled. This is when
> transforming 8 coordinate pairs at a time. The 3.8x speed-up is close to
> the optimal factor of 4 (AVX/AVX2 256-bit vectors can store 4 doubles).
> Without SLEEF, the speedup is 1.35x.
>
> I guess that with AVX-512 available, gains in the [4x, 8x[ range could be
> expected, but I haven't tested.
>
> With pure SSE2, which comes automatically with x86_64, I get a 1.55x
> speed-up with SLEEF (the optimum would be 2x due to the 128-bit SSE
> vectors). Without SLEEF, the speedup is 1.35x as well.
>
> I would expect similar gains on the reverse path of etmerc, which has
> equivalent complexity. Snyder's tmerc, geographic <--> cartesian
> conversions, etc. would likely be other good candidates.
>
> SLEEF could be made an optional dependency of PROJ. When it is not
> available, the execution of trigonometric & transcendental functions is
> of course serialized, hence the reduced efficiency.
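>
> Concretely, the fallback can simply loop the scalar libm call over the
> lanes (a sketch with illustrative names, matching the templated-type
> idea above):
>
>     #include <cmath>
>     #include <cstddef>
>
>     template <std::size_t N> struct VDouble { double v[N]; };
>
>     // Without SLEEF: arithmetic stays vectorized, but each
>     // transcendental call serializes into N scalar libm calls.
>     template <std::size_t N> VDouble<N> vsin(VDouble<N> a) {
>         VDouble<N> r;
>         for (std::size_t i = 0; i < N; ++i) r.v[i] = std::sin(a.v[i]);
>         return r;
>     }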
>
> I would expect the actual gains, once the changes needed to integrate
> this into PROJ itself are done, to be smaller than what I got on the
> prototype, due to other overheads in the code between the user call and
> the actual projection code. But there are probably improvements that
> could be made to reduce the current overheads.
>
> Is there interest in seeing this integrated into PROJ? I guess it is
> mostly of interest for people transforming at least billions of points. A
> few million is probably not enough to really appreciate the difference: I
> can already get 4 million points/sec transformed by proj_trans() with
> tmerc.
>
> The question of funding such work also remains to be solved.
>
> Even
>
> (*) I had a feeling of déjà vu when writing this email, and indeed I
> realized I wrote a similar one almost 5 years ago
> (http://lists.maptools.org/pipermail/proj/2015-June/007169.html). At
> that time C++ seemed to be a hurdle for a number of people, but luckily
> we have gone through it since.
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com