<div dir="ltr">Bonjour Even!<br><br>I'm "kind-of" interested in this. Not so much because I need<br>transformation capacity at the billions/sec level, but because<br>I, in my spare time, am working on an improved PROJ internal<br>data flow.<br><br>That is - actually, I work on a proof-of-concept for an improved,<br>next generation WKT, ironing out some of the geodetically<br>unfortunate elements of WKT2019.<br><br>But incidentally this involves implementing support for the<br>OGC/ISO19100 "Coordinate Set" (i.e. "sets of coordinate tuples")<div>concept, since ISO metadata is attached at the set, rather than<br>tuple, level. And implementing support for coordinate sets is<br>a very good excuse for (conceptually speaking) implementing<br>proj_trans() in terms of proj_transform_generic(), rather than,<br>as today, the other way round.<br><br>This makes it possible to introduce SIMD parallelism in a very<br>smooth way by providing parallel versions of the computationally<br>most costly operations first, and make the "parallel" API handle<br>the difference between parallel-native operations and old style<br>serial ones until some day everything is natively parallel.<br><br>It also introduces a much simpler "4D-all-the-way" internal<br>plumbing of the PROJ data flow and all in all cleans up a<br>lot of the extreme mess that lurks under the pretty surface of proj_trans().<div>And the current 4D API can be reimplemented easily as a thin wrapper</div><div>over the new parallel API (which currently goes under the name PEGS:</div><div>"Platform for Experiments with Geodetic Software").<br><br>PEGS is beginning to shape up, but I probably need another few<br>weeks to get it a bit more ready for other eyes, and to provide some<br>initial, rudimentary documentation. I will announce it here on<br>the list, when I make the repo public.<br><br>But until then I'll be very interested in discussing the form and<br>contents of a parallel coordinate data structure (CoordinateSet<br>class), so we can keep things compatible.<br><br>My first thought was to make the CoordinateSet class simply a<br>container for the material currently given as args to<br>proj_transform_generic() - i.e. something compatible with more<br>or less any possible data structure a user program may implement.<br>But this may be too general to fit with the parallel templates<br>you mention?<br><br>/thomas<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Den tor. 16. apr. 2020 kl. 17.18 skrev Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div style="font-family:monospace;font-size:9pt;font-weight:400;font-style:normal">
<p style="margin:0px;text-indent:0px">Hi,</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">I've lately worked (again (*)) on a proof of concept of the Transverse Mercator forward transformation to use Intel SIMD instructions to transform several coordinate pairs simultaneously, potentially for use by the proj_trans_array() / proj_trans_generic() functions. Transverse Mercator is a very good candidate for that as it is quite expensive, and has few branches.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">The impact on the projection code is minimal, and the conversion of the original code was mostly straightforward, by using C++ templates and operator overloading: you mostly replace occurences of "double" by a templated type, and depending on how it is instanciated, it can expand to a single, 2, 4, 8, etc. doubles, either in a single or several SIMD registers. Optimizers do a good job at generating good assembly from that.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">SIMD instrinsincs are available for basic arithmetic operations and comparisons, but not for trigonometric (sin, cos, etc.) and other transcendent (exp, log, ...) functions that are often needed to implement projections, and are usually the computation bottlenecks.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">The SLEEF Vectorized Math Library (<a href="https://sleef.org/" target="_blank">https://sleef.org/</a>), using Boost License</p>
<p style="margin:0px;text-indent:0px">(~ MIT), provides such operations, and with very good accuracy (accuracy of 1 ULP for double precision). It is portable accross OS and supports different architectures.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">On my standalone prototype (outside of PROJ infrastructure, with just the forward TMerc code extracted), I get a 3.8x speedup with the AVX2 + FMA instruction sets, compared to a build with AVX2 + FMA enabled with the original non-vector implementation, and using SLEEF. This is when transforming 8 coordinate pairs at the same time. This 3.8x speed-up is close to the optimal 4 factor (AVX/AVX2 256bit vectors can store 4 doubles). Without SLEEF, the speedup is 1.35x</p>
<p style="margin:0px;text-indent:0px">I guess that with AVX-512 available, gains in the [4x, 8x[ range could be expected, but I haven't tested.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">With pure SSE2 that comes automatically with x86_64, I can get a 1.55x speed-up with SLEEF (optimal would be x2 due to the 128 bit SSE vectors). Without SLEEF, the speedup is 1.35x as well.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">I would expect similar gains on the reverse path of etmerc which has equivalent complexity. Snyder's tmerc, geographic <--> cartesian conversions, etc. would likely be other good candidates.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">SLEEF could be made an optional dependency of PROJ. When it is not available, the execution of trigonometric & transcendent functions is of course serialized, hence the reduced efficiency.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">I would expect the actual gains, once the needed changes to be able to integrate that in PROJ itself are done, to be less than what I got on the prototype, due to other overheads in code between the user call and the actual projection code. But there's probably improvements that could be done to reduce current overheads.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">Is there an interest in seeing that integrated in PROJ ? I guess this is mostly of interest for people transforming at least billions of points. A few millions is probably not enough to really appreciate the difference: I can already get 4 million points/sec transformed by proj_trans() with tmerc.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">The question of funding such work would also remained to be solved.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">Even</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">(*) I had a feeling of deja-vu when writing this email, and actually I realized I wrote a similar one almost 5 years ago</p>
<p style="margin:0px;text-indent:0px">( <a href="http://lists.maptools.org/pipermail/proj/2015-June/007169.html" target="_blank">http://lists.maptools.org/pipermail/proj/2015-June/007169.html</a> ). C++ at that time seemed to be a hurdle for a number of people, but luckily we have gone through it now.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">-- </p>
<p style="margin:0px;text-indent:0px">Spatialys - Geospatial professional services</p>
<p style="margin:0px;text-indent:0px"><a href="http://www.spatialys.com" target="_blank">http://www.spatialys.com</a></p></div>_______________________________________________<br>
PROJ mailing list<br>
<a href="mailto:PROJ@lists.osgeo.org" target="_blank">PROJ@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/proj" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/proj</a><br>
</blockquote></div>