[Proj] Experiment to speed up proj.4 by 2 or more

José Luis García Pallero jgpallero at gmail.com
Tue Jun 23 09:04:34 PDT 2015


2015-06-23 17:29 GMT+02:00 Even Rouault <even.rouault at spatialys.com>:
> Hi,
>
> I've done an experiment to use Intel SIMD intrinsics
> (https://en.wikipedia.org/wiki/SIMD), and I think they could be beneficial for
> proj, when called to transform several coordinates at a time.
>
> I've used the SSE2 instruction set (128-bit registers, so 2 doubles at a
> time), and I managed to speed up the inverse Transverse Mercator ellipsoidal
> transformation (i.e. from projected to geodetic) by a factor of ~2 (excluding
> potential datum transformations).
>
> One key to performance was finding an efficient way of computing the usual
> transcendental functions (i.e. sin, cos, tan and their inverses, exp, ln,
> etc.) with SIMD registers, since they are not included in the instruction
> set. Otherwise you have to extract each component of the SIMD register,
> evaluate it with the x87 coprocessor, and reassemble the SIMD register from
> the computed components, which kills all the other performance gains. The
> SLEEF library (http://freecode.com/projects/sleef) has such routines, is in
> the public domain and works rather well (with gcc/clang; it has some
> rough edges with MSVC, but nothing that cannot be overcome).
>
> I've encapsulated the use of SSE2 intrinsics in a C++ class with overloaded
> arithmetic operators, so the resulting code looks much like the original C
> code, which is great for readability (although the original C code isn't
> always very readable ;-)) and for confidence that it doesn't introduce
> errors. Conditional branches are not so great for SIMD performance, but there
> are tricks to rewrite some of them with a ternary-like operator.
>
> SLEEF also supports the AVX and AVX2+FMA instruction sets (256-bit registers),
> which could lead to a further ~2x gain over SSE2.
>
> So I was wondering whether there is:
>
> 1) interest from the project in pursuing that approach (which involves
> introducing C++ in the code base as an implementation detail, the interface
> being unchanged). We could imagine having the same source files compiled
> several times with different register sizes, with runtime selection of the
> appropriate variant (note: SSE2 is guaranteed to be available on all x86_64
> compatible processors; AVX/AVX2 requires more recent CPUs).
>
> 2) ... and sponsors interested in making that happen.
>
> Finally, the proof of concept:
> * regular code (runs in ~30s on a Core i5 750 @ 2.67GHz):
>    https://gist.github.com/rouault/946104d0b98e8e8cc564
> * SSE2 code (~14s):
>    https://gist.github.com/rouault/3bbc31c9f12391d79920

It sounds very interesting, but I see a couple of drawbacks:

1. Introducing C++: from my point of view, one of the best things about PROJ
is that it is written in pure C. I can also see that the SLEEF distribution
contains a folder called 'purec', which suggests that a pure C version exists.
2. Can SLEEF be used with the Intel, PathScale, Portland et al. compilers?
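
On point 1, for readers who have not opened the gists: the kind of wrapper
described in the quoted message (a C++ class with overloaded arithmetic
operators plus a ternary-like operator replacing a branch) could look roughly
like the sketch below. This is only an illustration under my own assumptions,
not the code from the gists: the names Vec2d, select and scaled_abs are
hypothetical, and it assumes an x86 compiler that provides the SSE2
intrinsics header <emmintrin.h>.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: 128-bit registers holding 2 doubles

// Hypothetical wrapper for illustration only; not the class from the gists.
struct Vec2d {
    __m128d v;
    explicit Vec2d(__m128d x) : v(x) {}
    explicit Vec2d(double s) : v(_mm_set1_pd(s)) {}  // same value in both lanes
    // _mm_set_pd takes the high lane first, so lane0 ends up in the low lane.
    Vec2d(double lane0, double lane1) : v(_mm_set_pd(lane1, lane0)) {}
    void store(double out[2]) const { _mm_storeu_pd(out, v); }
};

// Overloaded operators keep the vector code close to the scalar C source.
inline Vec2d operator+(Vec2d a, Vec2d b) { return Vec2d(_mm_add_pd(a.v, b.v)); }
inline Vec2d operator-(Vec2d a, Vec2d b) { return Vec2d(_mm_sub_pd(a.v, b.v)); }
inline Vec2d operator*(Vec2d a, Vec2d b) { return Vec2d(_mm_mul_pd(a.v, b.v)); }
inline Vec2d operator/(Vec2d a, Vec2d b) { return Vec2d(_mm_div_pd(a.v, b.v)); }

// Per-lane comparison: all bits set where a < b, all bits clear elsewhere.
inline Vec2d operator<(Vec2d a, Vec2d b) { return Vec2d(_mm_cmplt_pd(a.v, b.v)); }

// Ternary-like operator: per lane, mask ? if_true : if_false, without a branch.
// _mm_andnot_pd(m, x) computes (~m) & x.
inline Vec2d select(Vec2d mask, Vec2d if_true, Vec2d if_false) {
    return Vec2d(_mm_or_pd(_mm_and_pd(mask.v, if_true.v),
                           _mm_andnot_pd(mask.v, if_false.v)));
}

// Scalar original would be:  y = (x < 0.0) ? -x * k : x * k;
// Both sides are evaluated for both lanes and the mask picks the result.
inline Vec2d scaled_abs(Vec2d x, Vec2d k) {
    Vec2d zero(0.0);
    return select(x < zero, (zero - x) * k, x * k);
}
```

Calling scaled_abs(Vec2d(-3.0, 2.0), Vec2d(0.5)) and storing the result into
a double[2] handles both coordinates in one pass. Note that both branches are
always computed, so this trick only pays off when the two sides are cheap.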

Best regards

>
> Best regards,
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
> _______________________________________________
> Proj mailing list
> Proj at lists.maptools.org
> http://lists.maptools.org/mailman/listinfo/proj



-- 
*****************************************
José Luis García Pallero
jgpallero at gmail.com
(o<
/ / \
V_/_
Use Debian GNU/Linux and enjoy!
*****************************************


