[PROJ] Vector/SIMD acceleration

Even Rouault even.rouault at spatialys.com
Fri Apr 17 04:01:04 PDT 2020


Thomas,

> That is - actually, I work on a proof-of-concept for an improved,
> next generation WKT, ironing out some of the geodetically
> unfortunate elements of WKT2019.

Hum. Was hoping we wouldn't need a WKT2021 :-)

> 
> But incidentally this involves implementing support for the
> OGC/ISO19100 "Coordinate Set" (i.e. "sets of coordinate tuples")
> concept, since ISO metadata is attached at the set, rather than
> tuple, level.

For other readers, I suppose you speak about the classes described at:
http://docs.opengeospatial.org/as/18-005r4/18-005r4.html#18

That's indeed something I left aside during PROJ 6 implementation. CoordinateMetadata 
could be interesting to implement, as it has a WKT:2019 representation (not implement 
currently), and could be useful for transformations involving dynamic/time-dependent CRS.

> But until then I'll be very interested in discussing the form and
> contents of a parallel coordinate data structure (CoordinateSet
> class), so we can keep things compatible.

To give you some hints on what I prototyped, the generic vector-capable type is

template<typename T, int N> class VF{};

(VF stands Vector Floating point)

And it has specializations VF<double,1>, VF<double,2>, VF<double,4> etc

VF<double,1> once optimized is equivalent to a plain old double
VF<double,2> on SSE2 expands to a SSE 128bit register (or 2 double on non vector platforms, 
but not necessarily runtime efficient)
VF<double,4> on AVX/AVX2 expands to a AVX 256bit register, or on SSE2 on 2 SSE 128 bit 
registers
VF<double,8> on AVX/AVX2 expands to 2 AVX 256bit registers, or on SSE2 to 4 SSE 128 bit 
registers (and possibly on AVX-512 to a AVX-512 512 bit register)

Then you can define a:
template<typename T, int N> struct PJ_VF_XY
{
	VF<T,N> x;
	VF<T,N> y;
};
etc etc

But I don't think we would want to expose that on PROJ API level, one of the good reason is 
that it is C++ so cannot be used at the C level.

> My first thought was to make the CoordinateSet class simply a
> container for the material currently given as args to
> proj_transform_generic()

I concur with this. I'd also imagine CoordinateSet to be more or less similar to 
proj_trans_generic() arguments, with pointers to X, Y, Z, T double* arrays and user provided 
strides, to be accept all reasonable memory arrangements (typically separate/contiguous X, 
Y, Z, T components, or interleaved XY, XYZ, XYZT patterns)
(I see that CoordinateSet refers to the DirectPosition type, which must be defined in some 
other ISO standard, but I don't think we want & need possibly heavy weight objects there)

If the user chooses a in-memory arrangement of its data that directly matches what PROJ 
would use under the hood (for a vector type, the separate/contiguous arrangement would be 
the best), it can save a bit of time for the loading/unloading between memory and SSE/AVX 
registers, but even if those moves between memory and registers aren't done in the most 
efficient way, they are certainly neglectable regarding the cost of the math operations.

PROJ should be smart enough internally to figure out from the available components of a 
pipeline if it must use a vector operation or not. Typically you would have a fwd2d_2points, 
fwd2d_4points, fwd2d_8points function pointers for a PROJ operation, that would be set or 
not, depending on available hardware capabilities at runtime and benchmarked efficiency of 
using those variants.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/proj/attachments/20200417/ee3a9b7b/attachment.html>


More information about the PROJ mailing list