[PROJ] Vector/SIMD acceleration
Thomas Knudsen
knudsen.thomas at gmail.com
Fri Apr 17 04:59:38 PDT 2020
> Hum. Was hoping we wouldn't need a WKT2021 :-)
Next revision cycle is not until 2024, so nothing before that. But things
take time,
and I need to think, rethink and discuss this extensively, so better be
well prepared
in ample time before time's up.
And as the work provides a way towards better PROJ plumbing, whether or not
it leads
to WKT revisions, I think it's worthwhile.
I understand from your description of VF that it is fine to move-initialize
VF. This
obviously simplifies the interfacing, compared to map directly to the user
provided
data arrays. So essentially one could do (in pseudo-code):
N = 42424242
double serialdata[N]
for (i = 0; i < N+3; i+=4) {
LAST = min (N - 1, i + 3)
NUM = i + 4 - LAST
paralleldata[0..3] = serialdata[i..LAST]
(do cool stuff in 4-way-parallel)
serialdata[i..LAST] = paralleldata[0..NUM]
}
is this correctly understood?
Den fre. 17. apr. 2020 kl. 13.01 skrev Even Rouault <
even.rouault at spatialys.com>:
> Thomas,
>
>
>
> > That is - actually, I work on a proof-of-concept for an improved,
>
> > next generation WKT, ironing out some of the geodetically
>
> > unfortunate elements of WKT2019.
>
>
>
> Hum. Was hoping we wouldn't need a WKT2021 :-)
>
>
>
> >
>
> > But incidentally this involves implementing support for the
>
> > OGC/ISO19100 "Coordinate Set" (i.e. "sets of coordinate tuples")
>
> > concept, since ISO metadata is attached at the set, rather than
>
> > tuple, level.
>
>
>
> For other readers, I suppose you speak about the classes described at:
>
> http://docs.opengeospatial.org/as/18-005r4/18-005r4.html#18
>
>
>
> That's indeed something I left aside during PROJ 6 implementation.
> CoordinateMetadata could be interesting to implement, as it has a WKT:2019
> representation (not implement currently), and could be useful for
> transformations involving dynamic/time-dependent CRS.
>
>
>
> > But until then I'll be very interested in discussing the form and
>
> > contents of a parallel coordinate data structure (CoordinateSet
>
> > class), so we can keep things compatible.
>
>
>
> To give you some hints on what I prototyped, the generic vector-capable
> type is
>
>
>
> template<typename T, int N> class VF{};
>
>
>
> (VF stands Vector Floating point)
>
>
>
> And it has specializations VF<double,1>, VF<double,2>, VF<double,4> etc
>
>
>
> VF<double,1> once optimized is equivalent to a plain old double
>
> VF<double,2> on SSE2 expands to a SSE 128bit register (or 2 double on non
> vector platforms, but not necessarily runtime efficient)
>
> VF<double,4> on AVX/AVX2 expands to a AVX 256bit register, or on SSE2 on 2
> SSE 128 bit registers
>
> VF<double,8> on AVX/AVX2 expands to 2 AVX 256bit registers, or on SSE2 to
> 4 SSE 128 bit registers (and possibly on AVX-512 to a AVX-512 512 bit
> register)
>
>
>
> Then you can define a:
>
> template<typename T, int N> struct PJ_VF_XY
>
> {
>
> VF<T,N> x;
>
> VF<T,N> y;
>
> };
>
> etc etc
>
>
>
> But I don't think we would want to expose that on PROJ API level, one of
> the good reason is that it is C++ so cannot be used at the C level.
>
>
>
> > My first thought was to make the CoordinateSet class simply a
>
> > container for the material currently given as args to
>
> > proj_transform_generic()
>
>
>
> I concur with this. I'd also imagine CoordinateSet to be more or less
> similar to proj_trans_generic() arguments, with pointers to X, Y, Z, T
> double* arrays and user provided strides, to be accept all reasonable
> memory arrangements (typically separate/contiguous X, Y, Z, T components,
> or interleaved XY, XYZ, XYZT patterns)
>
> (I see that CoordinateSet refers to the DirectPosition type, which must be
> defined in some other ISO standard, but I don't think we want & need
> possibly heavy weight objects there)
>
>
>
> If the user chooses a in-memory arrangement of its data that directly
> matches what PROJ would use under the hood (for a vector type, the
> separate/contiguous arrangement would be the best), it can save a bit of
> time for the loading/unloading between memory and SSE/AVX registers, but
> even if those moves between memory and registers aren't done in the most
> efficient way, they are certainly neglectable regarding the cost of the
> math operations.
>
>
>
> PROJ should be smart enough internally to figure out from the available
> components of a pipeline if it must use a vector operation or not.
> Typically you would have a fwd2d_2points, fwd2d_4points, fwd2d_8points
> function pointers for a PROJ operation, that would be set or not, depending
> on available hardware capabilities at runtime and benchmarked efficiency of
> using those variants.
>
>
>
> Even
>
>
>
> --
>
> Spatialys - Geospatial professional services
>
> http://www.spatialys.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/proj/attachments/20200417/d58fa476/attachment-0001.html>
More information about the PROJ
mailing list