[PROJ] Geodetic to Authalic latitude conversions
Jérôme St-Louis
jerome at ecere.com
Thu Sep 12 12:56:53 PDT 2024
Dear Charles, Daniel, All,
> It is possible to do 2x unroll of the Clenshaw loop to avoid the
> shuffling of variables (t = xx(u0, u1), u1 = u0, u0 = t). See the
> function SinCosSeries in geodesic.c where this is done.
I applied the trick to avoid swapping the variables in both the rolled
and unrolled versions -- thanks Charles for pointing me to that trick in
/SinCosSeries()/; I had been wondering how to avoid that shuffling.
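For reference, here is a minimal sketch of the trick as I understand it
from /SinCosSeries()/ (names and the series form below are illustrative,
not the actual PROJ or GeographicLib code): unrolling the recurrence
twice lets the two accumulators return to their original roles on every
pass, so the temporary is never needed.

    /* Sketch: Clenshaw evaluation of sum(c[k]*sin(2*k*theta), k = 1..n).
       Assumes n is even so the 2x-unrolled loop needs no clean-up step. */
    static double clenshaw_sketch(double sintheta, double costheta,
                                  const double c[], int n)
    {
        double X = 2 * (costheta - sintheta) * (costheta + sintheta); /* 2*cos(2*theta) */
        double u0 = 0, u1 = 0;
        const double *p = c + n;          /* one past the last coefficient */
        for (int k = n / 2; k--; ) {
            /* Unrolled x2: u0 and u1 swap back to their original roles,
               so no "t = ...; u1 = u0; u0 = t;" shuffle is needed. */
            u1 = X * u0 - u1 + *--p;
            u0 = X * u1 - u0 + *--p;
        }
        return 2 * sintheta * costheta * u0;   /* sin(2*theta) * u0 */
    }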
> I recommend against unrolling the loops.
> Any modern compiler will make these optimizations, tuned to the target
> architecture
I am not an optimization expert by any means.
However, based on initial tests running the authalic ==> geodetic
conversion using the Clenshaw algorithm 1 billion times, my unrolled
version of /clenshaw()/ appears to be roughly 3% faster than the
static inline one (27.7 seconds vs. 28.5 seconds, including generating a
random input latitude), with -O2, MMX, SSE and GCC fast-math
optimizations turned on (using GCC, not G++).
I think the extra logic overhead from a generic /clenshaw()/ could
explain this difference.
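For context, a loop along these lines would be one way to reproduce the
kind of measurement described above (a sketch only; /auth2geod_clenshaw()/
stands in for whichever conversion variant is under test and is not an
actual PROJ function):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical conversion under test: authalic -> geodetic latitude,
       in radians. */
    double auth2geod_clenshaw(double xi);

    int main(void)
    {
        const long N = 1000000000L;             /* 1 billion conversions */
        double sum = 0;                         /* keep results live */
        clock_t t0 = clock();
        for (long i = 0; i < N; i++) {
            /* random authalic latitude in roughly (-pi/2, pi/2) */
            double xi = (rand() / (double)RAND_MAX - 0.5) * 3.14159265358979;
            sum += auth2geod_clenshaw(xi);
        }
        printf("%.1f s (checksum %g)\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC, sum);
        return 0;
    }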
> This makes the code longer and harder to read.
Personally, I actually find it much easier to follow what's going on in
the expanded version, but that's just me :)
> You also lose the flexibility of adjusting the number of terms in the
> expansion at runtime.
That would be a very good argument, but that is not functionality that
is being exposed anywhere at the moment, not even as a compile-time option.
Would it be desirable to allow selecting how many orders / terms to use,
either at compile-time or at runtime in PROJ? If so, how would we go
about making this option available?
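To make the question concrete, here is one possible shape such an option
could take, with a compile-time default and the term count still
available as a runtime parameter (all names here are hypothetical,
purely to illustrate the idea):

    #include <math.h>

    /* Hypothetical: default series order, overridable at build time
       with e.g. -DAUTH_LAT_ORDER=4. */
    #ifndef AUTH_LAT_ORDER
    #define AUTH_LAT_ORDER 6
    #endif

    /* Generic (rolled) Clenshaw summation of sum(c[k]*sin(2*k*theta),
       k = 1..n), keeping the term count as a runtime parameter. */
    static double clenshaw(double sintheta, double costheta,
                           const double c[], int n)
    {
        double X = 2 * (costheta - sintheta) * (costheta + sintheta);
        double u0 = 0, u1 = 0;
        for (int k = n; k > 0; k--) {
            double t = X * u0 - u1 + c[k - 1];
            u1 = u0;
            u0 = t;
        }
        return 2 * sintheta * costheta * u0;
    }

    /* Authalic -> geodetic latitude, using at most AUTH_LAT_ORDER terms. */
    static double auth2geod(double xi, const double c[], int order)
    {
        int n = order < AUTH_LAT_ORDER ? order : AUTH_LAT_ORDER;
        return xi + clenshaw(sin(xi), cos(xi), c, n);
    }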
3% might be a relatively small performance improvement, but I would not
call it negligible.
However, I'm fine with using the /clenshaw()/ function inline if that's
what we want to do.
> Undoubtedly, we could do a better job centralizing some of these core
> capabilities, Clenshaw (and its complex counterpart) + general
> auxiliary latitude conversions, so that we don't have essentially
> duplicate code scattered all over the place.
Agreed. There is also /clens()/ in /tmerc.cpp/ (and /clenS()/ there for
the complex version) implementing this Clenshaw summation.
Thank you!
Kind regards,
-Jerome
On 9/12/24 12:57 PM, DANIEL STREBE wrote:
>
>
>> On Sep 12, 2024, at 05:09, Charles Karney via PROJ
>> <proj at lists.osgeo.org> wrote:
>>
>> I recommend against unrolling the loops. This makes the code longer and
>> harder to read. You also lose the flexibility of adjusting the number
>> of terms in the expansion at runtime.
>>
>> …But
>> remember that compilers can do the loop unrolling for you. Also,
>> doesn't the smaller code size with the loops result in fewer cache
>> misses?
>
> I think Charles is spot-on here. Any modern compiler will make these
> optimizations, tuned to the target architecture. Different
> architectures will prefer different amount of unrolling, so it’s best
> not to second-guess by hard-coding. Loop overhead of a simple counter
> is zero, normally, because of unrolling in the short cases and because
> the branch prediction will favor continuation in the longer cases.
> Meanwhile the loop counting happens in parallel in one of the ALUs
> while the FPUs do their thing.
>
> — daan Strebe