[geos-devel] Numeric stability

Wed Apr 26 06:17:57 EDT 2006

Being fighting with dragons in the last few weeks
I thought I'd share some observations with the community :)

I produced a small testcase to clearly show what the
problem is. Call it unstable.cpp, here's the code:

---8<---------------------------------------------------------

#include <vector>
#include <cassert>

using namespace std;

vector<double> v;
double tot=0;

void add(double d) {
        v.push_back(d);
        tot+=d;
}

void check() {
        double tot_check=0;
        for (size_t i=0; i<v.size(); i++) {
                tot_check+=v[i];
        }
        assert(tot_check==tot);
}

int main()
{
        add(3.4200000000000001);
        add(1.51000000000000000000000001);
        check();
        return 0;
}

---8<---------------------------------------------------------

You will notice that depending on optimization used, FP
unit and inline strategy you'll get different results
(ie: the assertion failing or succeeding).

A few examples:

// These builds make assertion fail:
// g++  -O3   unstable.cpp
// g++  -O2   unstable.cpp
// g++  -O1   unstable.cpp
//
// These builds make assertion succeed:
// g++  -O0   unstable.cpp
// g++  -ffloat-store -O1   unstable.cpp
// g++  -ffloat-store -O2   unstable.cpp
// g++  -ffloat-store -O3   unstable.cpp
// g++ -march=pentium4 -mfpmath=sse -msse2 -O1 unstable.cpp
// g++ -march=pentium4 -mfpmath=sse -msse2 -O2 unstable.cpp
// g++ -march=pentium4 -mfpmath=sse -msse2 -O3 unstable.cpp

I haven't tried messing with 'inlined' funx, but
that has been part of my dragon's fights with the
real codebase, so I experienced that inlined/outlined
versions of the same function can also influence
rounding of floating point numbers.
Having done some research on the web, it looks like
that also the exactly same assembly code can result
in different results based on the actual processor
used, when it comes to floating point computations,
this has starterd to be taken into consideration by
new processors providing 'scalar floating
point instructions' (pentium4 and up)

Here is an excerpt from g++ manual page (-mfpmath):

   sse Use scalar floating point instructions present in the SSE
       instruction set.  This instruction set is supported by Pentium3 and
       newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp
       chips.  The earlier version of SSE instruction set supports
       only single precision arithmetics, thus the double and extended
       precision arithmetics is still done using 387.  Later version,
       present only in Pentium4 and the future AMD x86-64 chips supports
       double precision arithmetics too.

       For i387 you need to use -march=cpu-type, -msse or -msse2
       switches to enable SSE extensions and make this option effective.
       For x86-64 compiler, these extensions are enabled by default.

       The resulting code should be considerably faster in the majority of
       cases and avoid the numerical instability problems of 387 code, but
       may break some existing code that expects temporaries to be 80bit.

       This is the default choice for the x86-64 compiler.

So, basically, the only way to work predictably with doubles (building
blocks for our Coordinates) would be having a modern processor providing
SSE instructions set. Or at least this is what I understood.

Alternatively, I read some reports on the web about taking care
of the way things are coded to augment the predictability of
results. This would include having NO inlined code compute ops
on floating points and have *every* complex computation be split
in atomic operations where all intermediate value is assigned
to a variable. I wouldn't be surprised if we'd also need to disable
optimization in order to make this effective. Sounds a big work
and probably a loss of performance as well.

Ok. This is it for now, as I don't have any practical solution yet.
I just wanted to share these thoughts with you in case anyone has
more experience on the subject.

Comments highly welcome, as are reports of run of that testcase.

--strk;