summaryrefslogtreecommitdiff
path: root/src/math/fma.c
AgeCommit message (Collapse)AuthorFilesLines
2017-10-13math: rewrite fma with mostly int arithmeticsSzabolcs Nagy1-431/+154
the freebsd fma code failed to raise underflow exception in some cases in nearest rounding mode (affects fmal too) e.g. fma(-0x1p-1000, 0x1.000001p-74, 0x1p-1022) and the inexact exception may be raised spuriously since the fenv is not saved/restored around the exact multiplication algorithm (affects x86 fma too). another issue is that the underflow behaviour when the rounded result is the minimal normal number is target dependent, ieee754 allows two ways to raise underflow for inexact results: raise if the result before rounding is in the subnormal range (e.g. aarch64, arm, powerpc) or if the result after rounding with infinite exponent range is in the subnormal range (e.g. x86, mips, sh). to avoid all these issues the algorithm was rewritten with mostly int arithmetics and float arithmetics is only used to get correct rounding and raise exceptions according to the behaviour of the target without any fenv.h dependency. it also unifies x86 and non-x86 fma. fmaf is not affected, fmal need to be fixed too. this algorithm depends on a_clz_64 and it required a few spurious instructions to make sure underflow exception is raised in a particular corner case. (normally FORCE_EVAL(tiny*tiny) would be used for this, but on i386 gcc is broken if the expression is constant https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57245 and there is no easy portable fix for the macro.)
2015-02-08add parenthesis in fma.c to clarify intent and silence warningsSzabolcs Nagy1-1/+1
2013-10-07math: fix rare underflow issue in fmaSzabolcs Nagy1-2/+14
the issue is described in commits 1e5eb73545ca6cfe8b918798835aaf6e07af5beb and ffd8ac2dd50f99c3c83d7d9d845df9874ec3e7d5
2013-09-05math: remove *_WORD64 macros from libm.hSzabolcs Nagy1-13/+13
only fma used these macros and the explicit union is clearer
2013-09-05math: fix remaining old long double code (erfl, fmal, lgammal, scalbnl)Szabolcs Nagy1-24/+14
in lgammal don't handle 1 and 2 specially, in fma use the new ldshape union instead of ld80 one.
2013-05-19math: add fma TODO comments about the underflow issueSzabolcs Nagy1-0/+2
The underflow exception is not raised correctly in some cornercases (see previous fma commit), added comments with examples for fmaf, fmal and non-x86 fma. In fmaf store the result before returning so it has the correct precision when FLT_EVAL_METHOD!=0
2013-05-19math: fix two fma issues (only affects non-nearest rounding mode, x86)Szabolcs Nagy1-4/+38
1) in downward rounding fma(1,1,-1) should be -0 but it was 0 with gcc, the code was correct but gcc does not support FENV_ACCESS ON so it used common subexpression elimination where it shouldn't have. now volatile memory access is used as a barrier after fesetround. 2) in directed rounding modes there is no double rounding issue so the complicated adjustments done for nearest rounding mode are not needed. the only exception to this rule is raising the underflow flag: assume "small" is an exactly representible subnormal value in double precision and "verysmall" is a much smaller value so that (long double)(small plus verysmall) == small then (double)(small plus verysmall) raises underflow because the result is an inexact subnormal, but (double)(long double)(small plus verysmall) does not because small is not a subnormal in long double precision and it is exact in double precision. now this problem is fixed by checking inexact using fenv when the result is subnormal
2012-11-13math: use '#pragma STDC FENV_ACCESS ON' when fenv is accessedSzabolcs Nagy1-0/+2
2012-06-20math: fix fma bug on x86 (found by Bruno Haible with gnulib)nsz1-2/+10
The long double adjustment was wrong: The usual check is mant_bits & 0x7ff == 0x400 before doing a mant_bits++ or mant_bits-- adjustment since this is the only case when rounding an inexact ld80 into double can go wrong. (only in nearest rounding mode) After such a check the ++ and -- is ok (the mantissa will end in 0x401 or 0x3ff). fma is a bit different (we need to add 3 numbers with correct rounding: hi_xy + lo_xy + z so we should survive two roundings at different places without precision loss) The adjustment in fma only checks for zero low bits mant_bits & 0x3ff == 0 this way the adjusted value is correct when rounded to double or *less* precision. (this is an important piece in the fma puzzle) Unfortunately in this case the -- is not a correct adjustment because mant_bits might underflow so further checks are needed and this was the source of the bug.
2012-03-19use scalbn or *2.0 instead of ldexp, fix fmalnsz1-6/+6
Some code assumed ldexp(x, 1) is faster than 2.0*x, but ldexp is a wrapper around scalbn which uses multiplications inside, so this optimization is wrong. This commit also fixes fmal which accidentally used ldexp instead of ldexpl loosing precision. There are various additional changes from the work-in-progress const cleanups.
2012-03-19remove unnecessary TODO comments from fma.cnsz1-5/+1
2012-03-19add fma implementation for x86nsz1-7/+143
correctly rounded double precision fma using extended precision arithmetics for ld80 systems (x87)
2012-03-16make fma and lrint functions build without full fenv supportRich Felker1-2/+12
this is necessary to support archs where fenv is incomplete or unavailable (presently arm). fma, fmal, and the lrint family should work perfectly fine with this change; fmaf is slightly broken with respect to rounding as it depends on non-default rounding modes to do its work.
2012-03-13first commit of the new libm!Rich Felker1-0/+270
thanks to the hard work of Szabolcs Nagy (nsz), identifying the best (from correctness and license standpoint) implementations from freebsd and openbsd and cleaning them up! musl should now fully support c99 float and long double math functions, and has near-complete complex math support. tgmath should also work (fully on gcc-compatible compilers, and mostly on any c99 compiler). based largely on commit 0376d44a890fea261506f1fc63833e7a686dca19 from nsz's libm git repo, with some additions (dummy versions of a few missing long double complex functions, etc.) by me. various cleanups still need to be made, including re-adding (if they're correct) some asm functions that were dropped.