path: root/src/math
Age | Commit message | Author | Files | Lines
2016-02-19 | work around regression building for armhf with clang (compiler bug) | Rich Felker | 2 | -2/+2
commit e4355bd6bec89688e8c739cd7b4c76e675643dca moved the math asm from external source files to inline asm, but unfortunately, all current releases of clang use the wrong inline asm constraint codes for float and double ("w" and "P" instead of "t" and "w", respectively). this patch adds detection for the bug in configure, and, for now, just disables the affected asm on broken clang versions.
2016-02-18 | improve macro logic for enabling arm math asm | Rich Felker | 2 | -2/+2
in order to take advantage of the fpu in -mfloat-abi=softfp mode, the __VFP_FP__ macro (presence of vfp fpu) was checked instead of __ARM_PCS_VFP (hardfloat EABI variant). however, the latter macro is the one that's actually specified by the ABI documents rather than being compiler-specific, and should also be checked in case __VFP_FP__ is not defined on some compilers or some configurations.
2016-01-20 | replace armhf math asm source files with inline asm | Rich Felker | 16 | -40/+60
this makes it possible to inline them with LTO, and is the simplest approach to eliminating the use of .sub files. this also makes VFP sqrt available for use with the standard EABI (plain arm rather than armhf subarch) when libc is built with -mfloat-abi=softfp. the same could have been done for fabs, but when the argument and return value are in integer registers, moving to VFP registers and back is almost certainly more costly than a simple integer operation.
2015-11-21 | math: explicitly promote expressions to excess-precision types | Rich Felker | 3 | -4/+4
a conforming compiler for an arch with excess precision floating point (FLT_EVAL_METHOD!=0; presently i386 is the only such arch supported) computes all intermediate results in the types float_t and double_t rather than the nominal type of the expression. some incorrect compilers, however, only keep excess precision in registers, and convert down to the nominal type when spilling intermediate results to memory, yielding unpredictable results that depend on the compiler's choices of what/when to spill. in particular, this happens on old gcc versions with -ffloat-store, which we need in order to work around bugs where the compiler wrongly keeps explicitly-dropped excess precision.
by explicitly converting to double_t where expressions are expected to be evaluated in double_t precision, we can avoid depending on the compiler to get types correct when spilling; the nominal and intermediate precision now match. this commit should not change the code generated by correct compilers, or by old ones on non-i386 archs where double_t is defined as double.
this fixes a serious bug in argument reduction observed on i386 with gcc 4.2: for values of x outside the unit circle, sin(x) was producing results outside the interval [-1,1]. changes made in commit 0ce946cf808274c2d6e5419b139e130c8ad4bd30 were likely responsible for breaking compatibility with this and other old gcc versions.
patch by Szabolcs Nagy.
2015-11-10 | explicitly assemble all arm asm sources as UAL | Rich Felker | 4 | -0/+4
these files are all accepted as legacy arm syntax when producing arm code, but legacy syntax cannot be used for producing thumb2 with access to the full ISA. even after switching to UAL, some asm source files contain instructions which are not valid in thumb mode, so these will need to be addressed separately.
2015-10-19 | declare fpu usage to the assembler in arm hard-float asm files | Szabolcs Nagy | 4 | -0/+4
Some armhf gcc toolchains (built with --with-float=hard but without --with-fpu=vfp*) do not pass -mfpu=vfp to the assembler and then binutils rejects the UAL mnemonics for VFP unless there is an .fpu vfp directive in the asm source.
2015-04-23 | fix regression in x86_64 math asm with old binutils | Rich Felker | 2 | -6/+6
the implicit-operand form of fucomip is rejected by binutils 2.19 and perhaps other versions still in use. writing both operands explicitly fixes the issue. there is no change to the resulting output. commit a732e80d33b4fd6f510f7cec4f5573ef5d89bc4e was the source of this regression.
2015-04-18 | remove potentially PIC-incompatible relocations from x86_64 and x32 asm | Rich Felker | 2 | -2/+2
analogous to commit 8ed66ecbcba1dd0f899f22b534aac92a282f42d5 for i386.
2015-04-18 | remove the last of possible-textrels from i386 asm | Rich Felker | 2 | -1/+5
none of these are actual textrels because of ld-time binding performed by -Bsymbolic-functions, but I'm changing them with the goal of making ld-time binding purely an optimization rather than relying on it for semantic purposes. in the case of memmove's call to memcpy, making it explicit that the memmove asm is assuming the forward-copying behavior of the memcpy asm is desirable anyway; in case memcpy is ever changed, the semantic mismatch would be apparent while editing memcpy.s.
2015-04-18 | math: fix pow(+-0,-inf) not to raise divbyzero flag | Szabolcs Nagy | 3 | -3/+3
this reverts the commit f29fea00b5bc72d4b8abccba2bb1e312684d1fce which was based on a bug in C99 and POSIX and did not match IEEE-754 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1515.pdf
2015-03-11 | add aarch64 port | Szabolcs Nagy | 4 | -0/+24
This adds complete aarch64 target support including bigendian subarch. Some of the long double math functions are known to be broken; otherwise interfaces should be fully functional, but at this point consider this port experimental. Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
2015-03-11 | math: add dummy implementations of 128 bit long double functions | Szabolcs Nagy | 16 | -4/+97
This is in preparation for the aarch64 port only to have the long double math symbols available on ld128 platforms. The implementations should be fixed up later once we have proper tests for these functions. Added bigendian handling for ld128 bit manipulations too.
2015-03-11 | math: add ld128 exp2l based on the freebsd implementation | Szabolcs Nagy | 1 | -1/+366
Changed the special case handling and bit manipulation to better match the double version.
2015-02-09 | math: fix fmodl for IEEE binary128 | Szabolcs Nagy | 1 | -1/+1
This trivial copy-paste bug went unnoticed due to lack of testing. No currently supported target archs are affected.
2015-02-08 | math: fix __fpclassifyl(-0.0) for IEEE binary128 | Szabolcs Nagy | 1 | -3/+2
The sign bit was not cleared before checking for 0 so -0.0 was misclassified as FP_SUBNORMAL instead of FP_ZERO.
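The effect of clearing the sign bit before the zero test can be illustrated on binary64, which is easier to try out than binary128. This is only a sketch of the idea under that substitution; the function name `classify_double` is hypothetical and the code is not taken from the tree.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical binary64 analogue of the classification logic. */
static int classify_double(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* well-defined type punning */

    int e = (int)((bits >> 52) & 0x7ff);     /* biased exponent field */
    uint64_t no_sign = bits << 1;            /* shift the sign bit out */

    if (e == 0)
        /* Testing 'bits' here instead of 'no_sign' would classify
           -0.0 as FP_SUBNORMAL, which is the bug described above. */
        return no_sign ? FP_SUBNORMAL : FP_ZERO;
    if (e == 0x7ff)
        return (bits << 12) ? FP_NAN : FP_INFINITE;
    return FP_NORMAL;
}
```

With the sign bit shifted out, both `classify_double(0.0)` and `classify_double(-0.0)` take the same path and yield FP_ZERO.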
2015-02-08 | add parenthesis in fma.c to clarify intent and silence warnings | Szabolcs Nagy | 1 | -1/+1
2014-11-05 | math: use fnstsw consistently instead of fstsw in x87 asm | Szabolcs Nagy | 11 | -11/+11
fnstsw does not wait for pending unmasked x87 floating-point exceptions and it is the same as fstsw when all exceptions are masked which is the only environment libc supports.
2014-11-05 | math: fix x86_64 and x32 asm not to use sahf instruction | Szabolcs Nagy | 6 | -28/+14
Some early x86_64 cpus (released before 2006) did not support sahf/lahf instructions so they should be avoided (intel manual says they are only supported if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1). The workaround simplifies exp2l and expm1l because fucomip can be used instead of the fucomp;fnstsw;sahf sequence copied from i386. In fmodl and remainderl sahf is replaced by a simple bit test.
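The "simple bit test" can be pictured in C. After fnstsw the x87 status word is available as a 16-bit value; sahf would have copied its high byte into the flags register, whereas testing the C2 condition bit (bit 10) directly gives the same answer for the fprem loop. The helper name below is hypothetical, and this is an illustration of the idea rather than the asm actually used.

```c
#include <stdint.h>

/* C2 is bit 10 of the x87 status word; fprem leaves it set while the
   partial remainder is still incomplete. */
#define X87_SW_C2 0x0400u

/* Hypothetical helper: nonzero if the fprem loop has to iterate again. */
static int fprem_incomplete(uint16_t status_word)
{
    return (status_word & X87_SW_C2) != 0;
}
```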
2014-10-31 | math: use the rounding idiom consistently | Szabolcs Nagy | 13 | -58/+89
the idiomatic rounding of x is n = x + toint - toint; where toint is either 1/EPSILON (x is non-negative) or 1.5/EPSILON (x may be negative and nearest rounding mode is assumed) and EPSILON is according to the evaluation precision (the type of toint is not very important, because single precision float can represent the 1/EPSILON of ieee binary128). in case of FLT_EVAL_METHOD!=0 this avoids a useless store to double or float precision, and the long double code became cleaner with 1/LDBL_EPSILON instead of ifdefs for toint. __rem_pio2f and __rem_pio2 functions slightly changed semantics: on i386 a double-rounding is avoided so close to half-way cases may get evaluated differently eg. as sin(pi/4-eps) instead of cos(pi/4+eps)
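A minimal sketch of the idiom described above, assuming round-to-nearest mode and an input whose magnitude is well below 1/EPSILON. The names `round_idiom` and `EPS` are hypothetical, and the fallback branch assumes that any FLT_EVAL_METHOD other than 0 or 1 evaluates in long double precision.

```c
#include <float.h>
#include <math.h>

/* EPS follows the evaluation precision, i.e. the precision of double_t. */
#if FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1
#define EPS DBL_EPSILON
#else
#define EPS LDBL_EPSILON   /* assumption: evaluation in long double */
#endif

/* Hypothetical helper: round x to an integer value in the current
   (assumed nearest) rounding mode; x may be negative, hence 1.5/EPS. */
static double round_idiom(double x)
{
    static const double_t toint = 1.5 / EPS;
    /* The addition pushes the fraction bits of x out of the significand,
       the subtraction restores the magnitude. No store to double happens
       between the two operations. */
    double_t n = (double_t)x + toint - toint;
    return n;
}
```

For example, `round_idiom(2.5)` gives 2.0 because the intermediate sum is a tie that rounds to even.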
2014-10-31 | fix rint.c and rintf.c when FLT_EVAL_METHOD!=0 | Szabolcs Nagy | 2 | -4/+22
The old code used the rounding idiom incorrectly: y = (double)(x + 0x1p52) - 0x1p52; the cast is useless if FLT_EVAL_METHOD==0 and causes a second rounding if FLT_EVAL_METHOD==2 which can give an incorrect result in nearest rounding mode, so the correct idiom is to add/sub a power-of-2 according to the characteristics of double_t. This did not cause an actual bug because only i386 is affected, where rint is implemented in asm. Other rounding functions use a similar idiom, but they give correct results because they only rely on getting a neighboring integer result and the rounding direction is fixed up separately, independently of the current rounding mode. However they should be fixed to use the idiom correctly too.
2014-10-08 | always provide __fpclassifyl and __signbitl definitions | Rich Felker | 2 | -1/+9
previously the external definitions of these functions were omitted on archs where long double is the same as double, since the code paths in the math.h macros which would call them are unreachable. however, even if they are unreachable, the definitions are still mandatory. omitting them is invalid C, and in the case of a non-optimizing compiler, will result in a link error.
2014-09-18 | math: fix exp10 not to raise invalid exception on NaN | Szabolcs Nagy | 3 | -4/+13
This was not caught earlier because gcc incorrectly generates quiet relational operators that never raise exceptions.
2014-09-08 | fix exp10l.c to include float.h | Szabolcs Nagy | 1 | -0/+1
the previous commit was a no op in exp10l because LDBL_* macros were implicitly 0 (the preprocessor does not warn about undefined symbols).
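The preprocessor behavior behind this bug is easy to reproduce: in an #if expression, an identifier that is not a defined macro is replaced by 0 without any diagnostic by default. The sketch below uses a deliberately undefined, hypothetical macro name so that it does not depend on which headers happen to be included.

```c
/* HYPOTHETICAL_LDBL_MANT_DIG is intentionally never defined, standing in
   for an LDBL_* macro used without including <float.h>. */
static int branch_taken(void)
{
#if HYPOTHETICAL_LDBL_MANT_DIG == 64
    return 64;   /* the branch the author intended on some arch */
#elif HYPOTHETICAL_LDBL_MANT_DIG == 0
    return 0;    /* silently selected: the undefined name evaluates to 0 */
#else
    return -1;
#endif
}
```

Compilers such as gcc and clang can report this case when asked to with -Wundef.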
2014-09-08 | prune math code on archs with binary64 long double | Szabolcs Nagy | 2 | -0/+10
__polevll, __p1evll and exp10l were provided on archs where long double is the same as double. The first two were completely unused and exp10l can be a wrapper around exp10.
2014-04-11 | math: fix aliasing violation in long double wrappers | Szabolcs Nagy | 2 | -2/+10
modfl and sincosl were passing long double* instead of double* to the wrapped double precision functions (on archs where long double and double have the same size). This is fixed now by using temporaries (this is not optimized to a single branch so the generated code is a bit bigger). Found by Morten Welinder.
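The temporary-based fix can be sketched as follows for modfl. The wrapper name `modfl_wrapper` is hypothetical, and the sketch only makes sense as a full-precision implementation on archs where long double has the same representation as double; elsewhere it simply truncates to double precision.

```c
#include <math.h>

/* Hypothetical wrapper: forward to modf through a double temporary
   instead of casting the long double pointer to double*. */
static long double modfl_wrapper(long double x, long double *iptr)
{
    double d;                            /* temporary of the type modf expects */
    long double r = modf((double)x, &d); /* no long double* aliased as double* */
    *iptr = d;                           /* copy the integral part back */
    return r;
}
```

Passing `(double *)iptr` directly would access a long double object through an lvalue of type double, which is the aliasing violation the commit removes.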
2014-02-23 | x32 port (diff against vanilla x86_64) | rofl0r | 18 | -69/+69
2014-02-23 | import vanilla x86_64 code as x32 | rofl0r | 30 | -0/+396
2014-01-08 | math: add drem and dremf weak aliases to i386 remainder asm | Szabolcs Nagy | 2 | -0/+6
weak_alias was only in the c code, so drem was missing on platforms where remainder is implemented in asm.
2013-12-12 | math: define _GNU_SOURCE when implementing non-standard math functions | Szabolcs Nagy | 6 | -0/+6
this makes the prototypes in math.h visible so they are checked against the function definitions
2013-11-24 | math: clean up __rem_pio2 | Szabolcs Nagy | 3 | -71/+53
- remove the HAVE_EFFICIENT_IRINT case: fn is an exact integer, so it can be converted to int32_t a bit more efficiently than with a cast (the rounding mode change can be avoided), but musl does not support this case on any arch.
- __rem_pio2: use double_t where possible
- __rem_pio2f: use fewer assignments to avoid stores on i386
- use unsigned int bit manipulation (and union instead of macros)
- use hexfloat literals instead of named constants
2013-11-21 | math: add (obsolete) bsd drem and finite functions | Szabolcs Nagy | 4 | -0/+20
2013-11-21 | math: lgamma cleanup (simpler sin(pi*x) for the negative case) | Szabolcs Nagy | 4 | -202/+110
* simplify sin_pi(x) (don't care about inexact here, the result is inexact anyway, and x is not so small to underflow)
* in lgammal add the previously removed special case for x==1 and x==2 (to fix the sign of zero in downward rounding mode)
* only define lgammal on supported long double platforms
* change tgamma so the generated code is a bit smaller
2013-10-28 | math: extensive log*.c cleanup | Szabolcs Nagy | 14 | -583/+369
The log, log2 and log10 functions share a lot of code and to a lesser extent log1p too. A small part of the code was kept separately in __log1p.h, but since it did not capture much of the common code and it was inlined anyway, it did not solve the issue properly. Now the log functions have significant code duplication, which may be resolved later; until then they need to be modified together.
logl, log10l, log2l, log1pl:
* Fix the sign when the return value should be -inf.
* Remove the volatile hack from log10l (seems unnecessary)
log1p, log1pf:
* Change the handling of small inputs: only |x|<2^-53 is special (then it is enough to return x with the usual subnormal handling) this fixes the sign of log1p(0) in downward rounding.
* Do not handle the k==0 case specially (other than skipping the elaborate argument reduction)
* Do not handle 1+x close to power-of-two specially (this code was used rarely, did not give much speed up and the precision wasn't better than the general)
* Fix the correction term formula (c=1-(u-x) was used incorrectly when x<1 but (double)(x+1)==2, this was not a critical issue)
* Use the exact same method for calculating log(1+f) as in log (except in log1p the c correction term is added to the result).
log, logf, log10, log10f, log2, log2f:
* Use double_t and float_t consistently.
* Now the first part of log10 and log2 is identical to log (until the return statement, hopefully this makes maintenance easier).
* Most special case formulas were removed (close to power-of-two and k==0 cases), they increase the code size without providing precision or performance benefits (and obfuscate the code). Only x==1 is handled specially so in downward rounding mode the sign of zero is correct (the general formula happens to give -0).
* For x==0 instead of -1/0.0 or -two54/0.0, return -1/(x*x) to force raising the exception at runtime.
* Arg reduction code is changed (slightly simplified)
* The thresholds for arg reduction to [sqrt(2)/2,sqrt(2)] are now consistently the [0x3fe6a09e00000000,0x3ff6a09dffffffff] and the [0x3f3504f3,0x3fb504f2] intervals for double and float reductions respectively (the exact threshold values are not critical)
* Remove the obsolete comment for the FLT_EVAL_METHOD!=0 case in log2f (The same code is used for all eval methods now, on i386 slightly simpler code could be used, but we have asm there anyway)
all:
* Fix signed int arithmetics (using unsigned for bitmanipulation)
* Fix various comments
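The x==0 item in the log cleanup can be illustrated in isolation. A constant expression such as -1/0.0 may be folded at compile time, in which case no division is executed and the divide-by-zero flag may never be raised; dividing by an expression that depends on the argument keeps the operation at runtime. The function name below is hypothetical and the sketch covers only this one special case.

```c
#include <math.h>

/* Hypothetical fragment: the result log() should produce for x == +-0.
   x*x is +0 for either sign of zero, so the quotient is -inf, and because
   x is only known at runtime the division (and the divbyzero exception it
   raises) cannot be folded away. */
static double log_at_zero(double x)
{
    return -1 / (x * x);
}
```

Squaring x also removes the sign of zero, so both `log_at_zero(0.0)` and `log_at_zero(-0.0)` give negative infinity.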
2013-10-07 | math: fix rare underflow issue in fma | Szabolcs Nagy | 3 | -13/+55
the issue is described in commits 1e5eb73545ca6cfe8b918798835aaf6e07af5beb and ffd8ac2dd50f99c3c83d7d9d845df9874ec3e7d5
2013-10-07 | math: use sqrtl if FLT_EVAL_METHOD==2 in acosh and acoshf | Szabolcs Nagy | 2 | -0/+13
this makes acosh slightly more precise around 1.0 on i386
2013-10-06 | math: remove an unused variable from modfl | Szabolcs Nagy | 1 | -1/+0
2013-10-04 | math: remove code duplication in erfl found by clang analyzer | Szabolcs Nagy | 1 | -13/+2
erfl had some superfluous code left around after the last erf cleanup. the issue was reported by Alexander Monakov
2013-10-04 | math: remove a useless assignment in lgammal found by clang analyzer | Szabolcs Nagy | 1 | -2/+2
the issue was reported by Alexander Monakov
2013-09-13 | fix x86_64 lrintl asm, again | Rich Felker | 1 | -2/+2
the underlying problem was not incorrect sign extension (fixed in the previous commit to this file by nsz) but that code that treats "long" as 32-bit was copied blindly from i386 to x86_64. now lrintl is identical to llrintl on x86_64, as it should be.
2013-09-06 | math: remove STRICT_ASSIGN from exp2f (see previous commit) | Szabolcs Nagy | 1 | -1/+1
2013-09-06 | math: remove STRICT_ASSIGN macro | Szabolcs Nagy | 10 | -12/+13
gcc did not always drop excess precision according to c99 at assignments before version 4.5, even if -std=c99 was requested, which caused badly broken mathematical functions on i386 when FLT_EVAL_METHOD!=0. but STRICT_ASSIGN was not used consistently, and the problem is worked around for old compilers with -ffloat-store, so the macro is no longer needed. the new convention is to get the compiler to respect c99 semantics and, when excess precision is not harmful, to use float_t or double_t or to specialize code using FLT_EVAL_METHOD
2013-09-05 | math: support invalid ld80 representations in fpclassify | Szabolcs Nagy | 1 | -2/+4
apparently gnulib requires invalid long double representations to be handled correctly in printf so we classify them according to how the fpu treats them: bad inf is nan, bad nan is nan, bad normal is nan and bad subnormal/zero is minimal normal
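The classification rules listed above can be sketched on the raw fields of the x87 80-bit format, which keeps the example portable to machines without ld80. The function name and the two-argument interface (sign/exponent word and 64-bit significand with its explicit integer bit) are assumptions made for illustration; this is not the code from the tree.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical classifier over raw ld80 fields:
   se = sign bit plus 15-bit exponent, m = significand whose top bit is
   the explicit integer bit. */
static int classify_ld80(uint16_t se, uint64_t m)
{
    int e = se & 0x7fff;
    int msb = (int)(m >> 63);

    if (!e && !msb)
        return m ? FP_SUBNORMAL : FP_ZERO;
    if (!msb)
        return FP_NAN;        /* bad inf, bad nan and bad normal are nan */
    if (e == 0x7fff)
        return (m << 1) ? FP_NAN : FP_INFINITE;
    /* Includes e == 0 with the integer bit set: a bad subnormal/zero is
       classified like the minimal normal. */
    return FP_NORMAL;
}
```

For instance, exponent 0x3fff with a zero significand is not a valid encoding of any number and is reported as FP_NAN, while exponent 0 with only the integer bit set is reported as FP_NORMAL.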
2013-09-05 | math: fix atanh (overflow and underflow issues) | Szabolcs Nagy | 3 | -14/+37
in atanh exception handling was left to the called log functions, but the argument to those functions could underflow or overflow. use double_t and float_t to avoid some useless stores on x86
2013-09-05 | math: remove libc.h include from libm.h | Szabolcs Nagy | 4 | -1/+5
libc.h is only for weak_alias so include it directly where it is used
2013-09-05 | math: fix acoshf on negative values | Szabolcs Nagy | 2 | -7/+8
acosh(x) is invalid for x<1, acoshf tried to be clever using signed comparisons to handle all x<2 the same way, but the formula was wrong on large negative values.
2013-09-05 | math: fix expm1l on x86_64 (avoid underflow for large negative x) | Szabolcs Nagy | 3 | -3/+13
copy the fix from i386: return -1 instead of exp2l(x)-1 when x <= -65
2013-09-05 | math: fix lrintl.s on x86_64 (use movslq to signextend the result) | Szabolcs Nagy | 1 | -1/+1
2013-09-05 | math: fix exp2l asm on x86 (raise underflow correctly) | Szabolcs Nagy | 2 | -67/+78
there were two problems:
* omitted underflow on subnormal results: exp2l(-16383.5) was calculated as sqrt(2)*2^-16384, the last bits of sqrt(2) are zero so the down scaling does not underflow even though the result is in subnormal range
* spurious underflow for subnormal inputs: exp2l(0x1p-16400) was evaluated as f2xm1(x)+1 and f2xm1 raised underflow (because inexact subnormal result)
the first issue is fixed by raising underflow manually if x is in (-32768,-16382] and not integer (x-0x1p63+0x1p63 != x)
the second issue is fixed by treating x in (-0x1p64,0x1p64) specially
for these fixes the special case handling was completely rewritten
2013-09-05 | math: cosmetic cleanup (use explicit union instead of fshape and dshape) | Szabolcs Nagy | 10 | -100/+84
2013-09-05 | math: remove *_WORD64 macros from libm.h | Szabolcs Nagy | 1 | -13/+13
only fma used these macros and the explicit union is clearer