summaryrefslogtreecommitdiff
path: root/src/math
AgeCommit message (Collapse)AuthorFilesLines
2020-02-06rename i386 exp.s to exp_ld.sRich Felker2-0/+1
this commit is for the sake of reviewable history.
2020-02-06fix excess precision in return value of i386 log-family functionsRich Felker8-0/+20
2020-02-06fix excess precision in return value of i386 acos[f] and asin[f]Rich Felker6-42/+75
analogous to commit 1c9afd69051a64cf085c6fb3674a444ff9a43857 for atan[2][f].
2020-02-06fix excess precision in return value of i386 atan[2][f]Rich Felker4-2/+8
for functions implemented in C, this is a requirement of C11 (F.6); strictly speaking that text does not apply to standard library functions, but it seems to be intended to apply to them, and C2x is expected to make it a requirement. failure to drop excess precision is particularly bad for inverse trig functions, where a value with excess precision can be outside the range of the function (entire range, or range for a particular subdomain), breaking reasonable invariants a caller may expect.
2020-01-27math/x32: correct lrintl.s for 32-bit longAlexander Monakov1-2/+2
2019-11-05ppc: add configure check for older compilers erroring on 'd' constraintrofl0r2-2/+2
2019-10-14mips: add single-instruction math functionsinfo@mobile-stream.com4-0/+64
SQRT.fmt exists on MIPS II+ (float), MIPS III+ (double). ABS.fmt exists on MIPS I+ but only cores with ABS2008 flag in FCSR implement the required behaviour.
2019-10-13math: fix signed int left shift ub in sqrtSzabolcs Nagy2-4/+2
Both sqrt and sqrtf shifted the signed exponent as signed int to adjust the bit representation of the result. There are signed right shifts too in the code but those are implementation defined and are expected to compile to arithmetic shift on supported compilers and targets.
2019-09-27math: optimize lrint on 32bit targetsSzabolcs Nagy1-1/+27
lrint in (LONG_MAX, 1/DBL_EPSILON) and in (-1/DBL_EPSILON, LONG_MIN) is not trivial: rounding to int may be inexact, but the conversion to int may overflow and then the inexact flag must not be raised. (the overflow threshold is rounding mode dependent). this matters on 32bit targets (without single instruction lrint or rint), so the common case (when there is no overflow) is optimized by inlining the lrint logic, otherwise the old code is kept as a fallback. on my laptop an i486 lrint call is asm:10ns, old c:30ns, new c:21ns on a smaller arm core: old c:71ns, new c:34ns on a bigger arm core: old c:27ns, new c:19ns
2019-08-05fix build regression in i386 asm for atan2, atan2fRich Felker2-2/+2
commit f3ed8bfe8a82af1870ddc8696ed4cc1d5aa6b441 inadvertently removed labels that were still needed.
2019-08-05fix x87 stack imbalance in corner cases of i386 math asmRich Felker8-44/+14
commit 31c5fb80b9eae86f801be4f46025bc6532a554c5 introduced underflow code paths for the i386 math asm, along with checks on the fpu status word to skip the underflow-generation instructions if the underflow flag was already raised. unfortunately, at least one such path, in log1p, returned with 2 items on the x87 stack rather than just 1 item for the return value. this is a violation of the ABI's calling convention, and could cause subsequent floating point code to produce NANs due to x87 stack overflow. if floating point results are used in flow control, this can lead to runaway wrong code execution. rather than reviewing each "underflow already raised" code path for correctness, remove them all. they're likely slower than just performing the underflow code unconditionally, and significantly more complex. all of this code should be ripped out and replaced by C source files with inline asm. doing so would preclude this kind of error by having the compiler perform all x87 stack register allocation and stack manipulation, and would produce comparable or better code. however such a change is a much larger project.
2019-06-14add riscv64 architecture supportRich Felker12-0/+180
Author: Alex Suykov <alex.suykov@gmail.com> Author: Aric Belsito <lluixhi@gmail.com> Author: Drew DeVault <sir@cmpwn.com> Author: Michael Clark <mjc@sifive.com> Author: Michael Forney <mforney@mforney.org> Author: Stefan O'Rear <sorear2@gmail.com> This port has involved the work of many people over several years. I have tried to ensure that everyone with substantial contributions has been credited above; if any omissions are found they will be noted later in an update to the authors/contributors list in the COPYRIGHT file. The version committed here comes from the riscv/riscv-musl repo's commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by me for issues found during final review: - a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc are not safe to use in separate inline asm fragments) - a_cas[_p] is fixed to be a memory barrier - the call from the _start assembly into the C part of crt1/ldso is changed to allow for the possibility that the linker does not place them nearby each other. - DTP_OFFSET is defined correctly so that local-dynamic TLS works - reloc.h LDSO_ARCH logic is simplified and made explicit. - unused, non-functional crti/n asm files are removed. - an empty .sdata section is added to crt1 so that the __global_pointer reference is resolvable. - indentation style errors in some asm files are fixed.
2019-04-17math: new powSzabolcs Nagy3-303/+520
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc The underflow exception is signaled if the result is in the subnormal range even if the result is exact. code size change: +3421 bytes. benchmark on x86_64 before, after, speedup: -Os: pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x pow latency: 144.37 ns/call 54.75 ns/call 2.64x -O3: pow rthruput: 98.91 ns/call 32.79 ns/call 3.02x pow latency: 138.74 ns/call 53.78 ns/call 2.58x
2019-04-17math: new exp and exp2Szabolcs Nagy4-480/+434
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc TOINT_INTRINSICS and EXP_USE_TOINT_NARROW cases are unused. The underflow exception is signaled if the result is in the subnormal range even if the result is exact (e.g. exp2(-1023.0)). code size change: -1672 bytes. benchmark on x86_64 before, after, speedup: -Os: exp rthruput: 12.73 ns/call 6.68 ns/call 1.91x exp latency: 45.78 ns/call 21.79 ns/call 2.1x exp2 rthruput: 6.35 ns/call 5.26 ns/call 1.21x exp2 latency: 26.00 ns/call 16.58 ns/call 1.57x -O3: exp rthruput: 12.75 ns/call 6.73 ns/call 1.89x exp latency: 45.91 ns/call 21.80 ns/call 2.11x exp2 rthruput: 6.47 ns/call 5.40 ns/call 1.2x exp2 latency: 26.03 ns/call 16.54 ns/call 1.57x
2019-04-17math: new log2Szabolcs Nagy3-106/+335
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc code size change: +2458 bytes (+1524 bytes with fma). benchmark on x86_64 before, after, speedup: -Os: log2 rthruput: 16.08 ns/call 10.49 ns/call 1.53x log2 latency: 44.54 ns/call 25.55 ns/call 1.74x -O3: log2 rthruput: 15.92 ns/call 10.11 ns/call 1.58x log2 latency: 44.66 ns/call 26.16 ns/call 1.71x
2019-04-17math: new logSzabolcs Nagy3-104/+454
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc Assume __FP_FAST_FMA implies __builtin_fma is inlined as a single instruction. code size change: +4588 bytes (+2540 bytes with fma). benchmark on x86_64 before, after, speedup: -Os: log rthruput: 12.61 ns/call 7.95 ns/call 1.59x log latency: 41.64 ns/call 23.38 ns/call 1.78x -O3: log rthruput: 12.51 ns/call 7.75 ns/call 1.61x log latency: 41.82 ns/call 23.55 ns/call 1.78x
2019-04-17math: new powfSzabolcs Nagy3-240/+226
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which is currently not supported for any target. SNaN is not supported, it would require an issignalingf implementation. code size change: -816 bytes. benchmark on x86_64 before, after, speedup: -Os: powf rthruput: 95.14 ns/call 20.04 ns/call 4.75x powf latency: 137.00 ns/call 34.98 ns/call 3.92x -O3: powf rthruput: 92.48 ns/call 13.67 ns/call 6.77x powf latency: 131.11 ns/call 35.15 ns/call 3.73x
2019-04-17math: new exp2f and expfSzabolcs Nagy4-179/+177
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc In expf TOINT_INTRINSICS is kept, but is unused, it would require support for __builtin_round and __builtin_lround as single instruction. code size change: +94 bytes. benchmark on x86_64 before, after, speedup: -Os: expf rthruput: 9.19 ns/call 8.11 ns/call 1.13x expf latency: 34.19 ns/call 18.77 ns/call 1.82x exp2f rthruput: 5.59 ns/call 6.52 ns/call 0.86x exp2f latency: 17.93 ns/call 16.70 ns/call 1.07x -O3: expf rthruput: 9.12 ns/call 4.92 ns/call 1.85x expf latency: 34.44 ns/call 18.99 ns/call 1.81x exp2f rthruput: 5.58 ns/call 4.49 ns/call 1.24x exp2f latency: 17.95 ns/call 16.94 ns/call 1.06x
2019-04-17math: new log2fSzabolcs Nagy3-58/+108
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc code size change: +177 bytes. benchmark on x86_64 before, after, speedup: -Os: log2f rthruput: 11.38 ns/call 5.99 ns/call 1.9x log2f latency: 35.01 ns/call 22.57 ns/call 1.55x -O3: log2f rthruput: 10.82 ns/call 5.58 ns/call 1.94x log2f latency: 35.13 ns/call 21.04 ns/call 1.67x
2019-04-17math: new logfSzabolcs Nagy3-54/+109
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc, with minor changes to better fit into musl. code size change: +289 bytes. benchmark on x86_64 before, after, speedup: -Os: logf rthruput: 8.40 ns/call 6.14 ns/call 1.37x logf latency: 31.79 ns/call 24.33 ns/call 1.31x -O3: logf rthruput: 8.43 ns/call 5.58 ns/call 1.51x logf latency: 32.04 ns/call 20.88 ns/call 1.53x
2019-04-17math: add double precision error handling functionsSzabolcs Nagy5-0/+30
2019-04-17math: add single precision error handling functionsSzabolcs Nagy5-0/+30
These are supposed to be used in tail call positions when handling special cases in new code. (fp exceptions may be raised "naturally" by the common code path if special casing is more effort.) This implements the error handling apis used in https://github.com/ARM-software/optimized-routines without errno setting.
2019-04-03fix unintended global symbols in atanl.cDan Gohman1-3/+3
Mark atanhi, atanlo, and aT in atanl.c as static, as they're not intended to be part of the public API. These are already static in the LDBL_MANT_DIG == 64 code, so this patch is just making the LDBL_MANT_DIG == 113 code do the same thing.
2018-10-15x86_64: add single instruction fmaSzabolcs Nagy4-0/+92
fma is only available on recent x86_64 cpus and it is much faster than a software fma, so this should be done with a runtime check, however that requires more changes, this patch just adds the code so it can be tested when musl is compiled with -mfma or -mfma4.
2018-10-15arm: add single instruction fmaSzabolcs Nagy2-0/+30
vfma is available in the vfpv4 fpu and above, the ACLE standard feature test for double precision hardware fma support is __ARM_FEATURE_FMA && __ARM_FP&8 we need further checks to work around clang bugs (fixed in clang >=7.0) && !__SOFTFP__ because __ARM_FP is defined even with -mfloat-abi=soft && !BROKEN_VFP_ASM to disable the single precision code when inline asm handling is broken. For runtime selection the HWCAP_ARM_VFPv4 hwcap flag can be used, but that requires further work.
2018-10-15powerpc: add single instruction fabs, fabsf, fma, fmaf, sqrt, sqrtfSzabolcs Nagy6-0/+90
These are only available on hard float target and sqrt is not available in the base ISA, so further check is used.
2018-10-15s390x: add single instruction fma and fmafSzabolcs Nagy2-0/+14
These are available in the s390x baseline isa -march=z900.
2018-09-12reduce spurious inclusion of libc.hRich Felker9-9/+0
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12apply hidden visibility to various remaining internal interfacesRich Felker1-0/+2
2018-09-12apply hidden visibility to internal math functionsRich Felker1-2/+2
this makes significant differences to codegen on archs with an expensive PLT-calling ABI; on i386 and gcc 7.3 for example, the sin and sinf functions no longer touch call-saved registers or the stack except for pushing outgoing arguments. performance is likely improved too, but no measurements were taken.
2018-09-12move lgamma-related internal declarations to libm.hRich Felker4-12/+3
2018-06-14add support for m68k 80-bit long double variantRich Felker1-2/+10
since x86 and m68k are the only archs with 80-bit long double and each has mandatory endianness, select the variant via endianness. differences are minor: apparently just byte order and representation of infinities. the m68k format is not well-documented anywhere I could find, so if other differences are found they may require additional changes later.
2018-04-02fix fmaf wrong resultSzabolcs Nagy1-1/+1
if double precision r=x*y+z is not a half way case between two single precision floats or it is an exact result then fmaf returns (float)r. however the exactness check was wrong when |x*y| < |z| and could cause incorrectly rounded result in nearest rounding mode when r is a half way case. fmaf(-0x1.26524ep-54, -0x1.cb7868p+11, 0x1.d10f5ep-29) was incorrectly rounded up to 0x1.d117ap-29 instead of 0x1.d1179ep-29. (exact result is 0x1.d1179efffffffecp-29, r is 0x1.d1179fp-29)
2017-10-13math: rewrite fma with mostly int arithmeticsSzabolcs Nagy1-431/+154
the freebsd fma code failed to raise underflow exception in some cases in nearest rounding mode (affects fmal too) e.g. fma(-0x1p-1000, 0x1.000001p-74, 0x1p-1022) and the inexact exception may be raised spuriously since the fenv is not saved/restored around the exact multiplication algorithm (affects x86 fma too). another issue is that the underflow behaviour when the rounded result is the minimal normal number is target dependent, ieee754 allows two ways to raise underflow for inexact results: raise if the result before rounding is in the subnormal range (e.g. aarch64, arm, powerpc) or if the result after rounding with infinite exponent range is in the subnormal range (e.g. x86, mips, sh). to avoid all these issues the algorithm was rewritten with mostly int arithmetics and float arithmetics is only used to get correct rounding and raise exceptions according to the behaviour of the target without any fenv.h dependency. it also unifies x86 and non-x86 fma. fmaf is not affected, fmal need to be fixed too. this algorithm depends on a_clz_64 and it required a few spurious instructions to make sure underflow exception is raised in a particular corner case. (normally FORCE_EVAL(tiny*tiny) would be used for this, but on i386 gcc is broken if the expression is constant https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57245 and there is no easy portable fix for the macro.)
2017-06-23powerpc64: add single-instruction math functionsRich Felker22-0/+290
while the official elfv2 abi for "powerpc64le" sets power8 as the baseline isa, we use it for both little and big endian powerpc64 targets and need to maintain compatibility with pre-power8 models. the instructions for sqrt, fabs, and fma are in the baseline isa; support for the rest is conditional via predefined isa-level macros. patch by David Edelsohn.
2017-06-23s390x: add single-instruction math functionsRich Felker24-0/+360
these were introduced in z196 and not available in the baseline (z900) ISA level. use __HTM__ as an alternate indicator for ISA level, since gcc did not define __ARCH__ until 7.x. patch by David Edelsohn.
2017-04-21fix scalbn when result is in the subnormal rangeSzabolcs Nagy3-12/+14
in nearest rounding mode scalbn could introduce double rounding error when an intermediate value and the final result were both in the subnormal range e.g. scalbn(0x1.7ffffffffffffp-1, -1073) returned 0x1p-1073 instead of 0x1p-1074, because the intermediate computation got rounded to 0x1.8p-1023. with the fix an intermediate value can only be in the subnormal range if the final result is 0 which is correct even after double rounding. (there still can be two roundings so signals may be raised twice, but that's only observable with trapping exceptions which is not supported.)
2017-03-21aarch64: add single instruction math functionsSzabolcs Nagy34-24/+226
this should increase performance and reduce code size on aarch64. the compiled code was checked against using __builtin_* instead of inline asm with gcc-6.2.0. lrint is two instructions. c with inline asm is used because it is safer than a pure asm implementation, this prevents ll{rint,round} to be an alias of l{rint,round} (because the types don't match) and depends on gcc style inline asm support. ceil, floor, round, trunc can either raise inexact on finite non-integer inputs or not raise any exceptions. the new implementation does not raise exceptions while the generic c code does. on aarch64, the underflow exception is signaled before rounding (ieee 754 allows both before and after rounding, but it must be consistent), the generic fma c code signals it after rounding so using single instruction fixes a slight conformance issue too.
2017-03-15fix threshold constants in j0f, y0f, j1f, y1fSzabolcs Nagy2-13/+12
partly following freebsd rev 279491 https://svnweb.freebsd.org/base?view=revision&revision=279491 (musl had some of the fixes before freebsd). the change should not matter much for j0f, y0f, but it improves j1f and y1f in [2.5,~3.75] (that is [0x40200000,~0x40700000]). near roots (e.g. around 3.8317 for j1f) there are still large ulp errors. dropped code that tried to raise inexact.
2016-10-20math: fix pow signed shift ubSzabolcs Nagy1-2/+2
j is int32_t and thus j<<31 is undefined if j==1, so j is changed to uint32_t locally as a quick fix, the generated code is not affected. (this is a strict conformance fix, future c standard may allow 1<<31, see DR 463. the bug was inherited from freebsd fdlibm, the proper fix is to use uint32_t for all bit hacks, but that requires more intrusive changes.) reported by Daniel Sabogal
2016-08-30math: fix 128bit long double inverse trigonometric functionsSzabolcs Nagy1-1/+1
there was a copy paste error that could cause large ulp errors in atan2l, atanl, asinl and acosl on aarch64, mips64 and mipsn32. (the implementation is from freebsd fdlibm, but the tail end of the polynomial was wrong. 128 bit long double functions are not yet tested so this went undetected.)
2016-03-04math: fix expf(-NAN) and exp2f(-NAN) to return -NAN instead of 0Szabolcs Nagy2-0/+4
expf(-NAN) was treated as expf(-large) which unconditionally returns +0, so special case +-NAN. reported by Petr Hosek.
2016-02-19work around regression building for armhf with clang (compiler bug)Rich Felker2-2/+2
commit e4355bd6bec89688e8c739cd7b4c76e675643dca moved the math asm from external source files to inline asm, but unfortunately, all current releases of clang use the wrong inline asm constraint codes for float and double ("w" and "P" instead of "t" and "w", respectively). this patch adds detection for the bug in configure, and, for now, just disables the affected asm on broken clang versions.
2016-02-18improve macro logic for enabling arm math asmRich Felker2-2/+2
in order to take advantage of the fpu in -mfloat-abi=softfp mode, the __VFP_FP__ (presence of vfp fpu) was checked instead of checking for __ARM_PCS_VFP (hardfloat EABI variant). however, the latter macro is the one that's actually specified by the ABI documents rather than being compiler-specific, and should also be checked in case __VFP_FP__ is not defined on some compilers or some configurations.
2016-01-20replace armhf math asm source files with inline asmRich Felker16-40/+60
this makes it possible to inline them with LTO, and is the simplest approach to eliminating the use of .sub files. this also makes VFP sqrt available for use with the standard EABI (plain arm rather than armhf subarch) when libc is built with -mfloat-abi=softfp. the same could have been done for fabs, but when the argument and return value are in integer registers, moving to VFP registers and back is almost certainly more costly than a simple integer operation.
2015-11-21math: explicitly promote expressions to excess-precision typesRich Felker3-4/+4
a conforming compiler for an arch with excess precision floating point (FLT_EVAL_METHOD!=0; presently i386 is the only such arch supported) computes all intermediate results in the types float_t and double_t rather than the nominal type of the expression. some incorrect compilers, however, only keep excess precision in registers, and convert down to the nominal type when spilling intermediate results to memory, yielding unpredictable results that depend on the compiler's choices of what/when to spill. in particular, this happens on old gcc versions with -ffloat-store, which we need in order to work around bugs where the compiler wrongly keeps explicitly-dropped excess precision. by explicitly converting to double_t where expressions are expected be be evaluated in double_t precision, we can avoid depending on the compiler to get types correct when spilling; the nominal and intermediate precision now match. this commit should not change the code generated by correct compilers, or by old ones on non-i386 archs where double_t is defined as double. this fixes a serious bug in argument reduction observed on i386 with gcc 4.2: for values of x outside the unit circle, sin(x) was producing results outside the interval [-1,1]. changes made in commit 0ce946cf808274c2d6e5419b139e130c8ad4bd30 were likely responsible for breaking compatibility with this and other old gcc versions. patch by Szabolcs Nagy.
2015-11-10explicitly assemble all arm asm sources as UALRich Felker4-0/+4
these files are all accepted as legacy arm syntax when producing arm code, but legacy syntax cannot be used for producing thumb2 with access to the full ISA. even after switching to UAL, some asm source files contain instructions which are not valid in thumb mode, so these will need to be addressed separately.
2015-10-19declare fpu usage to the assembler in arm hard-float asm filesSzabolcs Nagy4-0/+4
Some armhf gcc toolchains (built with --with-float=hard but without --with-fpu=vfp*) do not pass -mfpu=vfp to the assembler and then binutils rejects the UAL mnemonics for VFP unless there is an .fpu vfp directive in the asm source.
2015-04-23fix regression in x86_64 math asm with old binutilsRich Felker2-6/+6
the implicit-operand form of fucomip is rejected by binutils 2.19 and perhaps other versions still in use. writing both operands explicitly fixes the issue. there is no change to the resulting output. commit a732e80d33b4fd6f510f7cec4f5573ef5d89bc4e was the source of this regression.
2015-04-18remove potentially PIC-incompatible relocations from x86_64 and x32 asmRich Felker2-2/+2
analogous to commit 8ed66ecbcba1dd0f899f22b534aac92a282f42d5 for i386.