Age | Commit message (Collapse) | Author | Files | Lines |
|
commit b50d315fd23f0fbc4c11e2583801dd123d933745 introduced
fp_force_eval implemented by default with a dead store to a volatile
variable. unfortunately introduces warnings with -Wunused-variable and
breaks the ability to use -Werror with the default warning options set
by configure when warnings are enabled.
we could just call fp_barrier instead, but that results in a spurious
load after the store due to volatile semantics.
the fix committed here avoids the load. it will still produce warnings
without -Wno-unused-but-set-variable, but that's part of our default
warning profile, and there are already other locations in the source
where an unused variable warning will occur without it.
|
|
from https://github.com/ARM-software/optimized-routines,
commit 04884bd04eac4b251da4026900010ea7d8850edc
The underflow exception is signaled if the result is in the subnormal
range even if the result is exact.
code size change: +3421 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x
pow latency: 144.37 ns/call 54.75 ns/call 2.64x
-O3:
pow rthruput: 98.91 ns/call 32.79 ns/call 3.02x
pow latency: 138.74 ns/call 53.78 ns/call 2.58x
|
|
from https://github.com/ARM-software/optimized-routines,
commit 04884bd04eac4b251da4026900010ea7d8850edc
POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which
is currently not supported for any target.
SNaN is not supported, it would require an issignalingf
implementation.
code size change: -816 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
powf rthruput: 95.14 ns/call 20.04 ns/call 4.75x
powf latency: 137.00 ns/call 34.98 ns/call 3.92x
-O3:
powf rthruput: 92.48 ns/call 13.67 ns/call 6.77x
powf latency: 131.11 ns/call 35.15 ns/call 3.73x
|
|
from https://github.com/ARM-software/optimized-routines,
commit 04884bd04eac4b251da4026900010ea7d8850edc
In expf TOINT_INTRINSICS is kept, but is unused, it would require support
for __builtin_round and __builtin_lround as single instruction.
code size change: +94 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
expf rthruput: 9.19 ns/call 8.11 ns/call 1.13x
expf latency: 34.19 ns/call 18.77 ns/call 1.82x
exp2f rthruput: 5.59 ns/call 6.52 ns/call 0.86x
exp2f latency: 17.93 ns/call 16.70 ns/call 1.07x
-O3:
expf rthruput: 9.12 ns/call 4.92 ns/call 1.85x
expf latency: 34.44 ns/call 18.99 ns/call 1.81x
exp2f rthruput: 5.58 ns/call 4.49 ns/call 1.24x
exp2f latency: 17.95 ns/call 16.94 ns/call 1.06x
|
|
Musl currently aims to support non-nearest rounding mode and does not
support SNaNs. These macros allow marking relevant code paths in case
these decisions are changed later (they also help documenting the
corner cases involved).
|
|
These don't have an effectw with -Os so not useful with default settings
other than documenting the expectation.
With --enable-optimize=internal,malloc,string,math the libc.so code size
increases by 18K on x86_64 and performance varies in -2% .. +10%.
|
|
|
|
These are supposed to be used in tail call positions when handling
special cases in new code. (fp exceptions may be raised "naturally"
by the common code path if special casing is more effort.)
This implements the error handling apis used in
https://github.com/ARM-software/optimized-routines
without errno setting.
|
|
Previously type casts or assignments were used for handling excess
precision, which assumed standard C99 semantics, but since it's a
rarely needed obscure detail, it's better to use explicit helper
functions to document where we rely on this. It also helps if the
code is used outside of the libc in non-C99 compilation mode: with the
default excess precision handling of gcc, explicit inline asm barriers
are needed for narrowing on FLT_EVAL_METHOD!=0 targets.
I plan to use this in new code with the existing style that uses
double_t and float_t as much as possible.
One ugliness is that it is required for almost every return statement
since that does not drop excess precision (the standard changed this
in C11 annex F, but that does not help in non-standard compilation
modes or with old compilers).
|
|
C99 has ways to support fenv access, but compilers don't implement it
and assume nearest rounding mode and no fp status flag access. (gcc has
-frounding-math and then it does not assume nearest rounding mode, but
it still assumes the compiled code itself does not change the mode.
Even if the C99 mechanism was implemented it is not ideal: it requires
all code in the library to be compiled with FENV_ACCESS "on" to make it
usable in non-nearest rounding mode, but that limits optimizations more
than necessary.)
The math functions should give reasonable results in all rounding modes
(but the quality may be degraded in non-nearest rounding modes) and the
fp status flag settings should follow the spec, so fenv side-effects are
important and code transformations that break them should be prevented.
Unfortunately compilers don't give any help with this, the best we can
do is to add fp barriers to the code using volatile local variables
(they create a stack frame and undesirable memory accesses to it) or
inline asm (gcc specific, requires target specific fp reg constraints,
often creates unnecessary reg moves and multiple barriers are needed to
express that an operation has side-effects) or extern call (only useful
in tail-call position to avoid stack-frame creation and does not work
with lto).
We assume that in a math function if an operation depends on the input
and the output depends on it, then the operation will be evaluated at
runtime when the function is called, producing all the expected fenv
side-effects (this is not true in case of lto and in case the operation
is evaluated with excess precision that is not rounded away). So fp
barriers are needed (1) to prevent the move of an operation within a
function (in case it may be moved from an unevaluated code path into an
evaluated one or if it may be moved across a fenv access), (2) force the
evaluation of an operation for its side-effect when it has no input
dependency (may be constant folded) or (3) when its output is unused. I
belive that fp_barrier and fp_force_eval can take care of these and they
should not be needed in hot code paths.
|
|
Nothing is left from the original fdlibm header nor from the bsd
modifications to it other than some internal api declarations.
Comments are dropped that may be copyrightable content.
|
|
Code generation for SET_HIGH_WORD slightly changes, but it only affects
pow, otherwise the generated code is unchanged.
|
|
This makes it easier to build musl math code with a compiler that
does not support complex types (tcc) and in general more sensible
factorization of the internal headers.
|
|
this makes significant differences to codegen on archs with an
expensive PLT-calling ABI; on i386 and gcc 7.3 for example, the sin
and sinf functions no longer touch call-saved registers or the stack
except for pushing outgoing arguments. performance is likely improved
too, but no measurements were taken.
|
|
|
|
since x86 and m68k are the only archs with 80-bit long double and each
has mandatory endianness, select the variant via endianness.
differences are minor: apparently just byte order and representation
of infinities. the m68k format is not well-documented anywhere I could
find, so if other differences are found they may require additional
changes later.
|
|
This is in preparation for the aarch64 port only to have the long
double math symbols available on ld128 platforms. The implementations
should be fixed up later once we have proper tests for these functions.
Added bigendian handling for ld128 bit manipulations too.
|
|
this avoids assuming the presence of C11 macro definitions in the
public complex.h, which need changes potentially incompatible with the
way these macros are being used internally.
|
|
gcc did not always drop excess precision according to c99 at assignments
before version 4.5 even if -std=c99 was requested which caused badly
broken mathematical functions on i386 when FLT_EVAL_METHOD!=0
but STRICT_ASSIGN was not used consistently and it is worked around for
old compilers with -ffloat-store so it is no longer needed
the new convention is to get the compiler respect c99 semantics and when
excess precision is not harmful use float_t or double_t or to specialize
code using FLT_EVAL_METHOD
|
|
libc.h is only for weak_alias so include it directly where it is used
|
|
|
|
only fma used these macros and the explicit union is clearer
|
|
|
|
new ldshape union, ld128 support is kept, code that used the old
ldshape union was rewritten (IEEEl2bits union of freebsd libm is
not touched yet)
ld80 __fpclassifyl no longer tries to handle invalid representation
|
|
|
|
the volatile hack in STRICT_ASSIGN is only needed if
assignment is not respected and excess precision is kept.
gcc -fexcess-precision=standard and -ffloat-store both
respect assignment and musl use these flags by default.
i kept the macro for now so the workaround may be used
for bad compilers in the future.
|
|
|
|
updated nextafter* to use FORCE_EVAL, it can be used in many other
places in the math code to improve readability.
|
|
|
|
|
|
|
|
standard functions cannot depend on nonstandard symbols
|
|
thanks to the hard work of Szabolcs Nagy (nsz), identifying the best
(from correctness and license standpoint) implementations from freebsd
and openbsd and cleaning them up! musl should now fully support c99
float and long double math functions, and has near-complete complex
math support. tgmath should also work (fully on gcc-compatible
compilers, and mostly on any c99 compiler).
based largely on commit 0376d44a890fea261506f1fc63833e7a686dca19 from
nsz's libm git repo, with some additions (dummy versions of a few
missing long double complex functions, etc.) by me.
various cleanups still need to be made, including re-adding (if
they're correct) some asm functions that were dropped.
|