Age | Commit message (Collapse) | Author | Files | Lines |
|
none of these are actual textrels because of ld-time binding performed
by -Bsymbolic-functions, but I'm changing them with the goal of making
ld-time binding purely an optimization rather than relying on it for
semantic purposes.
in the case of memmove's call to memcpy, making it explicit that the
memmove asm is assuming the forward-copying behavior of the memcpy asm
is desirable anyway; in case memcpy is ever changed, the semantic
mismatch would be apparent while editing memmcpy.s.
|
|
there were two problems:
* omitted underflow on subnormal results: exp2l(-16383.5) was calculated
as sqrt(2)*2^-16384, the last bits of sqrt(2) are zero so the down scaling
does not underflow eventhough the result is in subnormal range
* spurious underflow for subnormal inputs: exp2l(0x1p-16400) was evaluated
as f2xm1(x)+1 and f2xm1 raised underflow (because inexact subnormal result)
the first issue is fixed by raising underflow manually if x is in
(-32768,-16382] and not integer (x-0x1p63+0x1p63 != x)
the second issue is fixed by treating x in (-0x1p64,0x1p64) specially
for these fixes the special case handling was completely rewritten
|
|
underflow is raised by an inexact subnormal float store,
since subnormal operations are slow, check the underflow
flag and skip the store if it's already raised
|
|
with naive exp2l(x*log2e) the last 12bits of the result was incorrect
for x with large absolute value
with hi + lo = x*log2e is caluclated to 128 bits precision and then
expl(x) = exp2l(hi) + exp2l(hi) * f2xm1(lo)
this gives <1.5ulp measured error everywhere in nearest rounding mode
|
|
exp(inf), exp(-inf), exp(nan) used to raise wrong flags
|
|
exponents (base 2) near 16383 were broken due to (1) wrong cutoff, and
(2) inability to fit the necessary range of scalings into a long
double value.
as a solution, we fall back to using frndint/fscale for insanely large
exponents, and also have to special-case infinities here to avoid
inf-inf generating nan.
thankfully the costly code never runs in normal usage cases.
|
|
up to 30% faster exp2 by avoiding slow frndint and fscale functions.
expm1 also takes a much more direct path for small arguments (the
expected usage case).
|
|
infinities were getting converted into nans. the new code simply tests
for infinity and replaces it with a large magnitude value of the same
sign.
also, the fcomi instruction is apparently not part of the i387
instruction set, so avoid using it.
|
|
|