author     Szabolcs Nagy <nsz@port70.net>  2018-11-26 23:30:00 +0000
committer  Rich Felker <dalias@aerifal.cx>  2019-04-17 13:06:43 -0400
commit     b50d315fd23f0fbc4c11e2583801dd123d933745
tree       f0221e457b31f78b04e2fbe6721f3c8d7c621d01
parent     f107d34e762a0c18be2ba25518667780242e21e0
math: add fp_arch.h with fp_barrier and fp_force_eval
C99 has ways to support fenv access, but compilers don't implement them: they assume the nearest rounding mode and no access to the fp status flags. (gcc has -frounding-math, with which it no longer assumes the nearest rounding mode, but it still assumes that the compiled code itself does not change the mode. Even if the C99 mechanism were implemented it would not be ideal: all code in the library would have to be compiled with FENV_ACCESS "on" to be usable in non-nearest rounding modes, and that limits optimizations more than necessary.)

The math functions should give reasonable results in all rounding modes (though the quality may be degraded in non-nearest rounding modes), and the fp status flag settings should follow the spec, so fenv side-effects are important and code transformations that break them must be prevented. Unfortunately compilers give no help with this. The best we can do is to add fp barriers to the code using one of:

- volatile local variables (they create a stack frame and undesirable memory accesses to it),
- inline asm (gcc specific, requires target specific fp register constraints, often creates unnecessary register moves, and multiple barriers are needed to express that an operation has side-effects), or
- an extern call (only useful in tail-call position to avoid stack-frame creation, and does not work with lto).

We assume that in a math function, if an operation depends on the input and the output depends on the operation, then the operation will be evaluated at runtime when the function is called, producing all the expected fenv side-effects. (This is not true with lto, or when the operation is evaluated with excess precision that is not rounded away.)
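For illustration, a minimal sketch of the first two techniques applied to a double. This is not code from the commit: the function names are hypothetical, the x86-64 branch (the SSE "x" constraint) is added only so the sketch also builds off AArch64, and any other target falls back to the volatile form.

```c
/* Technique 1 (hypothetical name): a volatile local. Portable C, but the
   value is stored to and reloaded from the stack frame. */
static inline double barrier_volatile(double x)
{
	volatile double y = x;
	return y;
}

/* Technique 2 (hypothetical name): an empty GNU C inline asm that claims
   to read and modify x in a register, so the compiler can neither see
   through it nor delete it. The constraint letter is target specific. */
static inline double barrier_asm(double x)
{
#if defined(__GNUC__) && defined(__aarch64__)
	__asm__ __volatile__ ("" : "+w"(x)); /* AArch64: FP/SIMD register */
#elif defined(__GNUC__) && defined(__x86_64__)
	__asm__ __volatile__ ("" : "+x"(x)); /* x86-64: SSE register */
#else
	volatile double y = x; /* no known constraint: use technique 1 */
	x = y;
#endif
	return x;
}

/* Both keep 1.0 / 3.0 from being folded at compile time, so the
   division is performed (and raises FE_INEXACT) on every call. */
double third_volatile(void) { return barrier_volatile(1.0) / 3.0; }
double third_asm(void) { return barrier_asm(1.0) / 3.0; }
```

The asm form avoids the stack traffic of the volatile form, at the cost of needing a constraint letter per target, which is the trade-off described above.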
So fp barriers are needed (1) to prevent an operation from being moved within a function (when it could be moved from an unevaluated code path into an evaluated one, or across a fenv access), (2) to force the evaluation of an operation for its side-effect when it has no input dependency (and so may be constant folded), and (3) to force the evaluation of an operation whose output is unused. I believe that fp_barrier and fp_force_eval can take care of these cases, and they should not be needed in hot code paths.
Diffstat (limited to 'arch/aarch64/fp_arch.h')
-rw-r--r--  arch/aarch64/fp_arch.h  25
1 files changed, 25 insertions, 0 deletions
diff --git a/arch/aarch64/fp_arch.h b/arch/aarch64/fp_arch.h
new file mode 100644
index 00000000..f3d445b9
--- /dev/null
+++ b/arch/aarch64/fp_arch.h
@@ -0,0 +1,25 @@
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+ __asm__ __volatile__ ("" : "+w"(x));
+ return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+ __asm__ __volatile__ ("" : "+w"(x));
+ return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+ __asm__ __volatile__ ("" : "+w"(x));
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+ __asm__ __volatile__ ("" : "+w"(x));
+}
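Each function above is paired with a macro of the same name (`#define fp_barrier fp_barrier`). Presumably this lets generic code detect that an arch header supplied its own version and provide a fallback otherwise. A minimal sketch of that pattern for fp_barrier only; the volatile fallback and `math_oflow_sketch` are hypothetical and not part of this patch.

```c
/* Arch override, mirroring the AArch64 header: defining the macro to its
   own name changes nothing at call sites but makes "#ifndef fp_barrier"
   false for code that comes later. */
#if defined(__GNUC__) && defined(__aarch64__)
#define fp_barrier fp_barrier
static inline double fp_barrier(double x)
{
	__asm__ __volatile__ ("" : "+w"(x));
	return x;
}
#endif

/* Generic fallback (hypothetical): compiled only when no arch header
   defined fp_barrier. */
#ifndef fp_barrier
#define fp_barrier fp_barrier
static inline double fp_barrier(double x)
{
	volatile double y = x;
	return y;
}
#endif

/* Hypothetical caller: returns -inf or +inf. Because the operand goes
   through fp_barrier, the overflowing multiply happens at run time and
   raises FE_OVERFLOW instead of being folded to a constant. */
double math_oflow_sketch(int sign)
{
	return fp_barrier(sign ? -0x1p769 : 0x1p769) * 0x1p769;
}
```

The same override-or-fallback pattern would apply to fp_barrierf, fp_force_evalf and fp_force_eval.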