author | Rich Felker <dalias@aerifal.cx> | 2019-02-17 23:22:27 -0500 |
---|---|---|
committer | Rich Felker <dalias@aerifal.cx> | 2019-02-18 21:01:16 -0500 |
commit | 9d44b6460ab603487dab4d916342d9ba4467e6b9 (patch) | |
tree | c7aa27a062fe7847972b204ced082217b5e8b0ad /src/thread | |
parent | 805288929fdf511b4044cf07c59e02e2eaa9c546 (diff) | |
install dynamic tls synchronously at dlopen, streamline access

previously, dynamic loading of new libraries with thread-local storage
allocated the storage needed for all existing threads at load-time,
precluding late failure that can't be handled, but left installation
in existing threads to take place lazily on first access. this imposed
an additional memory access and branch on every dynamic tls access,
and imposed a requirement, which was not actually met, that the
dynamic tlsdesc asm functions preserve all call-clobbered registers
before calling C code to install new dynamic tls on first access.
the x86[_64] versions of this code wrongly omitted saving and
restoring of fpu/vector registers, assuming the compiler would not
generate anything using them in the called C code. the arm and aarch64
versions saved known existing registers, but failed to be future-proof
against expansion of the register file.

now that we track live threads in a list, it's possible to install the
new dynamic tls for each thread at dlopen time. for the most part,
synchronization is not needed, because if a thread has not
synchronized with completion of the dlopen, there is no way it can
meaningfully request access to a slot past the end of the old dtv,
which remains valid for accessing slots which already existed.
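
a rough sketch of what installing the new slot for every live thread at load time might look like. `toy_thread`, `toy_install_new_tls`, the circular list and the three-phase layout are illustrative assumptions, not musl's code; the sketch also assumes every thread starts with the same slot count and runs single-threaded, so no barrier is issued between filling and installing.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical, simplified thread record; not musl's struct pthread */
struct toy_thread {
	uintptr_t *dtv;           /* dtv[0] holds the slot count */
	struct toy_thread *next;  /* circular list of live threads (assumed) */
};

/* give each of the nthreads threads reachable from head a dtv with one
 * more slot. blocks[i] is the already-allocated tls block for the i-th
 * thread visited. returns 0 on success, or -1 if allocation fails, in
 * which case no thread has been modified. */
static int toy_install_new_tls(struct toy_thread *head,
                               const uintptr_t *blocks, size_t nthreads)
{
	size_t old_cnt = head->dtv[0];
	size_t new_cnt = old_cnt + 1;
	uintptr_t **newdtv = calloc(nthreads, sizeof *newdtv);
	if (!newdtv) return -1;

	/* phase 1: allocate everything, so failure happens before any install */
	for (size_t i = 0; i < nthreads; i++) {
		newdtv[i] = malloc((new_cnt + 1) * sizeof(uintptr_t));
		if (!newdtv[i]) {
			while (i--) free(newdtv[i]);
			free(newdtv);
			return -1;
		}
	}

	/* phase 2: copy the existing slots and fill in the new one */
	struct toy_thread *td = head;
	for (size_t i = 0; i < nthreads; i++, td = td->next) {
		memcpy(newdtv[i], td->dtv, (old_cnt + 1) * sizeof(uintptr_t));
		newdtv[i][0] = new_cnt;
		newdtv[i][new_cnt] = blocks[i];
	}

	/* phase 3: install. the old dtv is left alone, since a thread that
	 * has not yet observed the new pointer may still be reading it. */
	td = head;
	for (size_t i = 0; i < nthreads; i++, td = td->next)
		td->dtv = newdtv[i];

	free(newdtv);
	return 0;
}
```

the old dtv stays valid for slots 1 through `old_cnt`, which is why a thread that has not yet seen the new pointer is unaffected.
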
however, it is necessary to ensure that, if a thread sees its new dtv
pointer, it sees correct pointers in each of the slots that existed
prior to the dlopen. my understanding is that, on most real-world
coherency architectures including all the ones we presently support, a
built-in consume order guarantees this; however, don't rely on that.
instead, the SYS_membarrier syscall is used to ensure that all threads
see the stores to the slots of their new dtv prior to the installation
of the new dtv. if it is not supported, the same is implemented in
userspace via signals, using the same mechanism as __synccall.
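
a minimal, linux-only sketch of asking the kernel for a process-wide barrier and reporting whether a fallback is needed. `toy_kernel_barrier` is a hypothetical name, and the command used (`MEMBARRIER_CMD_SHARED`) is chosen only because it needs no prior registration; the text above does not say which membarrier command the commit uses. the signal-based fallback is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration in <unistd.h> */
#endif
#include <unistd.h>
#include <sys/syscall.h>

/* value of MEMBARRIER_CMD_SHARED from the linux uapi header
 * <linux/membarrier.h>, repeated here so the sketch builds without
 * kernel headers installed */
#define TOY_MEMBARRIER_CMD_SHARED 1

/* returns 1 if the kernel issued a barrier covering all running threads,
 * or 0 if the caller has to fall back to a userspace mechanism such as
 * signalling each thread */
static int toy_kernel_barrier(void)
{
#ifdef SYS_membarrier
	if (syscall(SYS_membarrier, TOY_MEMBARRIER_CMD_SHARED, 0) == 0)
		return 1;
#endif
	return 0;
}
```

in the ordering described above, a call like this would sit between writing the slots of each new dtv and storing the new dtv pointer into the thread.
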
the __tls_get_addr function, variants, and dynamic tlsdesc asm
functions are all updated to remove the fallback paths for claiming
new dynamic tls, and are now all branch-free.
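
for comparison, a toy model of the two lookup schemes. the names and the `toy_mod_off_t` type are hypothetical, and the old slow path is reduced to returning NULL instead of installing anything.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for tls_mod_off_t:
 * v[0] is the module id, v[1] the offset within that module's block */
typedef size_t toy_mod_off_t;

/* old scheme: dtv[0] holds the slot count, so every access loads it,
 * compares, and may divert to a slow path */
static void *toy_get_addr_lazy(const uintptr_t *dtv, const toy_mod_off_t *v)
{
	if (v[0] <= dtv[0])
		return (void *)(dtv[v[0]] + v[1]);
	return NULL; /* placeholder for the install-on-first-access path */
}

/* new scheme: the slot is known to exist, so there is no comparison
 * and no branch */
static void *toy_get_addr_direct(const uintptr_t *dtv, const toy_mod_off_t *v)
{
	return (void *)(dtv[v[0]] + v[1]);
}
```

both functions return the same address for any slot that is already installed; they differ only in what happens past the end of the dtv, a case the new scheme relies on never occurring.
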
Diffstat (limited to 'src/thread')
-rw-r--r-- | src/thread/__tls_get_addr.c | 7 |
-rw-r--r-- | src/thread/i386/tls.s | 8 |
-rw-r--r-- | src/thread/pthread_create.c | 2 |
3 files changed, 3 insertions, 14 deletions
diff --git a/src/thread/__tls_get_addr.c b/src/thread/__tls_get_addr.c
index d7afdabd..19524fe0 100644
--- a/src/thread/__tls_get_addr.c
+++ b/src/thread/__tls_get_addr.c
@@ -1,12 +1,7 @@
-#include <stddef.h>
 #include "pthread_impl.h"
 
 void *__tls_get_addr(tls_mod_off_t *v)
 {
 	pthread_t self = __pthread_self();
-	if (v[0] <= self->dtv[0])
-		return (void *)(self->dtv[v[0]] + v[1]);
-	return __tls_get_new(v);
+	return (void *)(self->dtv[v[0]] + v[1]);
 }
-
-weak_alias(__tls_get_addr, __tls_get_new);
diff --git a/src/thread/i386/tls.s b/src/thread/i386/tls.s
index 76d5d462..6e4c4cb9 100644
--- a/src/thread/i386/tls.s
+++ b/src/thread/i386/tls.s
@@ -4,14 +4,6 @@
 ___tls_get_addr:
 	mov %gs:4,%edx
 	mov (%eax),%ecx
-	cmp %ecx,(%edx)
-	jc 1f
 	mov 4(%eax),%eax
 	add (%edx,%ecx,4),%eax
 	ret
-1:	push %eax
-.weak __tls_get_new
-.hidden __tls_get_new
-	call __tls_get_new
-	pop %edx
-	ret
diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index cec82157..0142b347 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -15,6 +15,7 @@ weak_alias(dummy_0, __release_ptc);
 weak_alias(dummy_0, __pthread_tsd_run_dtors);
 weak_alias(dummy_0, __do_orphaned_stdio_locks);
 weak_alias(dummy_0, __dl_thread_cleanup);
+weak_alias(dummy_0, __dl_prepare_for_threads);
 
 void __tl_lock(void)
 {
@@ -235,6 +236,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 		init_file_lock(__stderr_used);
 		__syscall(SYS_rt_sigprocmask, SIG_UNBLOCK, SIGPT_SET, 0, _NSIG/8);
 		self->tsd = (void **)__pthread_tsd_main;
+		__dl_prepare_for_threads();
 		libc.threaded = 1;
 	}
 	if (attrp && !c11) attr = *attrp;