musl - The musl libc tree (WIP / dev branches)

Age	Commit message	Author	Files	Lines
2015-05-25	mark mips cancellable syscall code as code	Rich Felker	1	-0/+3
	otherwise disassemblers treat it as data.
2015-05-16	eliminate costly tricks to avoid TLS access for current locale state	Rich Felker	1	-6/+0
	the code being removed used atomics to track whether any threads might be using a locale other than the current global locale, and whether any threads might have abstract 8-bit (non-UTF-8) LC_CTYPE active, a feature which was never committed (still pending). the motivations were to support early execution prior to setup of the thread pointer, to partially support systems (ancient kernels) where thread pointer setup is not possible, and to avoid high performance cost on archs where accessing the thread pointer may be very slow. since commit 19a1fe670acb3ab9ead0fe31859ca7d4fe40dd54, the thread pointer is always available, so these hacks are no longer needed. removing them greatly simplifies the affected code.
2015-05-16	in i386 __set_thread_area, don't assume %gs register is initially zero	Rich Felker	1	-4/+9
	commit f630df09b1fd954eda16e2f779da0b5ecc9d80d3 added logic to handle the case where __set_thread_area is called more than once by reusing the GDT slot already in the %gs register, and only setting up a new GDT slot when %gs is zero. this created a hidden assumption that %gs is zero when a new process image starts, which is true in practice on Linux, but does not seem to be documented ABI, and fails to hold under qemu app-level emulation. while it would in theory be possible to zero %gs in the entry point code, this code is shared between static and dynamic binaries, and dynamic binaries must not clobber the value of %gs already setup by the dynamic linker. the alternative solution implemented in this commit simply uses global data to store the GDT index that's selected. __set_thread_area should only be called in the initial thread anyway (subsequent threads get their thread pointer setup by __clone), but even if it were called by another thread, it would simply read and write back the same GDT index that was already assigned to the initial thread, and thus (in the x86 memory model) there is no data race.
2015-05-06	fix stack protector crashes on x32 & powerpc due to misplaced TLS canary	Rich Felker	1	-1/+1
	i386, x86_64, x32, and powerpc all use TLS for stack protector canary values in the default stack protector ABI, but the location only matched the ABI on i386 and x86_64. on x32, the expected location for the canary contained the tid, thus producing spurious mismatches (resulting in process termination) upon fork. on powerpc, the expected location contained the stdio_locks list head, so returning from a function after calling flockfile produced spurious mismatches. in both cases, the random canary was not present, and a predictable value was used instead, making the stack protector hardening much less effective than it should be. in the current fix, the thread structure has been expanded to have canary fields at all three possible locations, and archs that use a non-default location must define a macro in pthread_arch.h to choose which location is used. for most archs (which lack TLS canary ABI) the choice does not matter.
2015-05-02	fix x32 __set_thread_area failure due to junk in upper bits	Rich Felker	1	-1/+1
	the kernel does not properly clear the upper bits of the syscall argument, so we have to do it before the syscall.
2015-04-22	minor optimization to pthread_spin_trylock	Rich Felker	2	-2/+4
	use CAS instead of swap since it's lighter for most archs, and keep EBUSY in the lock value so that the old value obtained by CAS can be used directly as the return value for pthread_spin_trylock.
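The trylock idea described above can be sketched with C11 atomics. This is only an illustration, not the code in the tree: the type `spin_sketch_t` and both function names are hypothetical, and the standard `atomic_compare_exchange_strong` stands in for the library's internal CAS primitive.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 means unlocked, EBUSY means locked. */
typedef struct { atomic_int word; } spin_sketch_t;

static int spin_trylock_sketch(spin_sketch_t *s)
{
    int expected = 0;
    /* Success: the word changes from 0 to EBUSY and `expected` stays 0.
     * Failure: `expected` is overwritten with the value that was seen,
     * which is EBUSY, so it can be returned without translation. */
    atomic_compare_exchange_strong(&s->word, &expected, EBUSY);
    return expected;
}

static void spin_unlock_sketch(spin_sketch_t *s)
{
    atomic_store_explicit(&s->word, 0, memory_order_release);
}
```

Storing EBUSY rather than 1 as the "locked" value is what lets the old value observed by the CAS serve directly as the function result.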
2015-04-22	optimize spin lock not to dirty cache line while spinning	Rich Felker	1	-1/+1

2015-04-21	fix mmap leak in sem_open failure path for link call	Rich Felker	1	-0/+1
	the leak was found by static analysis (reported by Alexander Monakov), not tested/observed, but seems to have occurred both when failing due to O_EXCL, and in a race condition with O_CREAT but not O_EXCL where a semaphore by the same name was created concurrently.
2015-04-18	make dlerror state and message thread-local and dynamically-allocated	Rich Felker	1	-0/+2
	this fixes truncation of error messages containing long pathnames or symbol names. the dlerror state was previously required by POSIX to be global. the resolution of bug 97 relaxed the requirements to allow thread-safe implementations of dlerror with thread-local state and message buffer.
2015-04-17	fix sh build regressions in asm	Rich Felker	1	-1/+1
	even hidden functions need @PLT symbol references; otherwise an absolute address is produced instead of a PC-relative one.
2015-04-17	fix sh __set_thread_area uninitialized return value	Rich Felker	1	-1/+2
	this caused the dynamic linker/startup code to abort when r0 happened to contain a negative value.
2015-04-14	use hidden __tls_get_new for tls/tlsdesc lookup fallback cases	Rich Felker	1	-1/+3
	previously, the dynamic tlsdesc lookup functions and the i386 special-ABI ___tls_get_addr (3 underscores) function called __tls_get_addr when the slot they wanted was not already setup; __tls_get_addr would then in turn also see that it's not setup and call __tls_get_new. calling __tls_get_new directly is both more efficient and avoids the issue of calling a non-hidden (public API/ABI) function from asm. for the special i386 function, a weak reference to __tls_get_new is used since this function is not defined when static linking (the code path that needs it is unreachable in static-linked programs).
2015-04-14	cleanup use of visibility attributes in pthread_cancel.c	Rich Felker	1	-8/+9
	applying the attribute to a weak_alias macro was a hack. instead use a separate declaration to apply the visibility, and consolidate declarations together to avoid having visibility mess all over the file.
2015-04-14	fix inconsistent visibility for internal syscall symbols	Rich Felker	1	-0/+5

2015-04-14	consistently use hidden visibility for cancellable syscall internals	Rich Felker	11	-30/+96
	in a few places, non-hidden symbols were referenced from asm in ways that assumed ld-time binding. while there is no semantic reason these symbols need to be hidden, fixing the references without making them hidden was going to be ugly, and hidden reduces some bloat anyway. in the asm files, .global/.hidden directives have been moved to the top to unclutter the actual code.
2015-04-14	fix inconsistent visibility for internal __tls_get_new function	Rich Felker	1	-3/+2
	at the point of call it was declared hidden, but the definition was not hidden. for some toolchains this inconsistency produced textrels without ld-time binding.
2015-04-13	remove remnants of support for running in no-thread-pointer mode	Rich Felker	4	-11/+5
	since 1.1.0, musl has nominally required a thread pointer to be setup. most of the remaining code that was checking for its availability was doing so for the sake of being usable by the dynamic linker. as of commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer necessary; the thread pointer is now valid before any libc code (outside of dynamic linker bootstrap functions) runs. this commit essentially concludes "phase 3" of the "transition path for removing lazy init of thread pointer" project that began during the 1.1.0 release cycle.
2015-04-13	allow i386 __set_thread_area to be called more than once	Rich Felker	1	-1/+5
	previously a new GDT slot was requested, even if one had already been obtained by a previous call. instead extract the old slot number from GS and reuse it if it was already set. the formula (GS-3)/8 for the slot number automatically yields -1 (request for new slot) if GS is zero (unset).
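The slot arithmetic can be checked with a small worked example. It assumes the usual x86 selector layout for a user-mode GDT entry (index * 8 + 3) and assumes the division rounds toward negative infinity, as an arithmetic right shift by 3 would. C's `/` operator truncates toward zero and would give 0 for a zero selector, so the sketch spells out the floor division. The function name is hypothetical.

```c
/* Recover the GDT slot number from a %gs selector value.
 * Computes floor((gs - 3) / 8) without relying on how the compiler
 * shifts or divides negative values. */
static int gdt_slot_from_gs_sketch(unsigned gs)
{
    int n = (int)gs - 3;
    return n >= 0 ? n / 8 : -((-n + 7) / 8);
}
```

With this rounding, a selector of 0x33 maps to slot 6, and an unset selector of 0 maps to -1, the value used to request a new slot.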
2015-04-11	remove mismatched arguments from vmlock function definitions	Rich Felker	1	-2/+2
	commit f08ab9e61a147630497198fe3239149275c0a3f4 introduced these accidentally as remnants of some work I tried that did not work out.
2015-04-10	apply vmlock wait to __unmapself in pthread_exit	Rich Felker	1	-0/+4

2015-04-10	redesign and simplify vmlock system	Rich Felker	5	-30/+18
	this global lock allows certain unlock-type primitives to exclude mmap/munmap operations which could change the identity of virtual addresses while references to them still exist. the original design mistakenly assumed mmap/munmap would conversely need to exclude the same operations which exclude mmap/munmap, so the vmlock was implemented as a sort of 'symmetric recursive rwlock'. this turned out to be unnecessary. commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the interval during which mmap/munmap held their side of the lock, but left the inappropriate lock design and some inefficiency. the new design uses a separate function, __vm_wait, which does not hold any lock itself and only waits for lock users which were already present when it was called to release the lock. this is sufficient because of the way operations that need to be excluded are sequenced: the "unlock-type" operations using the vmlock need only block mmap/munmap operations that are precipitated by (and thus sequenced after) the atomic-unlock they perform while holding the vmlock. this allows for a spectacular lack of synchronization in the __vm_wait function itself.
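A rough sketch of the redesigned scheme, using C11 atomics: lock users bump a counter, and the wait function holds nothing itself and only waits for the counter to drain. All names here are hypothetical, and the busy-wait stands in for whatever blocking mechanism the real code uses.

```c
#include <stdatomic.h>

/* Number of threads currently inside a vmlock-protected region. */
static atomic_int vm_users_sketch;

static void vm_lock_sketch(void)
{
    atomic_fetch_add(&vm_users_sketch, 1);
}

static void vm_unlock_sketch(void)
{
    atomic_fetch_sub(&vm_users_sketch, 1);
}

/* Called on the mmap/munmap side: takes no lock of its own and only
 * waits until no lock users remain. A real implementation would sleep
 * rather than spin. */
static void vm_wait_sketch(void)
{
    while (atomic_load(&vm_users_sketch) != 0) {
        /* spin */
    }
}
```

Because the waiter never holds the lock, lock users never have to wait for mmap/munmap, which is the asymmetry the commit message describes.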
2015-04-10	optimize out setting up robust list with kernel when not needed	Rich Felker	2	-6/+5
	as a result of commit 12e1e324683a1d381b7f15dd36c99b37dd44d940, kernel processing of the robust list is only needed for process-shared mutexes. previously the first attempt to lock any owner-tracked mutex resulted in robust list initialization and a set_robust_list syscall. this is no longer necessary, and since the kernel's record of the robust list must now be cleared at thread exit time for detached threads, optimizing it out is more worthwhile than before too.
2015-04-10	process robust list in pthread_exit to fix detached thread use-after-unmap	Rich Felker	2	-26/+27
	the robust list head lies in the thread structure, which is unmapped before exit for detached threads. this leaves the kernel unable to process the exiting thread's robust list, and with a dangling pointer which may happen to point to new unrelated data at the time the kernel processes it. userspace processing of the robust list was already needed for non-pshared robust mutexes in order to perform private futex wakes rather than the shared ones the kernel would do, but it was conditional on linking pthread_mutexattr_setrobust and did not bother processing the pshared mutexes in the list, which requires additional logic for the robust list pending slot in case pthread_exit is interrupted by asynchronous process termination. the new robust list processing code is linked unconditionally (inlined in pthread_exit), handles both private and shared mutexes, and also removes the kernel's reference to the robust list before unmapping and exit if the exiting thread is detached.
2015-03-16	block all signals (even internal ones) in cancellation signal handler	Rich Felker	1	-1/+2
	previously the implementation-internal signal used for multithreaded set*id operations was left unblocked during handling of the cancellation signal. however, on some archs, signal contexts are huge (up to 5k) and the possibility of nested signal handlers drastically increases the minimum stack requirement. since the cancellation signal handler will do its job and return in bounded time before possibly passing execution to application code, there is no need to allow other signals to interrupt it.
2015-03-11	add aarch64 port	Szabolcs Nagy	4	-0/+69
	This adds complete aarch64 target support including bigendian subarch. Some of the long double math functions are known to be broken; otherwise interfaces should be fully functional, but at this point consider this port experimental. Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
2015-03-07	fix regression in pthread_cond_wait with cancellation disabled	Rich Felker	1	-0/+1
	due to a logic error in the use of masked cancellation mode, pthread_cond_wait did not honor PTHREAD_CANCEL_DISABLE but instead failed with ECANCELED when cancellation was pending.
2015-03-04	fix signed left-shift overflow in pthread_condattr_setpshared	Rich Felker	1	-1/+1

2015-03-03	make all objects used with atomic operations volatile	Rich Felker	9	-16/+18
	the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,p,f(p)); where the latter is intended to show multiple loads of p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
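The tmp=p;a_cas(p,tmp,f(tmp)); construct can be sketched as a retry loop over a volatile-qualified object. This assumes GCC or Clang, since `__sync_val_compare_and_swap` is a compiler builtin; the `_sketch` names and `double_it` are hypothetical and not taken from the tree.

```c
/* CAS wrapper: returns the value that was in *p before the operation. */
static int a_cas_sketch(volatile int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Atomically replace *p with f(*p) and return the previous value. */
static int atomic_apply_sketch(volatile int *p, int (*f)(int))
{
    int old;
    do {
        /* Because *p is volatile, this is exactly one load per
         * iteration; the compiler may not re-read *p when it later
         * evaluates f(old) or the comparison. */
        old = *p;
    } while (a_cas_sketch(p, old, f(old)) != old);
    return old;
}

/* Example update function. */
static int double_it(int x)
{
    return 2 * x;
}
```

Without the volatile qualifier, nothing in the language stops a compiler from reloading `*p` in place of `old`, which is the transformation the commit message warns about.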
2015-03-02	suppress masked cancellation in pthread_join	Rich Felker	1	-1/+5
	like close, pthread_join is a resource-deallocation function which is also a cancellation point. the intent of masked cancellation mode is to exempt such functions from failure with ECANCELED.
2015-03-02	fix namespace issue in pthread_join affecting thrd_join	Rich Felker	1	-1/+2
	pthread_testcancel is not in the ISO C reserved namespace and thus cannot be used here. use the namespace-protected version of the function instead.
2015-03-02	factor cancellation cleanup push/pop out of futex __timedwait function	Rich Felker	7	-24/+21
	previously, the __timedwait function was optionally a cancellation point depending on whether it was passed a pointer to a cleanup function and context to register. as of now, only one caller actually used such a cleanup function (and it may face removal soon); most callers either passed a null pointer to disable cancellation or a dummy cleanup function. now, __timedwait is never a cancellation point, and __timedwait_cp is the cancellable version. this makes the intent of the calling code more obvious and avoids ugly dummy functions and long argument lists.
2015-02-27	fix failure of internal futex __timedwait to report ECANCELED	Rich Felker	1	-1/+1
	as part of abstracting the futex wait, this function suppresses all futex error values which callers should not see using a whitelist approach. when the masked cancellation mode was added, the new ECANCELED error was not whitelisted. this omission caused the new pthread_cond_wait code using masked cancellation to exhibit a spurious wake (rather than acting on cancellation) when the request arrived after blocking on the cond var.
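The whitelist approach might look roughly like the following. The function name is hypothetical, and the exact set of errors passed through is an assumption; only ECANCELED is confirmed by the message above.

```c
#include <errno.h>

/* Pass through only the errors callers are meant to see; map every
 * other futex error to 0. EINTR and ETIMEDOUT are assumed members of
 * the whitelist; ECANCELED is the entry this fix adds. */
static int filter_wait_error_sketch(int r)
{
    if (r != EINTR && r != ETIMEDOUT && r != ECANCELED)
        r = 0;
    return r;
}
```

With ECANCELED missing from the list, a cancellation request would be reported as 0, which the caller cannot distinguish from an ordinary wake.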
2015-02-23	fix breakage in pthread_cond_wait due to typo	Rich Felker	1	-1/+1
	due to accidental use of = instead of ==, the error code was always set to zero in the signaled wake case for non-shared cv waits. suppressing ETIMEDOUT (the only possible wait error) is harmless and actually permitted in this case, but suppressing mutex errors could give the caller false information about the state of the mutex. commit 8741ffe625363a553e8f509dc3ca7b071bdbab47 introduced this regression and commit d9da1fb8c592469431c764732d09f7756340190e preserved it when reorganizing the code.
2015-02-22	simplify cond var code now that cleanup handler is not needed	Rich Felker	1	-86/+63

2015-02-22	fix pthread_cond_wait cancellation race	Rich Felker	1	-5/+38
	it's possible that signaling a waiter races with cancellation of that same waiter. previously, cancellation was acted upon, causing the signal to be consumed with no waiter returning. by using the new masked cancellation state, it's possible to refuse to act on the cancellation request and instead leave it pending. to ease review and understanding of the changes made, this commit leaves the unwait function, which was previously the cancellation cleanup handler, in place. additional simplifications could be made by removing it.
2015-02-21	add new masked cancellation mode	Rich Felker	2	-10/+16
	this is a new extension which is presently intended only for experimental and internal libc use. interface and behavior details may change subject to feedback and experience from using it internally. the basic concept for the new PTHREAD_CANCEL_MASKED state is that the first cancellation point to observe the cancellation request fails with an errno value of ECANCELED rather than acting on cancellation, allowing the caller to process the status and choose whether/how to act upon it.
2015-02-20	prepare cancellation syscall asm for possibility of __cancel returning	Rich Felker	5	-11/+32

2015-02-16	make pthread_exit responsible for disabling cancellation	Rich Felker	2	-3/+2
	this requirement is tucked away in XSH 2.9.5 Thread Cancellation under the heading Thread Cancellation Cleanup Handlers.
2015-02-09	use the internal macro name FUTEX_PRIVATE in __wait	Szabolcs Nagy	1	-1/+1
	the name was recently added for the setxid/synccall rework, so use the name now that we have it.
2015-02-03	fix missing memory barrier in cancellation signal handler	Rich Felker	1	-0/+1
	in practice this was probably a non-issue, because the necessary barrier almost certainly exists in kernel space -- implementing signal delivery without such a barrier seems impossible -- but for the sake of correctness, it should be done here too. in principle, without a barrier, it is possible that the thread to be cancelled does not see the store of its cancellation flag performed by another thread. this affects both the case where the signal arrives before entering the critical program counter range from __cp_begin to __cp_end (in which case both the signal handler and the inline check fail to see the value which was already stored) and the case where the signal arrives during the critical range (in which case the signal handler should be responsible for cancellation, but when it does not see the cancellation flag, it assumes the signal is spurious and refuses to act on it). in the fix, the barrier is placed only in the signal handler, not in the inline check at the beginning of the critical program counter range. if the signal handler runs before the critical range is entered, it will of course take no action, but its barrier will ensure that the inline check subsequently sees the store. if on the other hand the inline check runs first, it may miss seeing the store, but the subsequent signal handler in the critical range will act upon the cancellation request. this strategy avoids adding a memory barrier in the common, non-cancellation code path.
2015-01-15	overhaul __synccall and fix AS-safety and other issues in set*id	Rich Felker	2	-45/+138
	multi-threaded setid and setrlimit use the internal __synccall function to work around the kernel's wrongful treatment of these process properties as thread-local. the old implementation of __synccall failed to be AS-safe, despite POSIX requiring setuid and setgid to be AS-safe, and was not rigorous in assuring that all threads were caught. in a worst case, threads late in the process of exiting could retain permissions after setuid reported success, in which case attacks to regain dropped permissions may have been possible under the right conditions. the new implementation of __synccall depends on the presence of /proc/self/task and will fail if it can't be opened, but is able to determine that it has caught all threads, and does not use any locks except its own. it thereby achieves AS-safety simply by blocking signals to preclude re-entry in the same thread. with this commit, all known conformance and safety issues in setid functions should be fixed.
2015-01-15	suppress EINTR in sem_wait and sem_timedwait	Rich Felker	1	-1/+1
	per POSIX, the EINTR condition is an optional error for these functions, not a mandatory one. since old kernels (pre-2.6.22) failed to honor SA_RESTART for the futex syscall, it's dangerous to trust EINTR from the kernel. thankfully POSIX offers an easy way out.
2014-11-22	fix __aeabi_read_tp oversight in arm atomics/tls overhaul	Rich Felker	1	-4/+0
	calls to __aeabi_read_tp may be generated by the compiler to access TLS on pre-v6 targets. previously, this function was hard-coded to call the kuser helper, which would crash on kernels with kuser helper removed. to fix the problem most efficiently, the definition of __aeabi_read_tp is moved so that it's an alias for the new __a_gettp. however, on v7+ targets, code to initialize the runtime choice of thread-pointer loading code is not even compiled, meaning that defining __aeabi_read_tp would have caused an immediate crash due to using the default implementation of __a_gettp with a HCF instruction. fortunately there is an elegant solution which reduces overall code size: putting the native thread-pointer loading instruction in the default code path for __a_gettp, so that separate default/native code paths are not needed. this function should never be called before __set_thread_area anyway, and if it is called early on pre-v6 hardware, the old behavior (crashing) is maintained. ideally __aeabi_read_tp would not be called at all on v7+ targets anyway -- in fact, prior to the overhaul, the same problem existed, but it was never caught by users building for v7+ with kuser disabled. however, it's possible for calls to __aeabi_read_tp to end up in a v7+ binary if some of the object files were built for pre-v7 targets, e.g. in the case of static libraries that were built separately, so this case needs to be handled.
2014-11-19	overhaul ARM atomics/tls for performance and compatibility	Rich Felker	1	-12/+1
	previously, builds for pre-armv6 targets hard-coded use of the "kuser helper" system for atomics and thread-pointer access, resulting in binaries that fail to run (crash) on systems where this functionality has been disabled (as a security/hardening measure) in the kernel. additionally, builds for armv6 hard-coded an outdated/deprecated memory barrier instruction which may require emulation (extremely slow) on future models. this overhaul replaces the behavior for all pre-armv7 builds (both of the above cases) to perform runtime detection of the appropriate mechanisms for barrier, atomic compare-and-swap, and thread pointer access. detection is based on information provided by the kernel in auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture version encoded in AT_PLATFORM. direct use of the instructions is preferred when possible, since probing for the existence of the kuser helper page would be difficult and would incur runtime cost. for builds targeting armv7 or later, the runtime detection code is not compiled at all, and much more efficient versions of the non-cas atomic operations are provided by using ldrex/strex directly rather than wrapping cas.
2014-10-20	manually "shrink wrap" fast path in pthread_once	Rich Felker	1	-8/+12
	this change is a workaround for the inability of current compilers to perform "shrink wrapping" optimizations. in casual testing, it roughly doubled the performance of pthread_once when called on an already-finished once control object.
2014-10-13	eliminate global waiters count in pthread_once	Rich Felker	1	-9/+13

2014-10-10	fix missing barrier in pthread_once/call_once shortcut path	Rich Felker	1	-2/+6
	these functions need to be fast when the init routine has already run, since they may be called very often from code which depends on global initialization having taken place. as such, a fast path bypassing atomic cas on the once control object was used to avoid heavy memory contention. however, on archs with weakly ordered memory, the fast path failed to ensure that the caller actually observes the side effects of the init routine. preliminary performance testing showed that simply removing the fast path was not practical; a performance drop of roughly 85x was observed with 20 threads hammering the same once control on a 24-core machine. so the new explicit barrier operation from atomic.h is used to retain the fast path while ensuring memory visibility. performance may be reduced on some archs where the barrier actually makes a difference, but the previous behavior was unsafe and incorrect on these archs. future improvements to the implementation of a_barrier should reduce the impact.
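A minimal sketch of a once control with a barrier-protected fast path, written with C11 atomics. It is not the implementation in the tree: an acquire load plays the role of the explicit barrier, waiters spin instead of sleeping, and every name is hypothetical.

```c
#include <stdatomic.h>

enum { ONCE_NEW = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

typedef struct { atomic_int state; } once_sketch_t;

/* Slow path, kept out of line so the fast path stays small. */
static void once_slow_sketch(once_sketch_t *o, void (*init)(void))
{
    int expected = ONCE_NEW;
    if (atomic_compare_exchange_strong(&o->state, &expected, ONCE_RUNNING)) {
        init();
        /* Release: publish init's side effects before marking done. */
        atomic_store_explicit(&o->state, ONCE_DONE, memory_order_release);
        return;
    }
    /* Another thread is running init; wait for it (spinning here). */
    while (atomic_load_explicit(&o->state, memory_order_acquire) != ONCE_DONE) {
    }
}

static inline void once_sketch(once_sketch_t *o, void (*init)(void))
{
    /* Fast path: no CAS, but the acquire ordering ensures the caller
     * observes everything init wrote. */
    if (atomic_load_explicit(&o->state, memory_order_acquire) == ONCE_DONE)
        return;
    once_slow_sketch(o, init);
}

/* Example init routine that counts its invocations. */
static int init_calls_sketch;
static void demo_init_sketch(void)
{
    init_calls_sketch++;
}
```

Calling `once_sketch` twice on the same control object runs `demo_init_sketch` only once; the second call returns from the fast path after a single load.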
2014-09-07	add C11 thread creation and related thread functions	Rich Felker	9	-7/+82
	based on patch by Jens Gustedt. the main difficulty here is handling the difference between start function signatures and thread return types for C11 threads versus POSIX threads. pointers to void are assumed to be able to represent faithfully all values of int. the function pointer for the thread start function is cast to an incorrect type for passing through pthread_create, but is cast back to its correct type before calling so that the behavior of the call is well-defined. changes to the existing threads implementation were kept minimal to reduce the risk of regressions, and duplication of code that carries implementation-specific assumptions was avoided for ease and safety of future maintenance.
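The signature handling can be illustrated without creating a thread. In this sketch the start function travels under the POSIX signature, is cast back to its real type before the call, and its int result is carried in a pointer. All names are hypothetical, and `enter_thread_sketch` only stands in for the thread entry path.

```c
#include <stdint.h>

typedef int (*c11_start_sketch)(void *);     /* C11-style start function */
typedef void *(*posix_start_sketch)(void *); /* POSIX-style start function */

/* Receives the start function under the "wrong" POSIX type, as it
 * would arrive after being passed through pthread_create. */
static void *enter_thread_sketch(posix_start_sketch start, void *arg)
{
    /* Cast back to the true type before calling, so the call itself
     * is made through a correctly typed pointer. */
    c11_start_sketch real = (c11_start_sketch)start;
    int res = real(arg);
    /* Assumes a pointer to void can faithfully represent any int. */
    return (void *)(intptr_t)res;
}

/* Example C11-style start function. */
static int demo_start_sketch(void *arg)
{
    return *(int *)arg + 1;
}
```

A joiner would recover the result by converting the returned pointer back through `intptr_t` to `int`.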
2014-09-06	add C11 condition variable functions	Jens Gustedt	6	-0/+57
	Because of the clear separation for private pthread_cond_t these interfaces are quite simple and direct.
2014-09-06	add C11 mutex functions	Jens Gustedt	6	-0/+69