|
In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
alignup(GAP_ABOVE_TP, main_tls_align).
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.
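As an illustration, the offset computation can be sketched as below
(alignup and the parameter names are placeholders; this is not the
actual libc code):

    #include <stddef.h>

    /* round n up to a multiple of align (align must be a power of two) */
    static size_t alignup(size_t n, size_t align)
    {
        return (n + align - 1) & ~(align - 1);
    }

    /* offset of the main module's TLS block from the thread pointer in
     * variant I when a gap is reserved above TP */
    size_t main_tls_offset(size_t gap_above_tp, size_t main_tls_align)
    {
        return alignup(gap_above_tp, main_tls_align);
    }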
|
|
previously, some accesses to the detached state (from pthread_join and
pthread_getattr_np) were unsynchronized; they were harmless in
programs with well-defined behavior, but ugly. other accesses (in
pthread_exit and pthread_detach) were synchronized by a poorly named
"exitlock", with an ad-hoc trylock operation on it open-coded in
pthread_detach, whose only purpose was establishing protocol for which
thread is responsible for deallocation of detached-thread resources.
instead, use an atomic detach_state and unify it with the futex used
to wait for thread exit. this eliminates 2 members from the pthread
structure, gets rid of the hackish lock usage, and makes rigorous the
trap added in commit 80bf5952551c002cf12d96deb145629765272db0 for
catching attempts to join detached threads. it should also make
attempts to detach an already-detached thread reliably trap.
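for illustration, the state-transition idea can be sketched with C11
atomics (state names and the helper are hypothetical, not the actual
musl definitions):

    #include <stdatomic.h>

    enum { DT_JOINABLE, DT_DETACHED, DT_EXITED };

    /* pthread_detach-style transition: succeeds only while the thread is
     * still joinable, so a second detach (or a detach after the thread
     * already exited) is detected reliably */
    static int try_detach(_Atomic int *detach_state)
    {
        int expected = DT_JOINABLE;
        return atomic_compare_exchange_strong(detach_state, &expected,
                                              DT_DETACHED) ? 0 : -1;
    }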
|
|
the tid field in the pthread structure is not volatile, and really
shouldn't be, so as not to limit the compiler's ability to reorder,
merge, or split loads in code paths that may be relevant to
performance (like controlling lock ownership).
however, use of objects which are not volatile or atomic with futex
wait is inherently broken, since the compiler is free to transform a
single load into multiple loads, thereby using a different value for
the controlling expression of the loop and the value passed to the
futex syscall, leading the syscall to block instead of returning.
reportedly glibc's pthread_join was actually affected by an equivalent
issue on s390.
add a separate, dedicated join_futex object for pthread_join to use.
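for illustration, the futex-wait idiom this enables looks roughly like
the following sketch (hypothetical names; the kernel interface is the
standard futex(2) FUTEX_WAIT operation):

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* wait until *futexp stops holding the value val; because the object
     * is volatile, the load feeding the loop condition and the value
     * passed to the kernel cannot be split into mismatched loads */
    static void wait_on(volatile int *futexp, int val)
    {
        while (*futexp == val)
            syscall(SYS_futex, futexp, FUTEX_WAIT, val, 0, 0, 0);
    }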
|
|
it was reported by Erik Bosman that poll fails without setting revents
when the nfds argument exceeds the current value for RLIMIT_NOFILE,
causing the subsequent open calls to be bypassed. if the rlimit is
either 1 or 2, this leaves fd 0 and 1 potentially closed but openable
when the application code is reached.
based on a brief reading of the poll syscall documentation and code,
it may be possible for poll to fail under other attacker-controlled
conditions as well. if it turns out these are reasonable conditions
that may happen in the real world, we may have to go back and
implement fallbacks to probe each fd individually if poll fails, but
for now, keep things simple and treat all poll failures as fatal.
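for reference, a simplified sketch of the kind of startup fd
sanitization involved (illustrative only; __builtin_trap stands in for
the real fatal-error path):

    #include <fcntl.h>
    #include <poll.h>

    static void secure_std_fds(void)
    {
        struct pollfd pfd[3] = { {.fd=0}, {.fd=1}, {.fd=2} };
        /* if poll itself fails (e.g. nfds exceeds RLIMIT_NOFILE), treat
         * it as fatal instead of silently skipping the checks */
        if (poll(pfd, 3, 0) < 0) __builtin_trap();
        for (int i = 0; i < 3; i++)
            if (pfd[i].revents & POLLNVAL)
                if (open("/dev/null", O_RDWR) < 0) __builtin_trap();
    }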
|
|
this is for consistency with the way it's done in the dynamic
linker, avoiding a deprecated C feature (non-prototype function
types), and improving code generation. GCC unnecessarily uses the
variadic calling convention (e.g. clearing rax on x86_64) when making
a call where the argument types are not known, for compatibility with
wrong code which calls variadic functions this way. (C on the other
hand is clear that such calls have undefined behavior.)
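as a minimal illustration of the distinction (declarations only, with
hypothetical names):

    /* non-prototype function type (deprecated): argument types unknown to
     * the compiler, so calls may be lowered with the variadic convention */
    int (*legacy_fn)();

    /* prototyped function type: the compiler knows the call is not variadic */
    int (*proto_fn)(void *);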
|
|
This aligns clearenv with the Linux man page by setting 'environ'
rather than '*environ' to NULL, and stops it from leaking entries
allocated by the libc.
|
|
Rewrite environment access functions to slim down code, fix bugs and
avoid invoking undefined behavior.
* avoid using int-typed iterators where size_t would be correct;
* use strncmp instead of memcmp consistently;
* tighten prologues by invoking __strchrnul;
* handle NULL environ.
putenv:
* handle "=value" input via unsetenv too (will return -1/EINVAL);
* rewrite and simplify __putenv; fix the leak caused by failure to
deallocate entry added by preceding setenv when called from putenv.
setenv:
* move management of libc-allocated entries to this translation unit,
and use no-op weak symbols in putenv/unsetenv;
unsetenv:
* rewrite; this fixes UB caused by testing a free'd pointer against
NULL on entry to subsequent loops.
Not changed:
Failure to extend the allocation tracking array (previously __env_map,
now env_alloced) is ignored rather than reported to the caller as
-1/ENOMEM; the worst-case consequence is leaking the allocation when it
is removed or replaced in a subsequent environment access.
The UB in unsetenv was initially reported by Alexander Cherepanov.
Using a weak alias to avoid pulling in malloc via unsetenv was
suggested by Rich Felker.
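For illustration, a simplified unsetenv along these lines might look as
follows (a sketch only; the real function also notifies a hook so that
setenv can free entries the libc allocated):

    #include <errno.h>
    #include <string.h>

    extern char **environ;

    int unsetenv_sketch(const char *name)
    {
        size_t l = strcspn(name, "=");       /* length of the name part */
        if (!l || name[l]) {                 /* empty name or '=' in input */
            errno = EINVAL;
            return -1;
        }
        if (environ) {                       /* handle NULL environ */
            char **e = environ, **eo = e;
            for (; *e; e++)
                if (strncmp(name, *e, l) || (*e)[l] != '=')
                    *eo++ = *e;              /* keep non-matching entries */
            if (eo != e) *eo = 0;
        }
        return 0;
    }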
|
|
It is possible for argv[0] to be a null pointer, but the __progname
variable is used to implement functions in src/legacy/err.c that do not
expect it to be null. It is also available to the user via the
program_invocation_name alias as a GNU extension, and the implementation
in Glibc initializes it to a pointer to an empty string rather than NULL.
Since argv[0] is usually non-null and it's preferable to keep those
variables in BSS, implement the fallbacks in __init_libc, which also
allows an intermediate fallback to AT_EXECFN.
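A minimal sketch of the fallback order (names are placeholders):

    static const char *progname_fallback(const char *argv0, const char *execfn)
    {
        if (argv0) return argv0;
        if (execfn) return execfn;   /* intermediate fallback: AT_EXECFN */
        return "";                   /* final fallback, as glibc initializes it */
    }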
|
|
the static-linked version of __init_tls needs to locate the TLS
initialization image via the ELF program headers, which requires
determining the base address at which the program was loaded. the
existing code attempted to do this by comparing the actual address of
the program headers (obtained via auxv) with the virtual address for
the PT_PHDR record in the program headers. however, the linker seems
to produce a PT_PHDR record only when a program interpreter (dynamic
linker) is used. thus the computation failed and used the default base
address of 0, leading to a crash when trying to access the TLS image
at the wrong address.
the dynamic linker entry point and static-PIE rcrt1.o startup code
compute the base address instead by taking the difference between the
run-time address of _DYNAMIC and the virtual address in the PT_DYNAMIC
record. this patch copies the approach they use, but with a weak
symbolic reference to _DYNAMIC instead of obtaining the address from
the crt_arch.h asm. this works because relocations have already been
performed at the time __init_tls is called.
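for illustration, the base-address computation can be sketched as below
(simplified; assumes 64-bit ELF types and a weak reference to _DYNAMIC):

    #include <elf.h>
    #include <stddef.h>
    #include <stdint.h>

    extern __attribute__((__weak__)) size_t _DYNAMIC[];

    static uintptr_t load_base(Elf64_Phdr *phdr, size_t phnum)
    {
        for (size_t i = 0; i < phnum; i++)
            if (phdr[i].p_type == PT_DYNAMIC && _DYNAMIC)
                /* run-time address minus link-time vaddr gives the base;
                 * relocations are already done, so &_DYNAMIC is usable */
                return (uintptr_t)_DYNAMIC - phdr[i].p_vaddr;
        return 0; /* static non-PIE: loaded at its link-time addresses */
    }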
|
|
This is the minimal fix for __putenv leaving a pointer to freed heap
storage in the __env_map array, which could later lead to errors such
as double-free.
|
|
commit ad1cd43a86645ba2d4f7c8747240452a349d6bc1 eliminated
preprocessor-level omission of references to the init/fini array
symbols from object files going into libc.so. the references are weak,
and the intent was that the linker would resolve them to zero in
libc.so, but instead it leaves undefined references that could be
satisfied at runtime. normally these references would be harmless,
since the code using them does not even get executed, but some older
binutils versions produce a linking error: when linking a program
against libc.so, ld first tries to use the hidden init/fini array
symbols produced by the linker script to satisfy the references in
libc.so, then produces an error because the definitions are hidden.
ideally ld would have already provided definitions of these symbols
when linking libc.so, but the linker script for -shared omits them.
to avoid this situation, the dynamic linker now provides its own dummy
definitions of the init/fini array symbols for libc.so. since they are
hidden, everything binds at ld time and no references remain in the
dynamic symbol table. with modern binutils and --gc-sections, both
the dummy empty array objects and the code referencing them get
dropped at link time, anyway.
the _init and _fini symbols are also switched back to using weak
definitions rather than weak references since the latter behave
somewhat problematically in general, and the weak definition approach
was known to work well.
|
|
this both allows removal of some of the main remaining uses of the
SHARED macro and clears one obstacle to static-linked dlopen support,
which may be added at some point in the future.
specialized single-TLS-module versions of __copy_tls and __reset_tls
are removed and replaced with code adapted from their dynamic-linked
versions, capable of operating on a whole chain of TLS modules, and
use of the dynamic linker's DSO chain (which contains large struct dso
objects) by these functions is replaced with a new chain of struct
tls_module objects containing only the information needed for
implementing TLS. this may also yield some performance benefit when
initializing TLS for a new thread after a large number of modules
without TLS have been loaded, since there is no need to walk
structures for modules without TLS.
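for reference, the minimal per-module record can be pictured roughly as
follows (field names are illustrative):

    #include <stddef.h>

    struct tls_module_sketch {
        struct tls_module_sketch *next; /* next module that has TLS */
        void *image;                    /* TLS initialization image */
        size_t len;                     /* length of the image */
        size_t size;                    /* total TLS size (image + zero fill) */
        size_t align;                   /* required alignment */
        size_t offset;                  /* offset within the static TLS area */
    };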
|
|
use weak definitions that the dynamic linker can override instead of
preprocessor conditionals on SHARED so that the same libc start and
exit code can be used for both static and dynamic linking.
|
|
this is the first and simplest stage of removal of the SHARED macro,
which will eventually allow libc.a and libc.so to be produced from the
same object files.
the original motivation for these #ifdefs which are now being removed
was to allow building a static-only libc using a compiler that does
not support visibility. however, SHARED was the wrong condition to
test for this anyway; various assembly-language sources refer to
hidden symbols and declare them with the .hidden directive, making it
wrong to define the referenced symbols as non-hidden. if there is a
need in the future to build libc using compilers that lack visibility,
support could be moved to the build system or perhaps the __PIC__
macro could be checked instead of SHARED.
|
|
this change is needed to be compatible with fdpic, where some of the
main application's relocations may be performed as part of the crt1
entry point. if we call init functions before passing control, these
relocations will not yet have been performed, and the init code will
potentially make use of invalid pointers.
conceptually, no code provided by the application or third-party
libraries should run before the application entry point. the
difference is not observable to programs using the crt1 we provide,
but it could come into play if custom entry point code is used, so
it's better to be doing this right anyway.
|
|
this symbol is needed only on archs where the PLT call ABI is klunky,
and only for position-independent code compiled with stack protector.
thus references usually only appear in shared libraries or PIE
executables, but they can also appear when linking statically if some
of the object files being linked were built as PIC/PIE.
normally libssp_nonshared.a from the compiler toolchain should provide
__stack_chk_fail_local, but reportedly it appears prior to -lc in the
link order, thus failing to satisfy references from libc itself (which
arise only if libc.a was built as PIC/PIE with stack protector
enabled).
|
|
i386, x86_64, x32, and powerpc all use TLS for stack protector canary
values in the default stack protector ABI, but the location only
matched the ABI on i386 and x86_64. on x32, the expected location for
the canary contained the tid, thus producing spurious mismatches
(resulting in process termination) upon fork. on powerpc, the expected
location contained the stdio_locks list head, so returning from a
function after calling flockfile produced spurious mismatches. in both
cases, the random canary was not present, and a predictable value was
used instead, making the stack protector hardening much less effective
than it should be.
in the current fix, the thread structure has been expanded to have
canary fields at all three possible locations, and archs that use a
non-default location must define a macro in pthread_arch.h to choose
which location is used. for most archs (which lack TLS canary ABI) the
choice does not matter.
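as an illustrative sketch of the layout idea (member names are
hypothetical and other members are elided):

    #include <stdint.h>

    struct pthread_sketch {
        struct pthread_sketch *self;
        uintptr_t canary;        /* default TLS-ABI slot (matches i386/x86_64) */
        uintptr_t canary2;       /* alternate slot near the start */
        /* ... other members ... */
        uintptr_t canary_at_end; /* slot at a fixed offset from the end of
                                    the struct, for ABIs that expect that */
    };

    /* an arch using a non-default slot would select it in pthread_arch.h,
     * e.g.: #define CANARY canary_at_end */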
|
|
both static and dynamic linked versions of the __copy_tls function
have a hidden assumption that the alignment of the beginning or end of
the memory passed is suitable for storing an array of pointers for the
dtv. pthread_create satisfies this requirement except when
libc.tls_size is misaligned, which cannot happen with dynamic linking
due to the way update_tls_size computes the total size, but could happen
with static linking and odd-sized TLS.
|
|
commit dab441aea240f3b7c18a26d2ef51979ea36c301c, which made thread
pointer init mandatory for all programs, rendered this store obsolete
by removing the early-return path for static programs with no TLS.
|
|
this slightly reduces the code size cost of TLS/thread-pointer for
static linking since __init_tp can be inlined into its only caller and
removed. this is analogous to the handling of __init_libc in
__libc_start_main, where the function only has external linkage when
it needs to be called from the dynamic linker.
|
|
these are used as hidden by asm files (and such use is the whole
reason they exist), but their actual definitions were not hidden.
|
|
part of the goal here is to eliminate use of the ATTR_LIBC_VISIBILITY
macro outside of libc.h, since it was never intended to be 'public'.
|
|
this was already essentially possible as a result of the previous
commits changing the dynamic linker/thread pointer bootstrap process.
this commit mainly adds build system infrastructure:
configure no longer attempts to disable stack protector. instead it
simply determines how to do so, so that the makefile can disable stack
protector for a few translation units used during early startup.
stack protector is also disabled for memcpy and memset since compilers
(incorrectly) generate calls to them on some archs to implement
struct initialization and assignment, and such calls may creep into
early initialization.
no explicit attempt to enable stack protector is made by configure at
this time; any stack protector option supported by the compiler can be
passed to configure in CFLAGS, and if the compiler uses stack
protector by default, this default is respected.
|
|
since 1.1.0, musl has nominally required a thread pointer to be setup.
most of the remaining code that was checking for its availability was
doing so for the sake of being usable by the dynamic linker. as of
commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer
necessary; the thread pointer is now valid before any libc code
(outside of dynamic linker bootstrap functions) runs.
this commit essentially concludes "phase 3" of the "transition path
for removing lazy init of thread pointer" project that began during
the 1.1.0 release cycle.
|
|
as a result of commit 12e1e324683a1d381b7f15dd36c99b37dd44d940, kernel
processing of the robust list is only needed for process-shared
mutexes. previously the first attempt to lock any owner-tracked mutex
resulted in robust list initialization and a set_robust_list syscall.
this is no longer necessary, and since the kernel's record of the
robust list must now be cleared at thread exit time for detached
threads, optimizing it out is more worthwhile than before too.
|
|
There are two main abi variants for thread local storage layout:
(1) TLS is above the thread pointer at a fixed offset and the pthread
struct is below that. So the end of the struct is at known offset.
(2) the thread pointer points to the pthread struct and TLS starts
below it. So the start of the struct is at known (zero) offset.
Assembly code for the dynamic TLSDESC callback needs to access the
dynamic thread vector (dtv) pointer which is currently at the front
of the pthread struct. So in case of (1) the asm code needs to hard
code the offset from the end of the struct which can easily break if
the struct changes.
This commit adds a copy of the dtv at the end of the struct. New members
must not be added after dtv_copy, only before it. The size of the struct
is increased a bit, but there is opportunity for size optimizations.
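A rough sketch of the idea (simplified; member names are illustrative):

    #include <stdint.h>

    struct pthread_sketch {
        uintptr_t *dtv;      /* dtv pointer at the front: a known offset in
                                variant (2), where TP points at the struct */
        /* ... members may be added or removed here freely ... */
        uintptr_t *dtv_copy; /* must stay last: in variant (1) the TLSDESC
                                asm reaches it at a small fixed offset from
                                the end of the struct */
    };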
|
|
a conservative estimate of 4*sizeof(size_t) was used as the minimum
alignment for thread-local storage, despite the only requirements
being alignment suitable for struct pthread and void* (which struct
pthread already contains). additional alignment required by the
application or libraries is encoded in their headers and is already
applied.
over-alignment prevented the builtin_tls array from ever being used in
dynamic-linked programs on 64-bit archs, thereby requiring allocation
at startup even in programs with no TLS of their own.
|
|
C99 6.10.3p11 disallows preprocessor directives inside the argument
list of a macro invocation, so use an #ifdef outside of the argument
list of __syscall.
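For illustration (SYS_foo, SYS_foo_new and the xsyscall stand-in are
hypothetical):

    #define xsyscall(...) 0L  /* stand-in for the real __syscall macro */
    #define SYS_foo_new 1

    long do_foo(long x)
    {
        (void)x; /* silence unused warning caused by the stand-in macro */
        /* A directive may not appear inside a macro's argument list, so
         * the following is not valid:
         *
         *     return xsyscall(
         *     #ifdef SYS_foo
         *         SYS_foo, x
         *     #else
         *         SYS_foo_new, x, 0
         *     #endif
         *     );
         *
         * Keep the #ifdef outside the argument list instead: */
    #ifdef SYS_foo
        return xsyscall(SYS_foo, x);
    #else
        return xsyscall(SYS_foo_new, x, 0);
    #endif
    }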
|
|
the main motivation for this change is to remove the assumption that
the tid of the main thread is also the pid of the process. (the value
returned by the set_tid_address syscall was used to fill both fields
despite it semantically being the tid.) this is historically and
presently true on linux and unlikely to change, but it conceivably
could be false on other systems that otherwise reproduce the linux
syscall api/abi.
only a few parts of the code were actually still using the cached pid.
in a couple places (aio and synccall) it was a minor optimization to
avoid a syscall. caching could be reintroduced, but lazily as part of
the public getpid function rather than at program startup, if it's
deemed important for performance later. in other places (cancellation
and pthread_kill) the pid was completely unnecessary; the tkill
syscall can be used instead of tgkill. this is actually a rather
subtle issue, since tgkill is supposedly a solution to race conditions
that can affect use of tkill. however, as documented in the commit
message for commit 7779dbd2663269b465951189b4f43e70839bc073, tgkill
does not actually solve this race; it just limits it to happening
within one process rather than between processes. we use a lock that
avoids the race in pthread_kill, and the use in the cancellation
signal handler is self-targeted and thus not subject to tid reuse
races, so both are safe regardless of which syscall (tgkill or tkill)
is used.
|
|
this commit adds non-stub implementations of setlocale, duplocale,
newlocale, and uselocale, along with the data structures and minimal
code needed for representing the active locale on a per-thread basis
and optimizing the common case where thread-local locale settings are
not in use.
at this point, the data structures only contain what is necessary to
represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in
finding message translation files). representation for the other
categories will be added later; the expectation is that a single
pointer will suffice for each.
for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any
other string is accepted and treated as "C.UTF-8". for other
categories, any string is accepted after being truncated to a maximum
supported length (currently 15 bytes). for LC_MESSAGES, the name is
kept regardless of whether libc itself can use such a message
translation locale, since applications using catgets or gettext should
be able to use message locales libc is not aware of. for other
categories, names which are not successfully loaded as locales (which,
at present, means all names) are treated as aliases for "C". setlocale
never fails.
locale settings are not yet used anywhere, so this commit should have
no visible effects except for the contents of the string returned by
setlocale.
|
|
|
|
such separation serves multiple purposes:
- by having the common path for __tls_get_addr alone in its own
function with a tail call to the slow case, code generation is
greatly improved.
- by having __tls_get_addr in its own file, it can be replaced on a
per-arch basis as needed, for optimization or ABI-specific purposes.
- by removing __tls_get_addr from __init_tls.c, a few bytes of code
are shaved off of static binaries (which are unlikely to use this
function unless the linker messed up).
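for illustration, the split can be sketched as follows (simplified;
names and the dtv accessor are placeholders):

    #include <stddef.h>

    typedef struct { size_t mod, off; } tls_index_sketch;

    void *tls_get_new_sketch(tls_index_sketch *); /* out-of-line slow case */
    size_t *current_dtv(void);                    /* placeholder: thread's dtv */

    void *tls_get_addr_sketch(tls_index_sketch *v)
    {
        size_t *dtv = current_dtv();
        if (v->mod <= dtv[0])             /* dtv[0]: highest installed module */
            return (char *)dtv[v->mod] + v->off;
        return tls_get_new_sketch(v);     /* becomes a tail call, keeping the
                                             common path short */
    }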
|
|
the motivation for the errno_ptr field in the thread structure, which
this commit removes, was to allow the main thread's errno to keep its
address when lazy thread pointer initialization was used. &errno was
evaluated prior to setting up the thread pointer and stored in
errno_ptr for the main thread; subsequently created threads would have
errno_ptr pointing to their own errno_val in the thread structure.
since lazy initialization was removed, there is no need for this extra
level of indirection; __errno_location can simply return the address
of the thread's errno_val directly. this does cause &errno to change,
but the change happens before entry to application code, and thus is
not observable.
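for illustration, the resulting shape of the function (a sketch; the
struct and the accessor are placeholders for the real pthread
structure and __pthread_self):

    struct thread_sketch { int errno_val; /* ... */ };
    struct thread_sketch *thread_self_sketch(void);

    int *errno_location_sketch(void)
    {
        /* return the address of the thread's own errno_val directly,
         * with no errno_ptr indirection */
        return &thread_self_sketch()->errno_val;
    }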
|
|
such kernels cannot support threads, but the thread pointer is also
important for other purposes, most notably stack protector. without a
valid thread pointer, all code compiled with stack protector will
crash. the same applies to any use of thread-local storage by
applications or libraries.
the concept of this patch is to fall back to using the modify_ldt
syscall, which has been around since linux 1.0, to set up the gs
segment register. since the kernel does not have a way to
automatically assign ldt entries, use of slot zero is hard-coded. if
this fallback path is used, __set_thread_area returns a positive value
(rather than the usual zero for success, or negative for error)
indicating to the caller that the thread pointer was successfully set,
but only for the main thread, and that thread creation will not work
properly. the code in __init_tp has been changed accordingly to record
this result for later use by pthread_create.
|
|
such archs are expected to omit from arch/$ARCH/bits/syscall.h the
SYS_* macro definitions for syscalls their kernels lack. the
preprocessor is then able to select an appropriate implementation
for affected functions. two basic strategies are used on a
case-by-case basis:
where the old syscalls correspond to deprecated library-level
functions, the deprecated functions have been converted to wrappers
for the modern function, and the modern function has fallback code
(omitted at the preprocessor level on new archs) to make use of the
old syscalls if the new syscall fails with ENOSYS. this also improves
functionality on older kernels and eliminates the incentive to program
with deprecated library-level functions for the sake of compatibility
with older kernels.
in other situations where the old syscalls correspond to library-level
functions which are not deprecated but merely lack some new features,
such as the *at functions, the old syscalls are still used on archs
which support them. this may change at some point in the future if or
when fallback code is added to the new functions to make them usable
(possibly with reduced functionality) on old kernels.
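the pattern looks roughly like the following (an illustrative wrapper
using pipe2/pipe as the new/old syscall pair; not the actual musl
code):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int pipe2_compat(int fd[2], int flags)
    {
        int r = syscall(SYS_pipe2, fd, flags);
    #ifdef SYS_pipe
        /* fallback for old kernels, compiled only on archs whose
         * bits/syscall.h still defines the legacy SYS_pipe */
        if (r < 0 && errno == ENOSYS && !flags)
            r = syscall(SYS_pipe, fd);
    #endif
        return r;
    }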
|
|
open is handled specially because it is used from so many places, in
so many variants (2 or 3 arguments, setting errno or not, and
cancellable or not). trying to do it as a function would not only
increase bloat, but would also risk subtle breakage.
this is the first step towards supporting "new" archs where linux
lacks "old" syscalls.
|
|
being static allows it to be inlined in __libc_start_main; inlining
should take place at all levels since the function is called exactly
once. this further reduces mandatory startup code size for static
binaries.
|
|
there is no reason (and seemingly there never was any) for
__init_security to be its own function. it's linked unconditionally
so it can just be placed inline in __init_libc.
|
|
moving the call to __init_ssp from __init_security to __init_libc
makes __init_security a leaf function, which allows the compiler to
make it smaller. __init_libc is already non-leaf, and the additional
call makes no difference to the amount of register spillage.
in addition, it really made no sense for the call to __init_ssp to be
buried inside __init_security rather than parallel with other init
functions.
|
|
|
|
the function itself was static, but the weak alias provided an
externally visible reference and thus prevented the dead code from
being omitted from the output. so this change actually reduces bloat
in mandatory static-linked code.
|
|
now that the thread pointer is always initialized, ssp canary
initialization can be done unconditionally. this simplifies
the ldso, as it no longer needs to detect ssp usage, and the
init function itself, as it is always called exactly once.
this also merges ssp init path for shared and static linking.
|
|
this is the first step in an overhaul aimed at greatly simplifying and
optimizing everything dealing with thread-local state.
previously, the thread pointer was initialized lazily on first access,
or at program startup if stack protector was in use, or at certain
random places where inconsistent state could be reached if it were not
initialized early. while believed to be fully correct, the logic was
fragile and non-obvious.
in the first phase of the thread pointer overhaul, support is retained
(and in some cases improved) for systems/situations where setting up
the thread pointer fails, e.g. old kernels.
some notes on specific changes:
- the confusing use of libc.main_thread as an indicator that the
thread pointer is initialized is eliminated in favor of an explicit
has_thread_pointer predicate.
- sigaction no longer needs to ensure that the thread pointer is
initialized before installing a signal handler (this was needed to
prevent a situation where the signal handler caused the thread
pointer to be initialized and the subsequent sigreturn cleared it
again) but it still needs to ensure that implementation-internal
thread-related signals are not blocked.
- pthread tsd initialization for the main thread is deferred in a new
manner to minimize bloat in the static-linked __init_tp code.
- pthread_setcancelstate no longer needs special handling for the
situation before the thread pointer is initialized. it simply fails
on systems that cannot support a thread pointer, which are
non-conforming anyway.
- pthread_cleanup_push/pop now check for missing thread pointer and
nop themselves out in this case, so stdio no longer needs to avoid
the cancellable path when the thread pointer is not available.
a number of cases remain where certain interfaces may crash if the
system does not support a thread pointer. at this point, these should
be limited to pthread interfaces, and the number of such cases should
be fewer than before.
|
|
the external mmap function is heavy because it has to handle error
reporting that the kernel cannot do, and has to do some locking for
arcane race-condition-avoidance purposes. for allocating initial TLS,
we do not need any of that; the raw syscall suffices.
on i386, this change shaves off 13% of the size of .text for the empty
program.
|
|
PAGE_SIZE was hardcoded to 4096, which is historically what most
systems use, but on several archs it is a kernel config parameter
that user space can only learn at execution time from the aux vector.
PAGE_SIZE and PAGESIZE are not defined on archs where page size is
a runtime parameter; applications should use sysconf(_SC_PAGE_SIZE)
to query it. Internally, libc code defines PAGE_SIZE to libc.page_size,
which is set to aux[AT_PAGESZ] in __init_libc and early in __dynlink
as well. (Note that libc.page_size can be accessed without the GOT,
i.e. before relocations are done.)
Some fpathconf settings are hardcoded to 4096; these should really be
queried from the filesystem using statfs.
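A hedged sketch of the mechanism (names mirror the description above;
simplified, not the actual libc code):

    #include <elf.h>      /* AT_PAGESZ */
    #include <stddef.h>

    struct libc_sketch { size_t page_size; } libc_sketch;

    #undef PAGE_SIZE
    #define PAGE_SIZE libc_sketch.page_size

    /* called early with the aux vector ({type, value} pairs, 0-terminated) */
    static void read_pagesz_from_auxv(size_t *auxv)
    {
        for (; auxv[0]; auxv += 2)
            if (auxv[0] == AT_PAGESZ)
                libc_sketch.page_size = auxv[1];
    }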
|
|
this is needed for reused threads in the SIGEV_THREAD timer
notification system, and could be reused elsewhere in the future if
needed, though it should be refactored for such use.
for static linking, __init_tls.c is simply modified to export the TLS
info in a structure with external linkage, rather than using statics.
this perhaps makes the code more clear, since the statics were poorly
named for statics. the new __reset_tls.c is only linked if it is used.
for dynamic linking, the code is in dynlink.c. sharing code with
__copy_tls is not practical since __reset_tls must also re-zero
thread-local bss.
|
|
these functions were mistakenly assumed to be needed to match glibc
ABI, but glibc has them as part of the non-shared part of libc that's
always statically linked into the main program. moreover, the only
place they are referenced from is glibc's crt1.o.
|