path: root/src
Age  Commit message  Author  Files  Lines
2012-10-25use explicit visibility to optimize a few hot-path function callsRich Felker3-11/+13
on x86 and some other archs, functions which make function calls which might go through a PLT incur a significant overhead cost loading the GOT register prior to making the call. this load is utterly useless in musl, since all calls are bound at library-creation time using -Bsymbolic-functions, but the compiler has no way of knowing this, and attempts to set the default visibility to protected have failed due to bugs in GCC and binutils. this commit simply manually assigns hidden/protected visibility, as appropriate, to a few internal-use-only functions which have many callers, or which have callers that are hot paths like getc/putc. it shaves about 5k off the i386 libc.so with -Os. many of the improvements are in syscall wrappers, where the benefit is just size and performance improvement is unmeasurable noise amid the syscall overhead. however, stdio may be measurably faster. if in the future there are toolchains that can do the same thing globally without introducing linking bugs, it might be worth considering removing these workarounds.
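a minimal sketch of the technique, assuming a GCC-compatible compiler. the names internal_add and public_sum3 are hypothetical and not taken from musl; the point is only that a hidden-visibility callee can be called directly, without going through the PLT.

```c
/* hypothetical internal helper: hidden visibility tells the compiler
 * the symbol cannot be interposed from outside the library, so callers
 * do not need to load the GOT register before calling it */
__attribute__((__visibility__("hidden")))
int internal_add(int a, int b)
{
	return a + b;
}

/* hypothetical public entry point with default visibility; the calls
 * below can be emitted as plain direct calls */
int public_sum3(int a, int b, int c)
{
	return internal_add(internal_add(a, b), c);
}
```

whether this changes the generated code depends on the arch and on building as position-independent code; comparing the disassembly with and without the attribute is the easiest way to check.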
2012-10-24correct locking in stdio functions that tried to be lock-freeRich Felker6-16/+36
these functions must behave as if they obtain the lock via flockfile to satisfy POSIX requirements. since another thread can provably hold the lock when they are called, they must wait to obtain the lock before they can return, even if the correct return value could be obtained without locking. in the case of fclose and freopen, failure to do so could cause correct (albeit obscure) programs to crash or otherwise misbehave; in the case of feof, ferror, and fwide, failure to obtain the lock could sometimes return incorrect results. in any case, having these functions proceed and return while another thread held the lock was wrong.
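the required behavior can be sketched from the outside using the public POSIX locking interface. locked_feof is a hypothetical wrapper, not musl's internal implementation; it only shows that the EOF indicator is read while the stream lock is held.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* hypothetical wrapper: waits for the stream lock, as if via
 * flockfile, before reading the end-of-file indicator */
int locked_feof(FILE *f)
{
	int ret;
	flockfile(f);    /* blocks while another thread holds the lock */
	ret = feof(f);   /* the stream lock is recursive, so this is safe */
	funlockfile(f);
	return ret;
}
```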
2012-10-24greatly improve freopen behaviorRich Felker5-17/+41
1. don't open /dev/null just as a basis to copy flags; use the shared __fmodeflags function to get the right file flags for the mode.
2. handle the (probably invalid, but whatever) case where the original stream's file descriptor was closed; previously, the logic re-closed it.
3. accept the "e" mode flag for close-on-exec; update dup3 to fall back to using dup2 so we can simply call __dup3 instead of putting fallback logic in freopen itself.
2012-10-24remove useless failure-check from freopen (can't happen)Rich Felker1-2/+2
2012-10-22simplify logic in stpcpy; avoid copying first aligned byte twiceRich Felker1-4/+4
gcc seems to be generating identical or near-identical code for both versions, but the newer code is more expressive of what it's doing.
2012-10-21as an extension, have putenv("VAR") behave as unsetenv("VAR")Rich Felker1-5/+5
the behavior of putenv is left undefined if the argument does not contain an equal sign, but traditional implementations behave this way and gnulib replaces putenv if it doesn't do this.
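the extension can be sketched as a wrapper around the standard functions. putenv_ext is a hypothetical name; the real change lives inside putenv itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* hypothetical wrapper: a string without '=' removes the variable,
 * anything else is handed to putenv unchanged */
int putenv_ext(char *s)
{
	if (!strchr(s, '=')) return unsetenv(s);
	return putenv(s);
}
```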
2012-10-21accept "nan(n-char-sequence)" in strtod/scanf functionsRich Felker1-1/+19
this will prevent gnulib from wrapping our strtod to handle this useless feature.
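a small check of the behavior in question, assuming a libc that accepts the C99 "nan(n-char-sequence)" form. parses_fully_as_nan is a hypothetical helper written for this illustration.

```c
#include <math.h>
#include <stdlib.h>

/* hypothetical helper: returns 1 if strtod consumes all of s and the
 * result is a NaN, 0 otherwise */
int parses_fully_as_nan(const char *s)
{
	char *end;
	double d = strtod(s, &end);
	return isnan(d) && end != s && *end == '\0';
}
```

a strtod that only understands plain "nan" would stop at the opening parenthesis, which is the kind of difference gnulib tests for.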
2012-10-21fix copy/paste error in popen changes that broke signalsRich Felker1-1/+1
signal mask was not being restored after fork, but instead blocked again.
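the class of bug can be illustrated with plain sigprocmask. both function names below are hypothetical and the code is not taken from popen; it shows that restoring a saved mask needs SIG_SETMASK, while SIG_BLOCK would only add to the blocked set.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* hypothetical probe: reports whether SIGUSR1 is currently blocked */
int sigusr1_is_blocked(void)
{
	sigset_t cur;
	sigprocmask(SIG_BLOCK, 0, &cur);  /* null set: query only */
	return sigismember(&cur, SIGUSR1);
}

/* hypothetical helper: runs fn with all signals blocked, then
 * restores the caller's original mask */
int run_with_signals_blocked(int (*fn)(void))
{
	sigset_t all, old;
	int ret;
	sigfillset(&all);
	sigprocmask(SIG_BLOCK, &all, &old);
	ret = fn();
	/* SIG_SETMASK restores the saved mask; using SIG_BLOCK here would
	 * leave the signals blocked */
	sigprocmask(SIG_SETMASK, &old, 0);
	return ret;
}
```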
2012-10-19support looking up thread-local objects with dlsymRich Felker1-0/+6
2012-10-19fix breakage in dlsym for looking up RTLD_DEFAULT, etc.Rich Felker1-2/+5
this was broken during the early dynamic-linked TLS commits, which rearranged some of the code for handling new relocation types.
2012-10-19fix usage of locks with vforkRich Felker3-3/+4
__release_ptc() is only valid in the parent; if it's performed in the child, the lock will be unlocked early then double-unlocked later, corrupting the lock state.
2012-10-19fix crashes in static-linked multithreaded programs without TLSRich Felker1-0/+2
2012-10-19fix order of syscall args for microblaze clone syscallRich Felker1-3/+2
with this commit, based on testing with patches to qemu which are not yet upstream,
2012-10-18ensure microblaze __set_thread_area returns successRich Felker1-1/+2
since it did not set the return-value register, the caller could wrongly interpret this as failure.
2012-10-18avoid raising spurious division-by-zero exception in printfRich Felker1-1/+1
2012-10-18floating point environment/exceptions support for mipsRich Felker1-0/+60
2012-10-18fix parent-memory-clobber in posix_spawn (environ)Rich Felker3-9/+17
2012-10-18overhaul system() and popen() to use vfork; fix various related bugsRich Felker4-56/+110
since we target systems without overcommit, special care should be taken that system() and popen(), like posix_spawn(), do not fail in processes whose commit charges are too high to allow ordinary forking. this in turn requires special precautions to ensure that the parent process's signal handlers do not end up running in the shared-memory child, where they could corrupt the state of the parent process. popen has also been updated to use pipe2, so it does not have a fd-leak race in multi-threaded programs. since pipe2 is missing on older kernels, (non-atomic) emulation has been added. some silly bugs in the old code should be gone too.
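a non-atomic emulation of pipe2 might look roughly like the following. pipe2_fallback is a hypothetical name and this is only a sketch of the idea, not the code added by the commit; error checking on the fcntl calls is omitted for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* hypothetical fallback: create the pipe first, then apply the flags.
 * another thread could fork+exec between the two steps, which is why
 * this is not atomic the way a real pipe2 syscall is */
int pipe2_fallback(int fd[2], int flags)
{
	if (flags & ~(O_CLOEXEC | O_NONBLOCK)) {
		errno = EINVAL;
		return -1;
	}
	if (pipe(fd) < 0) return -1;
	if (flags & O_CLOEXEC) {
		fcntl(fd[0], F_SETFD, FD_CLOEXEC);
		fcntl(fd[1], F_SETFD, FD_CLOEXEC);
	}
	if (flags & O_NONBLOCK) {
		fcntl(fd[0], F_SETFL, O_NONBLOCK);
		fcntl(fd[1], F_SETFL, O_NONBLOCK);
	}
	return 0;
}
```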
2012-10-18fix (hopefully; untested) completely broken/incomplete microblaze sigsetjmpRich Felker1-3/+12
2012-10-17fix microblaze asm relocations for shared libcRich Felker4-6/+6
only @PLT relocations are considered functions for purposes of -Bsymbolic-functions, so always use @PLT. it should not hurt in the static-linked case.
2012-10-15add memmem function (gnu extension)Rich Felker1-0/+148
based on strstr. passes gnulib tests and a few quick checks of my own.
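for reference, the semantics of memmem can be shown with a naive quadratic version. naive_memmem is a hypothetical name, and this is deliberately not the strstr-derived algorithm the commit refers to; it only documents what the function returns.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical reference version: returns a pointer to the first
 * occurrence of the nl-byte needle in the hl-byte haystack, or a
 * null pointer if there is none. embedded zero bytes are ordinary
 * data, unlike with strstr */
void *naive_memmem(const void *h0, size_t hl, const void *n0, size_t nl)
{
	const unsigned char *h = h0;
	size_t i;
	if (!nl) return (void *)h;   /* empty needle matches at the start */
	if (nl > hl) return 0;
	for (i = 0; i <= hl - nl; i++)
		if (!memcmp(h + i, n0, nl)) return (void *)(h + i);
	return 0;
}
```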
2012-10-15add support for TLS variant I, presently needed for arm and mipsRich Felker5-6/+46
despite documentation that makes it sound a lot different, the only ABI-constraint difference between TLS variants II and I seems to be that variant II stores the initial TLS segment immediately below the thread pointer (i.e. the thread pointer points to the end of it) and variant I stores the initial TLS segment above the thread pointer, requiring the thread descriptor to be stored below. the actual value stored in the thread pointer register also tends to have per-arch random offsets applied to it for silly micro-optimization purposes. with these changes applied, TLS should be basically working on all supported archs except microblaze. I'm still working on getting the necessary information and a working toolchain that can build TLS binaries for microblaze, but in theory, static-linked programs with TLS and dynamic-linked programs where only the main executable uses TLS should already work on microblaze. alignment constraints have not yet been heavily tested, so it's possible that this code does not always align TLS segments correctly on archs that need TLS variant I.
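the layout difference described above can be reduced to a sign. the two functions below are hypothetical and ignore the per-arch offsets mentioned in the message; offset stands for the distance between the thread pointer and the object in the initial TLS segment.

```c
#include <stddef.h>
#include <stdint.h>

/* variant II: the initial TLS segment sits immediately below the
 * thread pointer, so objects are found at negative offsets */
uintptr_t tls_addr_variant2(uintptr_t tp, size_t offset)
{
	return tp - offset;
}

/* variant I: the initial TLS segment sits above the thread pointer,
 * so objects are found at positive offsets (and the thread descriptor
 * has to live below) */
uintptr_t tls_addr_variant1(uintptr_t tp, size_t offset)
{
	return tp + offset;
}
```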
2012-10-15block uid/gid changes during posix_spawnRich Felker1-0/+10
usage of vfork creates a situation where a process of lower privilege may momentarily have write access to the memory of a process of higher privilege. consider the case of a multi-threaded suid program which is calling posix_spawn in one thread while another thread drops the elevated privileges then runs untrusted (relative to the elevated privilege) code as the original invoking user. this untrusted code can then potentially modify the data the child process will use before calling exec, for example changing the pathname or arguments that will be passed to exec. note that if vfork is implemented as fork, the lock will not be held until the child execs, but since memory is not shared it does not matter.
2012-10-14fix overlap of thread stacks with thread tls segmentsRich Felker1-2/+1
2012-10-14fix main program TLS alignment for dynamic-linked programsRich Felker1-6/+5
this change brings the behavior in line with the static-linked code, which seems to be correct.
2012-10-13workaround broken hidden-visibility handling in pccRich Felker1-1/+1
with this change, pcc-built musl libc.so seems to work correctly. the problem is that pcc generates GOT lookups for external-linkage symbols even if they are hidden, rather than using GOT-relative addressing. the entire reason we're using hidden visibility on the __libc object is to make it accessible prior to relocations -- not to mention inexpensive to access. unfortunately, the workaround makes it even more expensive on pcc. when the pcc issue is fixed, an appropriate version test should be added so new pcc can use the much more efficient variant.
2012-10-13fix namespace clash (libc) in dynlink.cRich Felker1-14/+13
this makes the #undef libc and the __libc name unnecessary. they were problematic because the "accessor function" mode for accessing the libc struct could not be used, breaking the build on any compiler without (working) visibility.
2012-10-13remove dead code from dynamic linkerRich Felker1-10/+0
2012-10-11comment possibly-confusing i386 vsyscall asmRich Felker1-1/+13
2012-10-11avoid the thread-ptr-init behavior of sigaction when not installing handlerRich Felker1-1/+2
this is necessary because posix_spawn calls sigaction after vfork, and if the thread pointer is not already initialized, initializing it in the child corrupts the parent process's state.
2012-10-11i386 vsyscall support (vdso-provided sysenter/syscall instruction based)Rich Felker3-16/+62
this doubles the performance of the fastest syscalls on the atom I tested it on; improvement is reportedly much more dramatic on worst-case cpus. cannot be used for cancellable syscalls.
2012-10-08ensure that buffer for decoding auxv at startup is initially zeroRich Felker1-1/+1
2012-10-07clean up and refactor program initializationRich Felker6-34/+33
the code in __libc_start_main is now responsible for parsing auxv, rather than duplicating the parsing all over the place. this should shave off a few cycles and some code size. __init_libc is left as an external-linkage function despite the fact that it could be static, to prevent it from being inlined and permanently wasting stack space when main is called. a few other minor changes are included, like eliminating per-thread ssp canaries (they were likely broken when combined with certain dlopen usages, and completely unnecessary) and some other unnecessary checks. since this code gets linked into every program, it should be as small and simple as possible.
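centralized auxv parsing can be sketched as a single decode loop. decode_auxv is a hypothetical name and signature; the only facts relied on are that auxv is a sequence of (type, value) pairs terminated by a type of 0 (AT_NULL).

```c
#include <stddef.h>

/* hypothetical decoder: fills out[type] = value for every auxv entry
 * whose type is below cnt. the table is zeroed first so that entries
 * the kernel did not provide read as 0 */
void decode_auxv(size_t *out, const size_t *auxv, size_t cnt)
{
	size_t i;
	for (i = 0; i < cnt; i++) out[i] = 0;
	for (; auxv[0]; auxv += 2)
		if (auxv[0] < cnt) out[auxv[0]] = auxv[1];
}
```

callers that need, say, the page size or the program header address can then read the already-decoded table instead of walking auxv again.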
2012-10-07fix breakage due to initializing thread pointer when loading libsRich Felker1-1/+1
at initial program load, all libraries must be loaded before the thread pointer can be setup, since the TP-relative addresses of all initial TLS objects must be constant.
2012-10-06make new TLS setup block even implementation-internals signalsRich Felker1-2/+1
this is needed to ensure async-cancel-safety, i.e. to make it safe to access TLS objects when async cancellation is enabled. otherwise, if cancellation were acted upon after the atomic fetch/add but before the thread saved the obtained memory, another access to the same TLS in the cancellation handler could end up performing the atomic fetch/add again, consuming more memory than is actually available and overflowing into other objects on the heap.
2012-10-06don't crash if TLS library is loaded into process with no thread pointerRich Felker1-0/+5
2012-10-06fix buggy TLS size/alignment computations in static-linked TLSRich Felker1-5/+22
2012-10-06fix symbol acceptance/rejection rules for TLSRich Felker1-8/+14
symbol value of 0 is not "undefined" for TLS; it's the address of the first symbol in the TLS segment. however, non-definition TLS references also have values of 0, so check the section. hopefully the new logic is more clear, too.
2012-10-06TLS fixes, mainly alignment handlingRich Felker1-39/+48
compute offsets from the thread pointer statically when loading the library, rather than repeating the logic on each thread creation. not only is the latter less efficient at runtime; it also fails to provide solid guarantees that the offsets will remain the same when the initial alignment of memory is different. the new alignment handling is both more rigorous and simpler. the old code was also clobbering TLS bss with random image data in some cases due to using tls_size (size of TLS segment) instead of tls_len (length of the TLS data image).
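two pieces of this can be sketched in isolation: rounding an offset up to an alignment, and initializing a TLS block from its image. align_up and init_tls_block are hypothetical helpers, and the parameters len and size play the roles of tls_len and tls_size from the message.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical helper: rounds x up to a multiple of align, assuming
 * align is a power of two */
size_t align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* hypothetical helper: copies only len bytes of initialized image
 * data and zero-fills the rest of the size-byte block (the TLS bss).
 * copying size bytes instead would read past the image and clobber
 * the bss part with unrelated data */
void init_tls_block(void *dst, const void *image, size_t len, size_t size)
{
	memcpy(dst, image, len);
	memset((char *)dst + len, 0, size - len);
}
```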
2012-10-05fix/improve shared library ctor/dtor handling, allow recursive dlopenRich Felker1-7/+29
some libraries call dlopen from their constructors, resulting in recursive calls to dlopen. previously, this resulted in deadlock. I'm now unlocking the dlopen lock before running constructors (this is especially important since the lock also blocked pthread_create and was being held while application code runs!) and using a separate recursive mutex protecting the ctor/dtor state instead. in order to prevent the same ctor from being called more than once, a module is considered "constructed" just before the ctor runs. also, switch from using atexit to register each dtor to using a single atexit call to register the dynamic linker's dtor processing as just one handler. this is necessary because atexit performs allocation and may fail, but the library has already been loaded and cannot be backed-out at the time dtor registration is performed. this change also ensures that all dtors run after all atexit functions, rather than in mixed order.
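the "mark as constructed just before the ctor runs" rule can be sketched as follows. all names here are hypothetical, and the recursive mutex that protects this state in the real change is omitted to keep the example short.

```c
#include <stddef.h>

/* hypothetical module record */
struct module {
	struct module *next;
	void (*ctor)(void);
	int constructed;
};

static struct module *modules;  /* hypothetical list of loaded modules */
static int ctor_runs;           /* counts ctor invocations for the demo */

/* runs pending ctors; a module is marked constructed before its ctor
 * is called, so a re-entrant call skips it instead of running it twice */
void run_ctors(void)
{
	struct module *m;
	for (m = modules; m; m = m->next) {
		if (m->constructed) continue;
		m->constructed = 1;
		if (m->ctor) m->ctor();
	}
}

/* a ctor that re-enters run_ctors, standing in for a library that
 * calls dlopen from its constructor */
void reentrant_ctor(void)
{
	ctor_runs++;
	run_ctors();
}

/* demo driver: returns how many times the ctor actually ran */
int demo_recursive_ctor(void)
{
	static struct module m = { 0, reentrant_ctor, 0 };
	modules = &m;
	run_ctors();
	run_ctors();
	return ctor_runs;
}
```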
2012-10-05small dynamic linker module search fixRich Felker1-1/+2
libraries loaded more than once by pathname should not get shortnames that would cause them to later be used to satisfy non-pathname load requests.
2012-10-05support for TLS in dynamic-loaded (dlopen) modulesRich Felker7-47/+115
unlike other implementations, this one reserves memory for new TLS in all pre-existing threads at dlopen-time, and dlopen will fail with no resources consumed and no new libraries loaded if memory is not available. memory is not immediately distributed to running threads; that would be too complex and too costly. instead, assurances are made that threads needing the new TLS can obtain it in an async-signal-safe way from a buffer belonging to the dynamic linker/new module (via atomic fetch-and-add based allocator). I've re-appropriated the lock that was previously used for __synccall (synchronizing set*id() syscalls between threads) as a general pthread_create lock. it's a "backwards" rwlock where the "read" operation is safe atomic modification of the live thread count, which multiple threads can perform at the same time, and the "write" operation is making sure the count does not increase during an operation that depends on it remaining bounded (__synccall or dlopen). in static-linked programs that don't use __synccall, this lock is a no-op and has no cost.
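an atomic fetch-and-add allocator over a pre-reserved buffer could be sketched like this, assuming C11 atomics. struct reserve and reserve_take are hypothetical names, and the bounds check is an addition for the example; in the design described above the memory is reserved up front, so a request is not expected to fail.

```c
#include <stdatomic.h>
#include <stddef.h>

/* hypothetical pre-reserved buffer shared by all threads */
struct reserve {
	unsigned char *buf;
	size_t size;
	atomic_size_t used;
};

/* hands out n bytes using a single atomic fetch-and-add; no locks are
 * taken. note that a failed request still advances the counter */
void *reserve_take(struct reserve *r, size_t n)
{
	size_t off = atomic_fetch_add(&r->used, n);
	if (off > r->size || n > r->size - off) return 0;
	return r->buf + off;
}
```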
2012-10-05fix race condition in dlopenRich Felker1-1/+3
orig_tail was being saved before the lock was obtained, allowing dlopen failure to roll-back other dlopens that had succeeded.
2012-10-04dynamic-linked TLS support for everything but dlopen'd libsRich Felker1-38/+58
currently, only i386 is tested. x86_64 and arm should probably work. the necessary relocation types for mips and microblaze have not been added because I don't understand how they're supposed to work, and I'm not even sure if it's defined yet on microblaze. I may be able to reverse engineer the requirements out of gcc/binutils output.
2012-10-04remove freeing of dynamic linker data when dlopen/dlsym are not usedRich Felker1-11/+0
this was an optimization to save/recover a minimal amount of extra memory for use by malloc, that's becoming increasingly costly to keep around. freeing this data:
1. breaks debugging with gdb (it can't find library symbols)
2. breaks thread-local storage in shared libraries
it would be possible to disable freeing when TLS is used, but in addition to the above breakages, tracking whether dlopen/dlsym is used adds a cost to every symbol lookup, possibly making program startup slower for large programs. combined with the complexity, it's not worth it. we already save/recover plenty of memory in the dynamic linker with reclaim_gaps.
2012-10-04beginnings of full TLS support in shared librariesRich Felker4-1/+19
this code will not work yet because the necessary relocations are not supported, and cannot be supported without some internal changes to how relocation processing works (coming soon).
2012-10-04partial TLS support for dynamic-linked programsRich Felker2-27/+77
only TLS in the main program is supported so far; TLS defined in shared libraries will not work yet.
2012-10-04TLS (GNU/C11 thread-local storage) support for static-linked programsRich Felker6-14/+117
the design for TLS in dynamic-linked programs is mostly complete too, but I have not yet implemented it. cost is nonzero but still low for programs which do not use TLS and/or do not use threads (a few hundred bytes of new code, plus dependency on memcpy). I believe it can be made smaller at some point by merging __init_tls and __init_security into __libc_start_main and avoiding duplicate auxv-parsing code. at the same time, I've also slightly changed the logic pthread_create uses to allocate guard pages to ensure that guard pages are not counted towards commit charge.
2012-09-30add getopt reset supportRich Felker2-2/+18
based on proposed patches by Daniel Cegiełka, with minor changes:
- use a weak symbol for optreset so it doesn't clash with namespace
- also reset optpos (position in multi-option arg like -lR)
- also make getopt_long support reset
2012-09-30protect sem_open against cancellationRich Felker1-13/+19
also fix one minor bug: failure to free the early-reserved slot when the semaphore is later found to already be mapped.