summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-10have ldso track replacement of aligned_allocRich Felker3-0/+4
this is in preparation for improving behavior of malloc interposition.
2020-06-10reintroduce calloc elison of memset for direct-mmapped allocationsRich Felker3-1/+15
a new weak predicate function replacable by the malloc implementation, __malloc_allzerop, is introduced. by default it's always false; the default version will be used when static linking if the bump allocator was used (in which case performance doesn't matter) or if malloc was replaced by the application. only if the real internal malloc is linked (always the case with dynamic linking) does the real version get used. if malloc was replaced dynamically, as indicated by __malloc_replaced, the predicate function is ignored and conditional-memset is always performed.
2020-06-10move __malloc_replaced to a top-level malloc fileRich Felker2-2/+3
it's not part of the malloc implementation but glue with musl dynamic linker.
2020-06-10switch to a common calloc implementationRich Felker3-47/+37
abstractly, calloc is completely malloc-implementation-independent; it's malloc followed by memset, or as we do it, a "conditional memset" that avoids touching fresh zero pages. previously, calloc was kept separate for the bump allocator, which can always skip memset, and the version of calloc provided with the full malloc conditionally skipped the clearing for large direct-mmapped allocations. the latter is a moderately attractive optimization, and can be added back if needed. however, further consideration to make it correct under malloc replacement would be needed. commit b4b1e10364c8737a632be61582e05a8d3acf5690 documented the contract for malloc replacement as allowing omission of calloc, and indeed that worked for dynamic linking, but for static linking it was possible to get the non-clearing definition from the bump allocator; if not for that, it would have been a link error trying to pull in malloc.o. the conditional-clearing code for the new common calloc is taken from mal0_clear in oldmalloc, but drops the need to access actual page size and just uses a fixed value of 4096. this avoids potentially needing access to global data for the sake of an optimization that at best marginally helps archs with offensively-large page sizes.
2020-06-03move oldmalloc to its own directory under src/mallocRich Felker5-1/+2
this sets the stage for replacement, and makes it practical to keep oldmalloc around as a build option for a while if that ends up being useful. only the files which are actually part of the implementation are moved. memalign and posix_memalign are entirely generic. in theory calloc could be pulled out too, but it's useful to have it tied to the implementation so as to optimize out unnecessary memset when implementation details make it possible to know the memory is already clear.
2020-06-03move __expand_heap into malloc.cRich Felker3-73/+64
this function is no longer used elsewhere, and moving it reduces the number of source files specific to the malloc implementation.
2020-06-03rename memalign source file back to its proper nameRich Felker1-0/+0
2020-06-03rename aligned_alloc source file back to its proper nameRich Felker1-0/+0
2020-06-03reverse dependency order of memalign and aligned_allocRich Felker4-10/+5
this change eliminates the internal __memalign function and makes the memalign and posix_memalign functions completely independent of the malloc implementation, written portably in terms of aligned_alloc.
2020-06-03rename aligned_alloc source fileRich Felker1-0/+0
this is the first step of swapping the name of the actual implementation to aligned_alloc while preserving history follow.
2020-06-03remove stale document from malloc src directoryRich Felker1-22/+0
this was an unfinished draft document present since the initial check-in, that was never intended to ship in its current form. remove it as part of reorganizing for replacement of the allocator.
2020-06-03rewrite bump allocator to fix corner cases, decouple from expand_heapRich Felker1-17/+72
this affects the bump allocator used when static linking in programs that don't need allocation metadata due to not using realloc, free, etc. commit e3bc22f1eff87b8f029a6ab31f1a269d69e4b053 refactored the bump allocator to share code with __expand_heap, used by malloc, for the purpose of fixing the case (mainly nommu) where brk doesn't work. however, the geometric growth behavior of __expand_heap is not actually well-suited to the bump allocator, and can produce significant excessive memory usage. in particular, by repeatedly requesting just over the remaining free space in the current mmap-allocated area, the total mapped memory will be roughly double the nominal usage. and since the main user of the no-brk mmap fallback in the bump allocator is nommu, this excessive usage is not just virtual address space but physical memory. in addition, even on systems with brk, having a unified size request to __expand_heap without knowing whether the brk or mmap backend would get used made it so the brk could be expanded twice as far as needed. for example, with malloc(n) and n-1 bytes available before the current brk, the brk would be expanded by n bytes rounded up to page size, when expansion by just one page would have sufficed. the new implementation computes request size separately for the cases where brk expansion is being attempted vs using mmap, and also performs individual mmap of large allocations without moving to a new bump area and throwing away the rest of the old one. this greatly reduces the need for geometric area size growth and limits the extent to which free space at the end of one bump area might be unusable for future allocations. as a bonus, the resulting code size is somewhat smaller than the combined old version plus __expand_heap.
2020-06-02move malloc_impl.h from src/internal to src/mallocRich Felker1-0/+0
this reflects that it is no longer intended for consumption outside of the malloc implementation.
2020-06-02move declaration of interfaces between malloc and ldso to dynlink.hRich Felker3-5/+4
this eliminates consumers of malloc_impl.h outside of the malloc implementation.
2020-06-02reformat clock_adjtime with always-true condition removedRich Felker1-48/+46
2020-06-02always use time64 syscall first for clock_adjtimeRich Felker1-2/+1
clock_adjtime always returns the current clock setting in struct timex, so it's always possible that the time64 version is needed.
2020-06-02fix broken time64 clock_adjtimeRich Felker1-1/+1
the 64-bit time code path used the wrong (time32) syscall. fortunately this code path is not yet taken unless attempting to set a post-Y2038 time.
2020-06-02fix unbounded heap expansion race in mallocRich Felker1-152/+87
this has been a longstanding issue reported many times over the years, with it becoming increasingly clear that it could be hit in practice. under concurrent malloc and free from multiple threads, it's possible to hit usage patterns where unbounded amounts of new memory are obtained via brk/mmap despite the total nominal usage being small and bounded. the underlying cause is that, as a fundamental consequence of keeping locking as fine-grained as possible, the state where free has unbinned an already-free chunk to merge it with a newly-freed one, but has not yet re-binned the combined chunk, is exposed to other threads. this is bad even with small chunks, and leads to suboptimal use of memory, but where it really blows up is where the already-freed chunk in question is the large free region "at the top of the heap". in this situation, other threads momentarily see a state of having almost no free memory, and conclude that they need to obtain more. as far as I can tell there is no fix for this that does not harm performance. the fix made here forces all split/merge of free chunks to take place under a single lock, which also takes the place of the old free_lock, being held at least momentarily at the time of free to determine whether there are neighboring free chunks that need merging. as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no longer make sense and are deleted. simplified merging now takes place inline in free (__bin_chunk) and realloc. as commented in the source, holding the split_merge_lock precludes any chunk transition from in-use to free state. for the most part, it also precludes change to chunk header sizes. however, __memalign may still modify the sizes of an in-use chunk to split it into two in-use chunks. arguably this should require holding the split_merge_lock, but that would necessitate refactoring to expose it externally, which is a mess. and it turns out not to be necessary, at least assuming the existing sloppy memory model malloc has been using, because if free (__bin_chunk) or realloc sees any unsynchronized change to the size, it will also see the in-use bit being set, and thereby can't do anything with the neighboring chunk that changed size.
2020-06-01suppress unwanted warnings when configuring with clangRich Felker1-0/+7
coding style warnings enabled by default in clang have long been a source of spurious questions/bug-reports. since clang provides a -w that behaves differently from gcc's, and that lets us enable any warnings we may actually want after turning them all off to start with a clean slate, use it at configure time if clang is detected.
2020-05-22restore lock-skipping for processes that return to single-threaded stateRich Felker4-6/+12
the design used here relies on the barrier provided by the first lock operation after the process returns to single-threaded state to synchronize with actions by the last thread that exited. by storing the intent to change modes in the same object used to detect whether locking is needed, it's possible to avoid an extra (possibly costly) memory load after the lock is taken.
2020-05-22cut down size of some libc struct membersRich Felker1-3/+3
these are all flags that can be single-byte values.
2020-05-22don't use libc.threads_minus_1 as relaxed atomic for skipping locksRich Felker3-3/+3
after all but the last thread exits, the next thread to observe libc.threads_minus_1==0 and conclude that it can skip locking fails to synchronize with any changes to memory that were made by the last-exiting thread. this can produce data races. on some archs, at least x86, memory synchronization is unlikely to be a problem; however, with the inline locks in malloc, skipping the lock also eliminated the compiler barrier, and caused code that needed to re-check chunk in-use bits after obtaining the lock to reuse a stale value, possibly from before the process became single-threaded. this in turn produced corruption of the heap state. some uses of libc.threads_minus_1 remain, especially for allocation of new TLS in the dynamic linker; otherwise, it could be removed entirely. it's made non-volatile to reflect that the remaining accesses are only made under lock on the thread list. instead of libc.threads_minus_1, libc.threaded is now used for skipping locks. the difference is that libc.threaded is permanently true once an additional thread has been created. this will produce some performance regression in processes that are mostly single-threaded but occasionally creating threads. in the future it may be possible to bring back the full lock-skipping, but more care needs to be taken to produce a safe design.
2020-05-22reorder thread list unlink in pthread_exit after all locksRich Felker1-8/+11
since the backend for LOCK() skips locking if single-threaded, it's unsafe to make the process appear single-threaded before the last use of lock. this fixes potential unsynchronized access to a linked list via __dl_thread_cleanup.
2020-05-21fix incorrect SIGSTKFLT on all mips archsRich Felker3-3/+3
signal 7 is SIGEMT on Linux mips* ABI according to the man pages and kernel. it's not clear where the wrong name came from but it dates back to original mips commit.
2020-05-21handle possibility that SIGEMT replaces SIGSTKFLT in strsignalRich Felker1-0/+10
presently all archs define SIGSTKFLT but this is not correct. change strsignal as a prerequisite for fixing that.
2020-05-19fix return value of res_send, res_query on errors from nameserverRich Felker1-1/+1
the internal __res_msend returns 0 on timeout without having obtained any conclusive answer, but in this case has not filled in meaningful anslen. res_send wrongly treated that as success, but returned a zero answer length. any reasonable caller would eventually end up treating that as an error when attempting to parse/validate it, but it should just be reported as an error. alternatively we could return the last-received inconclusive answer (typically servfail), but doing so would require internal changes in __res_msend. this may be considered later.
2020-05-19fix handling of errors resolving one of paired A+AAAA queryRich Felker1-4/+7
the old logic here likely dates back, at least in inspiration, to before it was recognized that transient errors must not be allowed to reflect the contents of successful results and must be reported to the application. here, the dns backend for getaddrinfo, when performing a paired query for v4 and v6 addresses, accepted results for one address family even if the other timed out. (the __res_msend backend does not propagate error rcodes back to the caller, but continues to retry until timeout, so other error conditions were not actually possible.) this patch moves the checks to take place before answer parsing, and performs them for each answer rather than only the answer to the first query. if nxdomain is seen it's assumed to apply to both queries since that's how dns semantics work.
2020-05-18set AD bit in dns queries, suppress for internal useRich Felker3-0/+3
the AD (authenticated data) bit in outgoing dns queries is defined by rfc3655 to request that the nameserver report (via the same bit in the response) whether the result is authenticated by DNSSEC. while all results returned by a DNSSEC conforming nameserver will be either authenticated or cryptographically proven to lack DNSSEC protection, for some applications it's necessary to be able to distinguish these two cases. in particular, conforming and compatible handling of DANE (TLSA) records requires enforcing them only in signed zones. when the AD bit was first defined for queries, there were reports of compatibility problems with broken firewalls and nameservers dropping queries with it set. these problems are probably a thing of the past, and broken nameservers are already unsupported. however, since there is no use in the AD bit with the netdb.h interfaces, explicitly clear it in the queries they make. this ensures that, even with broken setups, the standard functions will work, and at most the res_* functions break.
2020-04-30fix undefined behavior from signed overflow in strstr and memmemRich Felker2-8/+8
unsigned char promotes to int, which can overflow when shifted left by 24 bits or more. this has been reported multiple times but then forgotten. it's expected to be benign UB, but can trap when built with explicit overflow catching (ubsan or similar). fix it now. note that promotion to uint32_t is safe and portable even outside of the assumptions usually made in musl, since either uint32_t has rank at least unsigned int, so that no further default promotions happen, or int is wide enough that the shift can't overflow. this is a desirable property to have in case someone wants to reuse the code elsewhere.
2020-04-26remove arm (32-bit) support for vdso clock_gettimeRich Felker1-6/+0
it's been reported that the vdso clock_gettime64 function on (32-bit) arm is broken, producing erratic results that grow at a rate far greater than one reported second per actual elapsed second. the vdso function seems to have been added sometime between linux 5.4 and 5.6, so if there's ever been a working version, it was only present for a very short window. it's not clear what the eventual upstream kernel solution will be, but something needs to be done on the libc side so as not to be producing binaries that seem to work on older/existing/lts kernels (which lack the function and thus lack the bug) but will break fantastically when moving to newer kernels. hopefully vdso support will be added back soon, but with a new symbol name or version from the kernel to allow continued rejection of broken ones.
2020-04-24fix undefined behavior in wcsto[ld] family functionsRich Felker2-4/+2
analogous to commit b287cd745c2243f8e5114331763a5a9813b5f6ee but for the custom FILE stream type the wcstol and wcstod family use. __toread could be used here as well, but there's a simple direct fix to make the buffer pointers initially valid for subtraction, so just do that to avoid pulling in stdio exit code in programs that don't use stdio.
2020-04-18fix sh fesetround failure to clear old modeRich Felker1-0/+2
the sh version of fesetround or'd the new rounding mode onto the control register without clearing the old rounding mode bits, making changes sticky. this was the root cause of multiple test failures.
2020-04-17move __string_read into vsscanf source fileRich Felker3-21/+13
apparently this function was intended at some point to be used by strto* family as well, and thus was put in its own file; however, as far as I can tell, it's only ever been used by vsscanf. move it to the same file to reduce the number of source files and external symbols.
2020-04-17remove spurious repeated semicolon in fmemopenRich Felker1-1/+1
2020-04-17combine two calls to memset in fmemopenRich Felker1-2/+2
this idea came up when I thought we might need to zero the UNGET portion of buf as well, but it seems like a useful improvement even when that turned out not to be necessary.
2020-04-17fix possible access to uninitialized memory in shgetc (via scanf)Rich Felker1-1/+1
shgetc sets up to be able to perform an "unget" operation without the caller having to remember and pass back the character value, and for this purpose used a conditional store idiom: if (f->rpos[-1] != c) f->rpos[-1] = c to make it safe to use with non-writable buffers (setup by the sh_fromstring macro or __string_read with sscanf). however, validity of this depends on the buffer space at rpos[-1] being initialized, which is not the case under some conditions (including at least unbuffered files and fmemopen ones). whenever data was read "through the buffer", the desired character value is already in place and does not need to be written. thus, rather than testing for the absence of the value, we can test for rpos<=buf, indicating that the last character read could not have come from the buffer, and thereby that we have a "real" buffer (possibly of zero length) with writable pushback (UNGET bytes) below it.
2020-04-17fix undefined behavior in scanf coreRich Felker1-0/+3
as reported/analyzed by Pascal Cuoq, the shlim and shcnt macros/functions are called by the scanf core (vfscanf) with f->rpos potentially null (if the FILE is not yet activated for reading at the time of the call). in this case, they compute differences between a null pointer (f->rpos) and a non-null one (f->buf), resulting in undefined behavior. it's unlikely that any observably wrong behavior occurred in practice, at least without LTO, due to limits on what's visible to the compiler from translation unit boundaries, but this has not been checked. fix is simply ensuring that the FILE is activated for read mode before entering the main scanf loop, and erroring out early if it can't be.
2020-03-24math: add x86_64 remquolAlexander Monakov1-0/+32
2020-03-24math: move x87-family fmod functions to C with inline asmAlexander Monakov8-44/+38
2020-03-24math: move x87-family remainder functions to C with inline asmAlexander Monakov8-50/+42
2020-03-24math: move x87-family rint functions to C with inline asmAlexander Monakov8-24/+28
2020-03-24math: move x87-family lrint functions to C with inline asmAlexander Monakov16-60/+64
2020-03-24math: move x86_64 (l)lrint(f) functions to C with inline asmAlexander Monakov8-20/+32
2020-03-24math: move i386 sqrt to C with inline asmAlexander Monakov2-21/+15
2020-03-24math: move i386 sqrtf to C with inline asmAlexander Monakov2-7/+12
2020-03-24math: move trivial x86-family sqrt functions to C with inline asmAlexander Monakov8-18/+28
2020-03-24math: move x87-family fabs functions to C with inline asmAlexander Monakov8-24/+28
2020-03-24math: move x86_64 fabs, fabsf to C with inline asmAlexander Monakov4-16/+20
2020-03-21fix parsing offsets after long timezone namesSamuel Holland1-5/+5
TZ containg a timezone name with >TZNAME_MAX characters currently breaks musl's timezone parsing. getname() stops after TZNAME_MAX characters. getoff() will consume no characters (because the next character is not a digit) and incorrectly return 0. Then, because there are remaining alphabetic characters, __daylight == 1, and dst_off == -3600. getname() must consume the entire timezone name, even if it will not fit in d/__tzname, so when it returns, s points to the offset digits.
2020-03-21avoid out-of-bounds read for invalid quoted timezoneSamuel Holland1-2/+2
Parsing the timezone name must stop when reaching the null terminator. In that case, there is no '>' to skip.