musl - The musl libc tree (WIP / dev branches)

Age	Commit message	Author	Files	Lines
2015-03-03	make all objects used with atomic operations volatile	Rich Felker	2	-5/+5
	the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,*p,f(*p)); where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
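The load-then-cas pattern discussed in this message can be sketched as follows. This is a minimal illustration, assuming a GCC- or Clang-compatible compiler: `a_cas` here is a stand-in built on the `__sync_val_compare_and_swap` builtin rather than musl's per-architecture code, and `atomic_update` and `add_one` are hypothetical names.

```c
/* stand-in for a_cas: returns the value *p held before the operation.
   assumption: built on a GCC/Clang builtin, not musl's own code */
static inline int a_cas(volatile int *p, int t, int s)
{
    return __sync_val_compare_and_swap(p, t, s);
}

/* atomically replace *p with f(*p); returns the value that was replaced */
static int atomic_update(volatile int *p, int (*f)(int))
{
    int old;
    do {
        /* *p is volatile-qualified, so the compiler must perform exactly
           one load here and may not re-read *p in place of old below */
        old = *p;
    } while (a_cas(p, old, f(old)) != old);
    return old;
}

/* example update function */
static int add_one(int x) { return x + 1; }
```

Because `old` is loaded once per iteration, the value compared by the cas and the value passed to `f` are guaranteed to be the same.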
2014-09-06	fix non-static dummy function that slipped in with locale implementation	Rich Felker	1	-1/+1

2014-08-13	add inline isspace in ctype.h as an optimization	Szabolcs Nagy	1	-8/+0
	isspace can be a bottleneck in a simple parser, inlining it gives slightly smaller and faster code
	src/locale/pleval.o already had this optimization, the size change for other libc functions for i386 is
	src/internal/intscan.o        2134   2118    -16
	src/locale/dcngettext.o       1562   1552    -10
	src/network/res_msend.o       1961   1940    -21
	src/network/lookup_name.o     2627   2608    -19
	src/network/getnameinfo.o     1814   1811     -3
	src/network/lookup_serv.o      643    624    -19
	src/stdio/vfscanf.o           2675   2663    -12
	src/stdlib/atoll.o             117    107    -10
	src/stdlib/atoi.o               95     91     -4
	src/stdlib/atol.o               95     91     -4
	src/time/strptime.o           1515   1503    -12
	(TOTALS)                    432451 432321   -130
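An inline C-locale isspace can be written as a single range check. The sketch below assumes an ASCII-compatible execution character set; `inline_isspace` is a hypothetical name, not necessarily the one used in ctype.h.

```c
/* matches ' ' plus the five control characters '\t' '\n' '\v' '\f' '\r',
   which occupy the consecutive codes 9..13 in ASCII */
static inline int inline_isspace(int c)
{
    /* the unsigned subtraction wraps for c < '\t', so one comparison
       covers both ends of the range; EOF (-1) is rejected as well */
    return c == ' ' || (unsigned)c - '\t' < 5;
}
```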
2014-07-31	harden locale name handling and prevent slashes in LC_MESSAGES	Rich Felker	1	-3/+3
	the code which loads locale files was already rejecting locale names containing slashes. however, LC_MESSAGES records a locale name even if libc does not have a matching locale file, so that gettext or application code can use the recorded locale name for message translations to languages that libc does not support. this recorded name was not being checked for slashes, meaning that such code could potentially be tricked into directory traversal. in addition, since the value of a locale category is sometimes used as a pathname component by callers, the improved code rejects any value beginning with a dot. this prevents traversal to the parent directory via "..", use of the top-level locale directory via ".", and also avoids "hidden" directories as a side effect. finally, overly long locale names are now rejected (treated as an unrecognized name and thus as an alias for C.UTF-8) rather than being truncated.
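The three checks described above can be sketched as a single validation function. This is only an illustration: `locale_name_ok` and `LOCALE_NAME_LIMIT` are hypothetical names, and the limit of 15 bytes is an assumption taken from the locale framework commit further down this log.

```c
#include <string.h>

#define LOCALE_NAME_LIMIT 15  /* assumed maximum name length for this sketch */

/* returns 1 if name is acceptable as a locale name, 0 otherwise */
static int locale_name_ok(const char *name)
{
    if (name[0] == '.') return 0;         /* rejects ".", ".." and hidden dirs */
    if (strchr(name, '/')) return 0;      /* rejects any path separator */
    if (strlen(name) > LOCALE_NAME_LIMIT)
        return 0;                         /* reject instead of truncating */
    return 1;
}
```

A caller would treat a rejected name as unrecognized rather than trying to repair it.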
2014-07-30	plural rule evaluator rewrite for dcngettext	Szabolcs Nagy	1	-128/+106
	using an operator precedence parser the code size became smaller and it is only slower by about 10%
	size of old vs new pleval.o on different archs: (with inlined isspace added to pleval.c for now)
	old:
	   text    data     bss     dec     hex filename
	    828       0       0     828     33c pl.i386.o
	   1152       0       0    1152     480 pl.arm.o
	   1704       0       0    1704     6a8 pl.mips.o
	   1328       0       0    1328     530 pl.ppc.o
	    992       0       0     992     3e0 pl.x64.o
	new:
	   text    data     bss     dec     hex filename
	    693       0       0     693     2b5 pl.i386.o
	    972       0       0     972     3cc pl.arm.o
	   1276       0       0    1276     4fc pl.mips.o
	   1087       0       0    1087     43f pl.ppc.o
	    846       0       0     846     34e pl.x64.o
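For readers unfamiliar with the technique, a small operator precedence (precedence climbing) evaluator might look like the sketch below. It is not the code from pleval.c: all names are hypothetical, only binary operators are handled (`?:` and unary `!` are left out for brevity), and `&&` and `||` evaluate both operands instead of short-circuiting.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct st { const char *s; unsigned long n; int err; };

/* binary operators; two-character tokens come before their one-character
   prefixes so that "<=" is never misread as "<" */
static const struct { const char *tok; int prec; } ops[] = {
    {"||",1},{"&&",2},{"==",3},{"!=",3},{"<=",4},{">=",4},
    {"<",4},{">",4},{"+",5},{"-",5},{"*",6},{"/",6},{"%",6},
};

static void skipsp(struct st *st) { while (isspace((unsigned char)*st->s)) st->s++; }

/* index of the operator at the current position, or -1 if there is none */
static int peekop(struct st *st)
{
    size_t i;
    skipsp(st);
    for (i = 0; i < sizeof ops / sizeof *ops; i++)
        if (!strncmp(st->s, ops[i].tok, strlen(ops[i].tok))) return (int)i;
    return -1;
}

static unsigned long expr(struct st *st, int minprec);

/* a primary is the variable n, a decimal constant, or a parenthesized expr */
static unsigned long primary(struct st *st)
{
    unsigned long v;
    char *end;
    skipsp(st);
    if (*st->s == 'n') { st->s++; return st->n; }
    if (*st->s == '(') {
        st->s++;
        v = expr(st, 1);
        skipsp(st);
        if (*st->s == ')') st->s++; else st->err = 1;
        return v;
    }
    if (!isdigit((unsigned char)*st->s)) { st->err = 1; return 0; }
    v = strtoul(st->s, &end, 10);
    st->s = end;
    return v;
}

static unsigned long apply(const char *t, unsigned long a, unsigned long b, int *err)
{
    switch (t[0]) {
    case '|': return a || b;
    case '&': return a && b;
    case '=': return a == b;
    case '!': return a != b;
    case '<': return t[1] ? a <= b : a < b;
    case '>': return t[1] ? a >= b : a > b;
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    }
    /* remaining operators are '/' and '%'; division by zero is an error */
    if (!b) { *err = 1; return 0; }
    return t[0] == '/' ? a / b : a % b;
}

/* parse operators of precedence >= minprec; the recursive call uses
   prec+1, which makes every operator left-associative */
static unsigned long expr(struct st *st, int minprec)
{
    unsigned long a = primary(st);
    int op;
    while (!st->err && (op = peekop(st)) >= 0 && ops[op].prec >= minprec) {
        st->s += strlen(ops[op].tok);
        unsigned long b = expr(st, ops[op].prec + 1);
        a = apply(ops[op].tok, a, b, &st->err);
    }
    return a;
}

/* returns the value of expression s for the given n, or -1 on error */
static long eval_plural(const char *s, unsigned long n)
{
    struct st st = { s, n, 0 };
    unsigned long v = expr(&st, 1);
    skipsp(&st);
    return (st.err || *st.s) ? -1 : (long)v;
}
```

A single table of tokens and precedence levels replaces one parsing function per grammar level, which is where the size reduction of this approach typically comes from.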
2014-07-29	tweaks to plural rules evaluator	Szabolcs Nagy	1	-54/+44
	const parsing, depth accounting and failure handling were changed a bit so the generated code is slightly smaller.
2014-07-29	harden dcngettext plural processing	Rich Felker	1	-2/+3
	while the __mo_lookup backend can verify that the translated message ends with a null terminator, it has no way to know nplurals and thus no way to verify that sufficiently many null terminators are present in the string to satisfy all plural forms. the code in dcngettext was already attempting to avoid reading past the end of the mo file mapping, but failed to do so because the strlen call itself could over-read. using strnlen instead allows us to avoid the problem.
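The difference between strlen and strnlen here can be sketched with a helper that walks a sequence of back-to-back null-terminated strings inside a bounded region. `nth_string` is a hypothetical helper written for illustration, not the code in dcngettext.

```c
#include <string.h>

/* s points at `avail` readable bytes holding several null-terminated
   strings back to back. returns string number idx (0-based), or NULL if
   the region ends before that string is fully terminated */
static const char *nth_string(const char *s, size_t avail, unsigned idx)
{
    for (;;) {
        size_t l = strnlen(s, avail);   /* never reads past s+avail */
        if (l == avail) return NULL;    /* no terminator within bounds */
        if (!idx--) return s;
        s += l + 1;                     /* step over string and terminator */
        avail -= l + 1;
    }
}
```

With strlen in place of strnlen, the length computation itself could run past the end of the region before any bounds check had a chance to fail.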
2014-07-29	harden mo file processing for locale/translations	Rich Felker	1	-2/+6
	rather than just checking that the start of the string lies within the mapping, also check that the nominal length remains within the mapping, and that the null terminator is present at the nominal length. this ensures that the caller, using the result as a C string, will not read past the end of the mapping. the nominal length is never exposed to the caller, but it's useful internally to find where the null terminator should be without having to resort to linear search via strnlen/memchr.
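The three conditions listed above (start inside the mapping, nominal length inside the mapping, terminator at the nominal length) might be checked as in this sketch. `checked_string` and its parameters are hypothetical and do not reproduce the actual __mo_lookup code.

```c
#include <stddef.h>
#include <stdint.h>

/* map is a buffer of `size` bytes; off and len are an untrusted offset and
   nominal length read from the file. returns the string only if it lies
   entirely inside the buffer and is terminated exactly at its nominal length */
static const char *checked_string(const char *map, size_t size,
                                  uint32_t off, uint32_t len)
{
    /* len >= size-off also rejects the case where only the terminator
       would fall outside the buffer */
    if (off >= size || len >= size - off) return NULL;
    if (map[(size_t)off + len] != 0) return NULL;
    return map + off;
}
```

Knowing the nominal length makes the terminator check a single byte comparison, with no scan over the string contents.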
2014-07-28	implement non-default plural rules for ngettext translations	Rich Felker	2	-8/+243
	the new code in dcngettext was written by me, and the expression evaluator by Szabolcs Nagy (nsz).
2014-07-27	implement gettext message translation functions	Rich Felker	4	-68/+271
	this commit replaces the stub implementations with working message translation functions. translation units are factored so as to prevent pulling in the legacy, non-library-safe functions which use a global textdomain in modern code which is using the versions with an explicit domain argument. bind_textdomain_codeset is also placed in its own file since it should not be needed by most programs. this implementation is still missing some features: the LANGUAGE environment variable (for multiple fallback languages) is not honored, and non-default plural-form rules are not supported. these issues will be addressed in a later commit. one notable difference from the GNU implementation is that there is no default path for loading translation files. in principle one could be added, but since the documented correct usage is to call the bindtextdomain function, a default path is probably unnecessary.
2014-07-26	add support for LC_TIME and LC_MESSAGES translations	Rich Felker	2	-7/+1
	for LC_MESSAGES, translation of strerror and similar literal message functions is supported. for messages in other places (particularly the dynamic linker) that use format strings, translation is not yet supported. in order to make it possible and safe, such messages will need to be refactored to separate the textual content from the format. for LC_TIME, the day and month names and strftime-style format strings provided by nl_langinfo are supported for translation. however there may be limitations, as some of the original C-locale nl_langinfo strings are non-unique and thus perhaps non-suitable as keys. overall, the locale support activated by this commit should not be seen as complete and polished but as a basis for beginning to test locale functionality and implement locales.
2014-07-26	add missing yes/no strings to nl_langinfo	Rich Felker	1	-2/+2
	these were removed from the standard but still offered as an extension in langinfo.h, so nl_langinfo should support them.
2014-07-26	fix nl_langinfo table for LC_TIME era-related items	Rich Felker	1	-1/+2
	due to a skipped slot and missing null terminator, the last few strings were off by one or two slots from their item codes.
2014-07-26	implement mo file string lookup for translations	Rich Felker	3	-0/+65
	the core is based on a binary search; hash table is not used. both native and reverse-endian mo files are supported. all offsets read from the mapped mo file are checked against the mapping size to prevent the possibility of reads outside the mapping. this commit has no observable effects since there are not yet any callers to the message translation code.
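Two of the points above, support for both byte orders and bounds-checked reads, can be sketched as follows. The magic number 0x950412de is the one defined by the GNU mo format; the function names and the exact shape of the helpers are assumptions made for illustration, and the binary search itself is not shown.

```c
#include <stdint.h>
#include <string.h>

#define MO_MAGIC 0x950412deu   /* magic number of the GNU mo format */

/* byte-swap x when swap is nonzero, otherwise return it unchanged */
static uint32_t swapc(uint32_t x, int swap)
{
    return swap ? x>>24 | (x>>8 & 0xff00) | (x<<8 & 0xff0000) | x<<24 : x;
}

/* read the 32-bit word at byte offset off; returns 0 and leaves *out
   untouched if the word would extend past the end of the buffer */
static int read_u32(const void *map, size_t size, size_t off, int swap,
                    uint32_t *out)
{
    uint32_t v;
    if (size < 4 || off > size - 4) return 0;
    memcpy(&v, (const char *)map + off, 4);
    *out = swapc(v, swap);
    return 1;
}

/* returns 0 for a native-endian file, 1 for reverse-endian, -1 otherwise */
static int mo_byte_order(const void *map, size_t size)
{
    uint32_t magic;
    if (!read_u32(map, size, 0, 0, &magic)) return -1;
    if (magic == MO_MAGIC) return 0;
    if (swapc(magic, 1) == MO_MAGIC) return 1;
    return -1;
}
```

Once the byte order is known, every later offset and length would be read through the same checked helper with the detected swap flag.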
2014-07-24	implement locale file loading and state for remaining locale categories	Rich Felker	2	-2/+70
	there is still no code which actually uses the loaded locale files, so the main observable effect of this commit is that calls to setlocale store and give back the names of the selected locales for the remaining categories (LC_TIME, LC_COLLATE, LC_MONETARY) if a locale file by the requested name could be loaded.
2014-07-24	fix locale environment variable logic for empty strings	Rich Felker	1	-3/+3
	per POSIX (XBD 8.2) LC_*/LANG environment variables set to the empty string are supposed to be treated as if they were not set at all.
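A lookup that follows this rule might look like the sketch below. The precedence LC_ALL, then the category variable, then LANG is the one given by POSIX; `locale_from_env` is a hypothetical helper, and the fallback to "C" is an assumption of this sketch, since the default is implementation-defined.

```c
#include <stdlib.h>

/* cat_var is the category's variable name, e.g. "LC_TIME". an empty value
   counts as unset, so the lookup falls through to the next variable */
static const char *locale_from_env(const char *cat_var)
{
    const char *val;
    if ((val = getenv("LC_ALL")) && *val) return val;
    if ((val = getenv(cat_var)) && *val) return val;
    if ((val = getenv("LANG")) && *val) return val;
    return "C";   /* assumed default for this sketch */
}
```

Testing only for a null pointer from getenv would let an empty LC_ALL mask a valid LANG setting.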
2014-07-02	properly pass current locale to *_l functions when used internally	Rich Felker	6	-6/+12
	this change is presently non-functional since the callees do not yet use their locale argument for anything.
2014-07-02	consolidate str[n]casecmp_l into str[n]casecmp source files	Rich Felker	2	-13/+0
	this is mainly done for consistency with the ctype functions and to declutter the src/locale directory.
2014-07-02	consolidate *_l ctype/wctype functions into their non-_l source files	Rich Felker	32	-204/+0
	the main practical purposes of this commit are to remove a huge amount of clutter from the src/locale directory, to cut down on the length of the $(AR) and $(LD) command lines, and to reduce the amount of space wasted by object file headers in the static libc.a. build time may also be reduced, though this has not been measured. as an additional justification, if there ever were a need for the behavior of these functions to vary by locale, it would be necessary for the non-_l versions to call the _l versions, so that linking the former without the latter would not be possible anyway.
2014-07-02	add locale framework	Rich Felker	5	-19/+155
	this commit adds non-stub implementations of setlocale, duplocale, newlocale, and uselocale, along with the data structures and minimal code needed for representing the active locale on a per-thread basis and optimizing the common case where thread-local locale settings are not in use. at this point, the data structures only contain what is necessary to represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in finding message translation files). representation for the other categories will be added later; the expectation is that a single pointer will suffice for each. for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any other string is accepted and treated as "C.UTF-8". for other categories, any string is accepted after being truncated to a maximum supported length (currently 15 bytes). for LC_MESSAGES, the name is kept regardless of whether libc itself can use such a message translation locale, since applications using catgets or gettext should be able to use message locales libc is not aware of. for other categories, names which are not successfully loaded as locales (which, at present, means all names) are treated as aliases for "C". setlocale never fails. locale settings are not yet used anywhere, so this commit should have no visible effects except for the contents of the string returned by setlocale.
2014-06-10	replace all remaining internal uses of pthread_self with __pthread_self	Rich Felker	1	-1/+1
	prior to version 1.1.0, the difference between pthread_self (the public function) and __pthread_self (the internal macro or inline function) was that the former would lazily initialize the thread pointer if it was not already initialized, whereas the latter would crash in this case. since lazy initialization is no longer supported, use of pthread_self no longer makes sense; it simply generates larger, slower code.
2014-05-13	add cp437 and cp850 to available iconv conversions	Rich Felker	2	-177/+206
	perhaps some additional legacy DOS-era codepages would also be useful to have, but these are the ones for which there has been demand. the size of the diff is due to the fact that legacychars.h is updated in such a way that new characters are inserted into the table in unicode codepoint order; thus other mappings in codepages.h have changed to reflect the new table indices of their characters.
2014-01-23	fix an overflow in wcsxfrm when n==0	Szabolcs Nagy	1	-2/+4
	posix allows zero length destination
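The zero-length case can be handled as in this sketch of a C-locale-only transform, where the transformed string is simply the input. `c_wcsxfrm` is a hypothetical name and the body is an illustration rather than the committed code.

```c
#include <wchar.h>

/* returns the length the full result needs, excluding the terminator.
   when n is 0, dest is never written and may even be a null pointer */
static size_t c_wcsxfrm(wchar_t *dest, const wchar_t *src, size_t n)
{
    size_t l = wcslen(src);
    if (l < n) {
        wmemcpy(dest, src, l + 1);     /* whole string plus terminator fits */
    } else if (n) {
        wmemcpy(dest, src, n - 1);     /* truncated copy */
        dest[n - 1] = 0;
    }
    return l;
}
```

Without the `else if (n)` guard, the truncation branch would compute `n - 1` for `n == 0`, which wraps to a huge size and overflows the destination.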
2013-12-12	include cleanups: remove unused headers and add feature test macros	Szabolcs Nagy	3	-3/+1

2013-11-25	remove duplicate includes from dynlink.c, strfmon.c and getaddrinfo.c	Szabolcs Nagy	1	-1/+0

2013-08-17	remove spurious tmp file present since initial git check-in	Rich Felker	1	-390/+0

2013-08-17	add hkscs/big5-2003/eten extensions to iconv big5	Rich Felker	3	-977/+1433
	with these changes, the character set implemented as "big5" in musl is a pure superset of cp950, the canonical "big5", and agrees with the normative parts of Unicode. this means it has minor differences from both hkscs and big5-2003:
	- the range A2CC-A2CE maps to CJK ideographs rather than numerals, contrary to changes made in big5-2003.
	- C6CD maps to a CJK ideograph rather than its corresponding Kangxi radical character, contrary to changes made in hkscs.
	- F9FE maps to U+2593 rather than U+FFED.
	of these differences, none but the last are visually distinct, and the last is a character used purely for text-based graphics, not to convey linguistic content. should there be future demand for strict conformance to big5-2003 or hkscs mappings, the present charset aliases can be replaced with distinct variants. reportedly there are other non-standard big5 extensions in common use in Taiwan and perhaps elsewhere, which could also be added as layers on top of the existing big5 support. there may be additional characters which should be added to the hkscs table: the whatwg standard for big5 defines what appears to be a superset of hkscs.
2013-08-07	add Big5 charset support to iconv	Rich Felker	2	-0/+1066
	at this point, it is just the common base charset equivalent to Windows CP 950, with no further extensions. HKSCS and possibly other supersets will be added later. other aliases may need to be added too.
2013-08-05	iconv support for legacy Korean encodings	Rich Felker	2	-0/+678
	like for other character sets, stateful iso-2022 form is not supported yet but everything else should work. all charset aliases are treated the same, as Windows codepage 949, because reportedly the EUC-KR charset name is in widespread (mis?)usage in email and on the web for data which actually uses the extended characters outside the standard 93x94 grid. this could easily be changed if desired. the principle of this converter for handling the giant bulk of rare Hangul syllables outside of the standard KS X 1001 93x94 grid is the same as the GB18030 converter's treatment of non-explicitly-coded Unicode codepoints: sequences in the extension range are mapped to an integer index N, and the converter explicitly computes the Nth Hangul syllable not explicitly encoded in the character map. empirically, this requires at most 7 passes over the grid. this approach reduces the table size required for Korean legacy encodings from roughly 44k to 17k and should have minimal performance impact on real-world text conversions since the "slow" characters are rare. where it does have impact, the cost is merely a large constant time factor.
2013-07-28	fix semantically incorrect use of LC_GLOBAL_LOCALE	Rich Felker	5	-5/+5
	LC_GLOBAL_LOCALE refers to the global locale, controlled by setlocale, not the thread-local locale in effect which these functions should be using. neither LC_GLOBAL_LOCALE nor 0 as an argument to the *_l functions has behavior defined by the standard, but 0 is a more logical choice for requesting the callee to look up the current locale. in the future I may move the current locale lookup to the caller (the non-_l-suffixed wrapper). at this point, all of the locale logic is dummied out, so no harm was done, but it should at least avoid misleading usage.
2013-07-24	rework langinfo code for ABI compat and for use by time code	Rich Felker	2	-9/+8

2013-07-24	update strxfrm/wcsxfrm for future LC_COLLATE support and ABI compat	Rich Felker	4	-14/+20

2013-07-24	add ABI compat aliases for a number of locale_t functions	Rich Felker	8	-0/+24

2013-07-24	prepare strcoll/wcscoll for LC_COLLATE support and add ABI symbols	Rich Felker	4	-15/+20

2013-07-24	move strftime_l into strftime.c and add __-prefixed version	Rich Felker	1	-7/+0
	the latter is both for ABI purposes, and to facilitate eventually adding LC_TIME support. it's also nice to eliminate an extra source file.
2013-06-26	fix iconv conversion to legacy 8bit codepages	Rich Felker	1	-2/+2
	this seems to have been a simple copy-and-paste error from the code for converting from legacy codepages.
2012-09-06	use restrict everywhere it's required by c99 and/or posix 2008	Rich Felker	7	-8/+8
	to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
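The approach can be sketched with a macro that expands to whatever spelling the compiler accepts. `MY_RESTRICT` is a hypothetical stand-in for the headers' `__restrict`, and the conditions below are a simplified assumption, not the exact logic used in the headers.

```c
#include <stddef.h>

/* pick a spelling of restrict that the current compiler understands */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT            /* unknown pre-c99 compiler: drop the qualifier */
#endif

/* the qualifier is written in pointer form "*MY_RESTRICT", not in the
   array-parameter form "[restrict]" */
static void copy_ints(int *MY_RESTRICT dst, const int *MY_RESTRICT src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) dst[i] = src[i];
}
```

Dropping the qualifier on an unknown compiler is safe because restrict only grants optimization freedom and never changes the meaning of a correct call.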
2012-06-20	duplocale: don't crash when called with LC_GLOBAL_LOCALE	Rich Felker	1	-1/+1
	posix has resolved to add this usage; for now, we just avoid writing anything to the new locale object since it's not used anyway.
2012-06-19	fix localeconv values and implementation	Rich Felker	1	-15/+28
	dynamic-allocation of the structure is not valid; it can crash an application if malloc fails. since localeconv is not specified to have failure conditions, the object needs to have static storage duration. need to review whether all the values are right or not still..
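A localeconv that cannot fail can return a pointer to an object with static storage duration, as in this sketch. `static_localeconv` and `c_lconv` are hypothetical names; the values shown are the ones ISO C specifies for the "C" locale, and the C99 `int_*` members are omitted for brevity (they would also need to be CHAR_MAX).

```c
#include <limits.h>
#include <locale.h>

/* static storage: no allocation, so there is no failure path */
static struct lconv c_lconv = {
    .decimal_point = ".",
    .thousands_sep = "",
    .grouping = "",
    .int_curr_symbol = "",
    .currency_symbol = "",
    .mon_decimal_point = "",
    .mon_thousands_sep = "",
    .mon_grouping = "",
    .positive_sign = "",
    .negative_sign = "",
    .int_frac_digits = CHAR_MAX,
    .frac_digits = CHAR_MAX,
    .p_cs_precedes = CHAR_MAX,
    .p_sep_by_space = CHAR_MAX,
    .n_cs_precedes = CHAR_MAX,
    .n_sep_by_space = CHAR_MAX,
    .p_sign_posn = CHAR_MAX,
    .n_sign_posn = CHAR_MAX,
};

static struct lconv *static_localeconv(void)
{
    return &c_lconv;
}
```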
2012-06-18	fix multiple iconv bugs reading utf-16/32 and wchar_t	Rich Felker	1	-8/+8

2012-06-18	fix iconv dest utf-16: unavailable chars must be replaced; EILSEQ is wrong	Rich Felker	1	-2/+2

2012-06-18	fix erroneous utf-16 encoding with surrogates in iconv	Rich Felker	1	-0/+1
	apparently this was never tested before.
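For reference, encoding a codepoint as UTF-16 with surrogate pairs works as in the sketch below. `utf16_encode` is a hypothetical helper illustrating the standard encoding rule; it does not show what the bug in iconv was.

```c
#include <stdint.h>

/* writes 1 or 2 UTF-16 code units for codepoint c into out and returns
   how many were written, or 0 if c cannot be encoded */
static int utf16_encode(uint32_t c, uint16_t out[2])
{
    if (c >= 0xd800 && c < 0xe000) return 0;    /* surrogate codepoints are invalid */
    if (c < 0x10000) { out[0] = (uint16_t)c; return 1; }
    if (c > 0x10ffff) return 0;                 /* beyond the Unicode range */
    c -= 0x10000;                               /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xd800 | (c >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xdc00 | (c & 0x3ff));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00.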
2012-04-21	fix major breakage in iconv, bogus rejecting of dest charsets	Rich Felker	1	-1/+1

2012-03-25	add strfmon_l variant (still mostly incomplete)	Rich Felker	1	-3/+27

2012-03-21	initial, very primitive strfmon	Rich Felker	1	-0/+77

2012-03-01	add all missing wchar functions except floating point parsers	Rich Felker	2	-0/+12
	these are mostly untested and adapted directly from corresponding byte string functions and similar.
2012-02-06	more locale_t interfaces (string stuff) and header updates	Rich Felker	7	-0/+48
	this should be everything except for some functions where the non-_l version isn't even implemented yet (mainly some non-ISO-C wcs* functions).
2012-02-06	fix some omissions and mistakes in locale_t interface definitions	Rich Felker	13	-13/+13

2012-02-06	add more of the locale_t interfaces, all dummied out to ignore the locale	Rich Felker	18	-0/+108

2011-07-12	gb18030 support in iconv (only from, not to)	Rich Felker	2	-2/+1887
	also support (and restrict to subsets) older chinese sets, and explicitly refuse to convert to cjk (since there's no code for it yet)