Some packages call gettext to format a message to be sent to perror.
If the currently set user locale points to a non-existent .mo file,
open via __map_file in dcngettext will set errno to ENOENT, clobbering
the error code that perror was meant to report.
Maintainer's notes: Non-modification of errno is a documented part of
the interface contract for the GNU version of this function and likely
other versions. The issue being fixed here seems to be a regression
from commit 1b52863e244ecee5b5935b6d36bb9e6efe84c035, which enabled
setting of errno from __map_file.
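The contract can be honored by saving errno on entry and restoring it on every return path, whatever internal operations fail in between. A minimal sketch, assuming a hypothetical `try_load_catalog` helper that stands in for the file-mapping step (it is not musl's `__map_file`) and a hypothetical `translate` wrapper:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the catalog-mapping step: it may fail and,
 * as a side effect, set errno (e.g. to ENOENT for a missing .mo file). */
static int try_load_catalog(const char *path)
{
	FILE *f = fopen(path, "rb");
	if (!f) return -1;
	fclose(f);
	return 0;
}

/* Hypothetical lookup that must leave errno untouched: errno is saved on
 * entry and restored before returning, on both the success and failure
 * paths. */
static const char *translate(const char *msgid, const char *catalog_path)
{
	int saved_errno = errno;
	const char *result = msgid;   /* untranslated fallback */
	if (try_load_catalog(catalog_path) == 0) {
		/* a real implementation would look msgid up in the catalog here */
	}
	errno = saved_errno;
	return result;
}
```

With this shape, a caller can pass the result straight to perror after a failed system call and still see that call's error, even when no translation file exists.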
|
|
there is no good reason to wait to find and process the plural rules
for a translated message file until a gettext form requesting plural
rule processing is used. it just imposes additional synchronization,
here in the form of clunky use of atomics.
it looks like there may also have been a race condition where nplurals
could be seen without plural_rule being seen, possibly leading to null
pointer dereference. if so, this commit fixes it.
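The race goes away once both plural fields are filled in before the catalog becomes visible to other threads. A sketch of that publish-after-initialize pattern, assuming C11 `<stdatomic.h>` (musl uses its own internal atomics) and a hypothetical `struct catalog`:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical catalog structure: both plural fields are set at load
 * time, before the structure is published. */
struct catalog {
	const char *plural_rule;   /* e.g. "n != 1" */
	unsigned long nplurals;
};

static _Atomic(struct catalog *) published;

/* Loader: fully initialize, then publish with a release store. */
static struct catalog *load_catalog(const char *rule, unsigned long n)
{
	struct catalog *c = calloc(1, sizeof *c);
	if (!c) return NULL;
	c->plural_rule = rule;
	c->nplurals = n;
	atomic_store_explicit(&published, c, memory_order_release);
	return c;
}

/* Reader: the acquire load pairs with the release store, so a reader that
 * sees the pointer also sees both fields; there is no window in which
 * nplurals is set but plural_rule is still null. */
static unsigned long reader_nplurals(const char **rule)
{
	struct catalog *c = atomic_load_explicit(&published, memory_order_acquire);
	if (!c) return 0;
	*rule = c->plural_rule;
	return c->nplurals;
}
```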
|
|
this further reduces the number of source files which need to include
libc.h and thereby be potentially exposed to libc global state and
internals.
this will also facilitate further improvements like adding an inline
fast-path, if we want to do so later.
|
|
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up being included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
|
|
commits leading up to this one have moved the vast majority of
libc-internal interface declarations to appropriate internal headers,
allowing them to be type-checked and setting the stage to limit their
visibility. the ones that have not yet been moved are mostly
namespace-protected aliases for standard/public interfaces, which
exist to facilitate implementing plain C functions in terms of POSIX
functionality, or C or POSIX functionality in terms of extensions that
are not standardized. some don't quite fit this description, but are
"internally public" interfaces between subsystems of libc.
rather than creating a number of newly-named headers to declare these
functions and having to add explicit include directives for them to
every source file where they're needed, I have introduced a method of
wrapping the corresponding public headers.
parallel to the public headers in $(srcdir)/include, we now have
wrappers in $(srcdir)/src/include that come earlier in the include
path order. they include the public header they're wrapping, then add
declarations for namespace-protected versions of the same interfaces
and any "internally public" interfaces for the subsystem they
correspond to.
along these lines, the wrapper for features.h is now responsible for
the definition of the hidden, weak, and weak_alias macros. this means
source files will no longer need to include any special headers to
access these features.
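To illustrate what such macros do, here is a sketch of plausible definitions using GCC/Clang attributes, together with a namespace-protected function exported under its public name as a weak alias. The exact definitions in the wrapper header may differ; the function names are hypothetical, and alias attributes assume an ELF target such as Linux.

```c
/* Assumed shapes of the macros (GCC/Clang attribute syntax). */
#define hidden __attribute__((__visibility__("hidden")))
#define weak __attribute__((__weak__))
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Hypothetical namespace-protected implementation, the name an internal
 * caller would use without touching the application's namespace. */
int __example_add(int a, int b)
{
	return a + b;
}

/* Hypothetical public name, bound as a weak alias so that an application
 * defining its own example_add would take precedence at link time. */
weak_alias(__example_add, example_add);
```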
over time, it is my expectation that the scope of what is "internally
public" will expand, reducing the number of source files which need to
include *_impl.h and related headers down to those which are actually
implementing the corresponding subsystems, not just using them.
|
|
locale_impl.h could have been used, but this function is completely
independent of anything else, and preserving that property seems nice.
|
|
In all cases this is just a change from two volatile int to one.
|
|
often translations will be named only by language, whereas locale
names may also include a territory code, modifier, and codeset
portion. previously, only translations exactly matching the locale
name were loaded. this was a major usability issue, requiring
workarounds like symlinks or tweaking of the locale name.
with these changes, gettext now searches for translations by first
removing the codeset portion of the locale name, then trying the
remainder in full, with modifier (@mod) removed, with territory code
(_XX) removed, and with both removed.
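The search order above can be sketched as a hypothetical helper (not musl's actual code), assuming locale names of the form lang[_TERRITORY][.codeset][@modifier] and a 64-byte buffer per candidate:

```c
#include <stdio.h>
#include <string.h>

#define MAX_CANDIDATES 4
#define CANDIDATE_LEN 64

/* Hypothetical helper: list the names to try for a locale name. The
 * codeset is always dropped; the remainder is tried in full, without the
 * modifier, without the territory, and without both. Returns the count. */
static int locale_candidates(const char *name,
                             char out[MAX_CANDIDATES][CANDIDATE_LEN])
{
	size_t lang_len = strcspn(name, "_.@");
	const char *terr = "", *mod = "";
	int terr_len = 0, mod_len = 0;

	if (name[lang_len] == '_') {
		terr = name + lang_len;               /* "_TERRITORY" */
		terr_len = (int)strcspn(terr, ".@");
	}
	const char *at = strchr(name, '@');
	if (at) {
		mod = at;                             /* "@modifier" */
		mod_len = (int)strlen(at);
	}

	/* Order: full, no modifier, no territory, neither. A combination that
	 * needs an absent component would only repeat another entry. */
	static const int use[MAX_CANDIDATES][2] = { {1,1}, {1,0}, {0,1}, {0,0} };
	int n = 0;
	for (int i = 0; i < MAX_CANDIDATES; i++) {
		int t = use[i][0], m = use[i][1];
		if ((t && !terr_len) || (m && !mod_len)) continue;
		snprintf(out[n++], CANDIDATE_LEN, "%.*s%.*s%.*s",
		         (int)lang_len, name,
		         t ? terr_len : 0, terr,
		         m ? mod_len : 0, mod);
	}
	return n;
}
```

For "de_DE.UTF-8@euro" this yields de_DE@euro, de_DE, de@euro and de; for "pt_BR.UTF-8" it yields only pt_BR and pt.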
part of the reason gettext lacked support for searching fallbacks
before is that the candidate pathname for a translation file was
constructed on each call and used as the key to look up an
already-mapped translation file. this was very costly/inefficient. we
now use the tuple of textdomain binding pointer, locale map pointer,
and integer category id as the key for looking up a translation file
mapping.
based on patch by He X.
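Keying the cache by identity rather than by pathname reduces each lookup to a few scalar comparisons. A sketch of that idea, assuming a simple linked-list cache; the structure and function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical cache key: the tuple described above. */
struct catalog_key {
	const void *binding;     /* textdomain binding pointer */
	const void *locale_map;  /* locale map pointer for the category */
	int category;            /* integer category id, e.g. LC_MESSAGES */
};

struct catalog_entry {
	struct catalog_key key;
	const void *map;         /* mapped translation file, or NULL */
	struct catalog_entry *next;
};

static int key_equal(const struct catalog_key *a, const struct catalog_key *b)
{
	return a->binding == b->binding && a->locale_map == b->locale_map
	    && a->category == b->category;
}

/* Linear search: no pathname is built or compared on this path. */
static const struct catalog_entry *cache_find(const struct catalog_entry *head,
                                              const struct catalog_key *k)
{
	for (; head; head = head->next)
		if (key_equal(&head->key, k)) return head;
	return NULL;
}
```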
|
|
use the standard strnlen idiom for cases where lengths greater than an
imposed limit are going to be rejected immediately anyway.
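The idiom bounds the scan at one byte past the limit, so an over-long string is rejected without reading its full length. A sketch, assuming a hypothetical limit `NAME_LIMIT`:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

#define NAME_LIMIT 255   /* hypothetical limit, e.g. NAME_MAX */

/* strnlen stops after NAME_LIMIT+1 bytes, which is enough to tell an
 * acceptable name from an over-long one. */
static int name_ok(const char *name)
{
	return strnlen(name, NAME_LIMIT + 1) <= NAME_LIMIT;
}
```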
|
|
the plural_rule field of allocated msgcat structures was assumed to be
initially-null but was never initialized. for future-proofing, the
nplurals field which was left uninitialized should also be cleared.
likewise, in the binding structure, the active field could be used
uninitialized by a technicality: the a_store which stores the initial
value of 0 may be implemented as a cas operation, which reads the old
value.
rather than fixing these issues individually, just use calloc for both
allocations. this does result in wasteful clearing of name buffers (up
to NAME_MAX+PATH_MAX) before filling them, but since the size is
bounded and the time is dominated by filesystem operations, it really
doesn't matter; simplicity and future-proofing have more value here.
modified from patch submitted by He X.
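A sketch of the calloc approach, assuming a hypothetical record with a trailing name buffer. It relies, as is universal on mainstream targets, on all-bits-zero representing a null pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical catalog record; with malloc, plural_rule and nplurals
 * would be indeterminate until explicitly assigned. */
struct msgcat_sketch {
	struct msgcat_sketch *next;
	const void *map;
	const char *plural_rule;   /* must start out null */
	int nplurals;              /* cleared for future-proofing */
	char name[];               /* flexible array for the file name */
};

/* calloc zero-fills the whole allocation, name buffer included, so every
 * field starts out null or zero without per-field initialization. */
static struct msgcat_sketch *msgcat_new(const char *name)
{
	size_t len = strlen(name);
	struct msgcat_sketch *p = calloc(1, sizeof *p + len + 1);
	if (!p) return NULL;
	memcpy(p->name, name, len + 1);
	return p;
}
```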
|
|
this loop was only supposed to deactivate other bindings for the same
text domain name, but due to a copy-and-paste error, deactivated all
other bindings.
patch by He X.
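The intended behavior can be sketched as follows, assuming a hypothetical binding list node with only the fields needed here:

```c
#include <string.h>

/* Hypothetical binding list node. */
struct binding_sketch {
	struct binding_sketch *next;
	const char *domainname;
	int active;
};

/* Activate nb and deactivate only the other bindings for the same text
 * domain name. Dropping the strcmp test reproduces the bug: every other
 * binding would be deactivated, whatever its domain. */
static void activate_binding(struct binding_sketch *head,
                             struct binding_sketch *nb)
{
	for (struct binding_sketch *q = head; q; q = q->next) {
		if (q == nb) continue;
		if (!strcmp(q->domainname, nb->domainname))
			q->active = 0;
	}
	nb->active = 1;
}
```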
|
|
previously, LC_MESSAGES was treated specially as the only category
which could be set to a locale name without a definition file, in
order to facilitate gettext message translations when no libc locale
was available. LC_NUMERIC was completely un-settable, and LC_CTYPE
stored a flag intended to be used for a possible future byte-based C
locale, instead of storing a __locale_map pointer like the other
categories use.
this patch changes all categories to be represented by pointers to
__locale_map structures, and allows locale names without definition
files to be treated as valid locales with trivial definition when used
in any category. outwardly visible functional changes should be minor,
limited mainly to the strings read back from setlocale and the way
gettext handles translations in categories other than LC_MESSAGES.
various internal refactoring has also been performed, and improvements
in const correctness have been made.
|
|
if setlocale has not been called, the current locale's messages_name
may be a null pointer. the code path where it's assumed to be non-null
was only reachable if bindtextdomain had already been called, which is
normally not done in programs which do not call setlocale, so the
omitted check went unnoticed.
patch from Void Linux, with description rewritten.
|
|
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
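The safe form of the cas construct reads *p once into tmp and uses that same value as both the expected value and the input to f. Because *p is volatile-qualified, the compiler must perform exactly the one load written, so it cannot introduce the multiple loads described above. A sketch, assuming a hypothetical `a_cas` stand-in built on the GCC/Clang `__sync_val_compare_and_swap` builtin and a hypothetical update function:

```c
/* Hypothetical a_cas: if *p equals t, atomically store s; return the old
 * value of *p either way. */
static int a_cas(volatile int *p, int t, int s)
{
	return __sync_val_compare_and_swap(p, t, s);
}

static int update_fn(int x) { return x * 2 + 1; }  /* hypothetical f */

/* Apply update_fn atomically. If another thread changed *p between the
 * load and the cas, the cas returns a value other than tmp and the loop
 * retries with a fresh load. Returns the value that was stored. */
static int atomic_update(volatile int *p)
{
	int tmp;
	do tmp = *p;
	while (a_cas(p, tmp, update_fn(tmp)) != tmp);
	return update_fn(tmp);
}
```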
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
|
|
while the __mo_lookup backend can verify that the translated message
ends with a null terminator, it has no way to know nplurals and thus
no way to verify that sufficiently many null terminators are present
in the string to satisfy all plural forms. the code in dcngettext was
already attempting to avoid reading past the end of the mo file
mapping, but failed to do so because the strlen call itself could
over-read. using strnlen instead allows us to avoid the problem.
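A sketch of walking to the k-th plural form without leaving the mapping, assuming a hypothetical helper that receives the number of bytes remaining before the end of the mapped file:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: forms are stored back to back as
 * "form0\0form1\0...", and len bytes are available from s. strnlen never
 * reads past s+len, unlike strlen. Returns NULL if form k is missing or
 * unterminated within bounds. */
static const char *plural_form(const char *s, size_t len, unsigned long k)
{
	while (k--) {
		size_t l = strnlen(s, len);
		if (l == len) return NULL;   /* no terminator within bounds */
		s += l + 1;
		len -= l + 1;
	}
	if (strnlen(s, len) == len) return NULL;
	return s;
}
```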
|
|
the new code in dcngettext was written by me, and the expression
evaluator by Szabolcs Nagy (nsz).
|
|
this commit replaces the stub implementations with working message
translation functions. translation units are factored so that modern
code using the versions with an explicit domain argument does not pull
in the legacy, non-library-safe functions which use a global
textdomain. bind_textdomain_codeset is also placed in its own
file since it should not be needed by most programs.
this implementation is still missing some features: the LANGUAGE
environment variable (for multiple fallback languages) is not honored,
and non-default plural-form rules are not supported. these issues will
be addressed in a later commit.
one notable difference from the GNU implementation is that there is no
default path for loading translation files. in principle one could be
added, but since the documented correct usage is to call the
bindtextdomain function, a default path is probably unnecessary.
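Since no default path is assumed, an application names its catalog directory explicitly. A sketch of typical setup using the standard `<libintl.h>` API; the domain name and directory are placeholders:

```c
#include <libintl.h>
#include <locale.h>

/* Placeholders for illustration only. */
#define APP_DOMAIN "myapp"
#define APP_LOCALEDIR "/usr/share/locale"

static void init_i18n(void)
{
	setlocale(LC_ALL, "");                     /* honor the user's locale */
	bindtextdomain(APP_DOMAIN, APP_LOCALEDIR); /* explicit catalog path */
	textdomain(APP_DOMAIN);                    /* default domain for gettext */
}
```

After init_i18n(), gettext("Hello") returns the translation from APP_LOCALEDIR/<locale>/LC_MESSAGES/myapp.mo if one is found, and otherwise returns its argument unchanged.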