author | Rich Felker <dalias@aerifal.cx> | 2015-10-01 22:59:56 +0000 |
---|---|---|
committer | Rich Felker <dalias@aerifal.cx> | 2015-10-01 22:59:56 +0000 |
commit | 2d51c4ad57d5cbb083b5bce94ff692490c10ee2d (patch) | |
tree | 3186f504065dd07ae2f78dd8942f758f74530af2 /crt | |
parent | fd2add5ba01145593d71362c40a0a39b0b3dd7d2 (diff) | |
make nl_langinfo(CODESET) always return "ASCII" in byte-based C locale

commit 844212d94f582c4e3c5055e0a1524931e89ebe76, which did not make it
into any releases, changed nl_langinfo(CODESET) to always return
"UTF-8", even in the byte-based C locale. this was problematic because
application software was found to use the string match for "UTF-8" to
activate its own UTF-8 processing. this both undermines the byte-based
functionality of the C locale, and if mixed with calls to the
standard multibyte functions, which happened in practice, could result
in severe mis-handling of input.

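As an illustration of the application-side pattern described above, here is a minimal sketch. The helper names codeset_is_utf8 and app_wants_utf8 are hypothetical and do not come from musl or from any particular application; only nl_langinfo(CODESET) and strcmp are standard calls.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical application helper: treat an exact "UTF-8" codeset
 * name as the signal to enable the application's own UTF-8 code. */
static int codeset_is_utf8(const char *codeset)
{
    return strcmp(codeset, "UTF-8") == 0;
}

/* Hypothetical: apply the check to the current locale's codeset. */
static int app_wants_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

With the earlier, unreleased behavior, a check of this kind would have succeeded even in the byte-based C locale, so the application would decode input as UTF-8 while the standard multibyte functions treated the same bytes individually.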
the motive for the previous change was that, to avoid widespread
compatibility problems, the string returned by nl_langinfo(CODESET)
needs to be accepted by iconv and by third-party character conversion
code. thus, the only remaining choice is "ASCII". this choice
accurately represents the intent that high bytes do not have
individual meaning in the C locale, but it does mean that iconv, when
passed nl_langinfo(CODESET) in the C locale, will produce errors in
cases where mbrtowc would have succeeded. for reference, glibc behaves
similarly in this regard, so I don't think it will be a problem.
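
The iconv side of this trade-off can be sketched as follows. The helper convert_to_utf8 is hypothetical, and the choice of "UTF-8" as the target encoding and the 64-byte output buffer are assumptions made only for the example.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert n bytes from `fromcode` to UTF-8.
 * Returns 0 on success, or the errno value of the failing call.
 * Assumes the converted output fits in 64 bytes. */
static int convert_to_utf8(const char *fromcode, const char *in, size_t n)
{
    iconv_t cd = iconv_open("UTF-8", fromcode);
    if (cd == (iconv_t)-1)
        return errno;

    char out[64];
    char *inp = (char *)in;   /* iconv takes char **; the input bytes are not modified */
    char *outp = out;
    size_t inleft = n, outleft = sizeof out;

    int err = 0;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        err = errno;          /* EILSEQ for bytes invalid in `fromcode` */

    iconv_close(cd);
    return err;
}
```

A caller in the C locale would pass nl_langinfo(CODESET) as fromcode. With "ASCII", input containing only 7-bit bytes converts cleanly, while a high byte such as 0xC3 is rejected with EILSEQ, even where mbrtowc on the same byte would have succeeded.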