author | Rich Felker <dalias@aerifal.cx> | 2015-06-16 04:44:17 +0000 |
---|---|---|
committer | Rich Felker <dalias@aerifal.cx> | 2015-06-16 05:28:48 +0000 |
commit | 1507ebf837334e9e07cfab1ca1c2e88449069a80 (patch) | |
tree | 92bad1f861e442f7e2d2fa4e178f471f4371509a /src/locale | |
parent | 38e2f727237230300fea6aff68802db04625fd23 (diff) | |
byte-based C locale, phase 1: multibyte character handling functions

this patch makes the functions which work directly on multibyte
characters treat the high bytes as individual abstract code units
rather than as multibyte sequences when MB_CUR_MAX is 1. since
MB_CUR_MAX is presently defined as a constant 4, all of the new code
added is dead code, and optimizing compilers' code generation should
not be affected at all. a future commit will activate the new code.

as abstract code units, bytes 0x80 to 0xff are represented by wchar_t
values 0xdf80 to 0xdfff, at the end of the surrogates range. this
ensures that they will never be misinterpreted as Unicode characters,
and that all wctype functions return false for these "characters"
without needing locale-specific logic. a high range outside of Unicode
such as 0x7fffff80 to 0x7fffffff was also considered, but since C11's
char16_t also needs to be able to represent conversions of these
bytes, the surrogate range was the natural choice.
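
The mapping described in the commit message can be sketched as follows. This is a minimal illustration rather than the code added by the patch: the helper names `is_code_unit`, `byte_to_wchar` and `code_unit_to_byte` are hypothetical, and only the numeric ranges (0x80 to 0xff, 0xdf80 to 0xdfff) are taken from the message.

```c
#include <assert.h>
#include <wchar.h>

/* Range of wchar_t values used for abstract code units, as stated in
 * the commit message. The enum and all function names below are
 * hypothetical and chosen only for this sketch. */
enum { CODE_UNIT_FIRST = 0xdf80, CODE_UNIT_LAST = 0xdfff };

/* Nonzero if wc is one of the 128 abstract code units. */
static int is_code_unit(wchar_t wc)
{
	return wc >= CODE_UNIT_FIRST && wc <= CODE_UNIT_LAST;
}

/* Byte-based conversion: ASCII bytes map to themselves, and bytes
 * 0x80..0xff map to 0xdf80..0xdfff. */
static wchar_t byte_to_wchar(unsigned char b)
{
	return b < 0x80 ? (wchar_t)b : (wchar_t)(0xdf00 + b);
}

/* Inverse mapping for the high range: recover the original byte. */
static unsigned char code_unit_to_byte(wchar_t wc)
{
	assert(is_code_unit(wc));
	return (unsigned char)(wc & 0xff);
}
```

Because every value in 0xdf80 to 0xdfff fits in 16 bits, the same values would also be representable as char16_t, which is the reason the message gives for preferring this range over one near 0x7fffffff.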
Diffstat (limited to 'src/locale')
-rw-r--r-- | src/locale/langinfo.c | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/src/locale/langinfo.c b/src/locale/langinfo.c
index a1ada246..776b4478 100644
--- a/src/locale/langinfo.c
+++ b/src/locale/langinfo.c
@@ -33,7 +33,8 @@ char *__nl_langinfo_l(nl_item item, locale_t loc)
 	int idx = item & 65535;
 	const char *str;
 
-	if (item == CODESET) return "UTF-8";
+	if (item == CODESET)
+		return MB_CUR_MAX==1 ? "UTF-8-CODE-UNITS" : "UTF-8";
 
 	switch (cat) {
 	case LC_NUMERIC:
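
From the caller's side, the CODESET change in this diff could be observed roughly as below. This is a sketch under assumptions: the helper names `is_code_unit_codeset` and `locale_is_byte_based` are hypothetical, and the string "UTF-8-CODE-UNITS" is copied from this diff only, so other libc versions may report a different name.

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: nonzero if name is the codeset string this
 * diff returns when MB_CUR_MAX is 1. */
static int is_code_unit_codeset(const char *name)
{
	return strcmp(name, "UTF-8-CODE-UNITS") == 0;
}

/* Hypothetical helper: ask the current locale for its codeset via the
 * POSIX nl_langinfo() interface and check it against that string. */
static int locale_is_byte_based(void)
{
	return is_code_unit_codeset(nl_langinfo(CODESET));
}
```

Since the commit message says MB_CUR_MAX is still a constant 4 at this point, `locale_is_byte_based()` would be expected to return 0 until the later commit activates the new code.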