Age | Commit message (Collapse) | Author | Files | Lines |
|
previously, fgetwc left all but the first byte of an illegal sequence
unread (available for subsequent calls) when reading out of the FILE
buffer, but dropped all bytes contibuting to the error when falling
back to reading a byte at a time. neither behavior was ideal. in the
buffered case, each malformed character produced one error per byte,
rather than one per character. in the unbuffered case, consuming the
last byte that caused the transition from "incomplete" to "invalid"
state potentially dropped (and produced additional spurious encoding
errors for) the next valid character.
to handle both cases uniformly without duplicate code, revise the
buffered case to only cover situations where a complete and valid
character is present in the buffer, and fall back to byte-at-a-time
for all other cases. this allows using mbtowc (stateless) instead of
mbrtowc, which may slightly improve performance too.
when an encoding error has been hit in the byte-at-a-time case, leave
the final byte that produced the error unread (via ungetc) except in
the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and
continuation bytes with no lead byte). single-byte errors are fully
consumed so as not to leave the caller in an infinite loop repeating
the same error.
none of these changes are distinguished from a conformance standpoint,
since the file position is unspecified after encoding errors. they are
intended merely as QoI/consistency improvements.
|
|
Update the buffer position according to the bytes consumed into st when
decoding an incomplete character at the end of the buffer.
|
|
this patch adjusts libc components which use the multibyte functions
internally, and which depend on them operating in a particular
encoding, to make the appropriate locale changes before calling them
and restore the calling thread's locale afterwards. activating the
byte-based C locale without these changes would cause regressions in
stdio and iconv.
in the case of iconv, the current implementation was simply using the
multibyte functions as UTF-8 conversions. setting a multibyte UTF-8
locale for the duration of the iconv operation allows the code to
continue working.
in the case of stdio, POSIX requires that FILE streams have an
encoding rule bound at the time of setting wide orientation. as long
as all locales, including the C locale, used the same encoding,
treating high bytes as UTF-8, there was no need to store an encoding
rule as part of the stream's state.
a new locale field in the FILE structure points to the locale that
should be made active during fgetwc/fputwc/ungetwc on the stream. it
cannot point to the locale active at the time the stream becomes
oriented, because this locale could be mutable (the global locale) or
could be destroyed (locale_t objects produced by newlocale) before the
stream is closed. instead, a pointer to the static C or C.UTF-8 locale
object added in commit commit aeeac9ca5490d7d90fe061ab72da446c01ddf746
is used. this is valid since categories other than LC_CTYPE will not
affect these functions.
|
|
the old idiom, f->mode |= f->mode+1, was adapted from the idiom for
setting byte orientation, f->mode |= f->mode-1, but the adaptation was
incorrect. unless the stream was alreasdy set byte-oriented, this code
incremented f->mode each time it was executed, which would eventually
lead to overflow. it could be fixed by changing it to f->mode |= 1,
but upcoming changes will require slightly more work at the time of
wide orientation, so it makes sense to just call fwide. as an
optimization in the single-character functions, fwide is only called
if the stream is not already wide-oriented.
|
|
this header evolved to facilitate the extremely lazy practice of
omitting explicit includes of the necessary headers in individual
stdio source files; not only was this sloppy, but it also increased
build time.
now, stdio_impl.h is only including the headers it needs for its own
use; any further headers needed by source files are included directly
where needed.
|
|
previously, stdio used spinlocks, which would be unacceptable if we
ever add support for thread priorities, and which yielded
pathologically bad performance if an application attempted to use
flockfile on a key file as a major/primary locking mechanism.
i had held off on making this change for fear that it would hurt
performance in the non-threaded case, but actually support for
recursive locking had already inflicted that cost. by having the
internal locking functions store a flag indicating whether they need
to perform unlocking, rather than using the actual recursive lock
counter, i was able to combine the conditionals at unlock time,
eliminating any additional cost, and also avoid a nasty corner case
where a huge number of calls to ftrylockfile could cause deadlock
later at the point of internal locking.
this commit also fixes some issues with usage of pthread_self
conflicting with __attribute__((const)) which resulted in crashes with
some compiler versions/optimizations, mainly in flockfile prior to
pthread_create.
|
|
the biggest change in this commit is that stdio now uses readv to fill
the caller's buffer and the FILE buffer with a single syscall, and
likewise writev to flush the FILE buffer and write out the caller's
buffer in a single syscall.
making this change required fundamental architectural changes to
stdio, so i also made a number of other improvements in the process:
- the implementation no longer assumes that further io will fail
following errors, and no longer blocks io when the error flag is set
(though the latter could easily be changed back if desired)
- unbuffered mode is no longer implemented as a one-byte buffer. as a
consequence, scanf unreading has to use ungetc, to the unget buffer
has been enlarged to hold at least 2 wide characters.
- the FILE structure has been rearranged to maintain the locations of
the fields that might be used in glibc getc/putc type macros, while
shrinking the structure to save some space.
- error cases for fflush, fseek, etc. should be more correct.
- library-internal macros are used for getc_unlocked and putc_unlocked
now, eliminating some ugly code duplication. __uflow and __overflow
are no longer used anywhere but these macros. switch to read or
write mode is also separated so the code can be better shared, e.g.
with ungetc.
- lots of other small things.
|
|
|
|
|