Age | Commit message (Collapse) | Author | Files | Lines |
|
these functions need to be fast when the init routine has already run,
since they may be called very often from code which depends on global
initialization having taken place. as such, a fast path bypassing
atomic cas on the once control object was used to avoid heavy memory
contention. however, on archs with weakly ordered memory, the fast
path failed to ensure that the caller actually observes the side
effects of the init routine.
preliminary performance testing showed that simply removing the fast
path was not practical; a performance drop of roughly 85x was observed
with 20 threads hammering the same once control on a 24-core machine.
so the new explicit barrier operation from atomic.h is used to retain
the fast path while ensuring memory visibility.
performance may be reduced on some archs where the barrier actually
makes a difference, but the previous behavior was unsafe and incorrect
on these archs. future improvements to the implementation of a_barrier
should reduce the impact.
(cherry picked from commit df37d3960abec482e17fad2274a99b790f6cc08b)
(edited not to depend on a_barrier, which is not available in 1.0.x)
|
|
at the end of successful pthread_once, there was a race window during
which another thread calling pthread_once would momentarily change the
state back from 2 (finished) to 1 (in-progress). in this case, the
status was immediately changed back, but with no wake call, meaning
that waiters which arrived during this short window could block
forever. there are two possible fixes. one would be adding the wake to
the code path where it was missing. but it's better just to avoid
reverting the status at all, by using compare-and-swap instead of
swap.
(cherry picked from commit 0d0c2f40344640a2a6942dda156509593f51db5d)
|
|
the issue was a break statement that was breaking only from the
switch, not the enclosing for loop, and a failure to set the final
success state.
|
|
|