From b713b8b2e4b9595eec72ec3c4fe7714076d60478 Mon Sep 17 00:00:00 2001
From: Rich Felker
Date: Thu, 12 Aug 2021 18:07:44 -0400
Subject: fix excessively slow TLS performance on some mips models

commit 6d99ad91e869aab35a4d76d34c3c9eaf29482bad introduced this
regression as part of a larger change, based on an incorrect
assumption that rdhwr being part of the mips r2 ISA level meant that
the TLS register, known in the mips documentation as UserLocal, was
unconditionally present on chips providing this ISA level and would
not need trap-and-emulate. this turns out to be false.

based on research by Stanislav Kljuhhin and Abilio Marques, who
reported the problem as a performance regression on certain routers
using OpenWRT vs older uclibc-based versions, it turns out the mips
manuals document the UserLocal register as a feature that might or
might not be implemented or enabled, reflected by a cpu capability bit
in the CONFIG3 register, and that Linux checks for this and has to
explicitly enable it on models that have it.

thus, it's indeed possible that r2+ chips can lack the feature,
bringing us back to the situation where Linux only has a fast
trap-and-emulate path for the case where the destination register is
$3.

so, always read the thread pointer through $3. this may incur a
gratuitous move to the desired final register on chips where it's not
needed, but it really doesn't matter.
---
 arch/mips/pthread_arch.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

(limited to 'arch/mips')

diff --git a/arch/mips/pthread_arch.h b/arch/mips/pthread_arch.h
index c45347ab..376b7741 100644
--- a/arch/mips/pthread_arch.h
+++ b/arch/mips/pthread_arch.h
@@ -1,10 +1,9 @@
 static inline uintptr_t __get_tp()
 {
-#if __mips_isa_rev < 2
 	register uintptr_t tp __asm__("$3");
+#if __mips_isa_rev < 2
 	__asm__ (".word 0x7c03e83b" : "=r" (tp) );
 #else
-	uintptr_t tp;
 	__asm__ ("rdhwr %0, $29" : "=r" (tp) );
 #endif
 	return tp;
-- 
cgit v1.2.3-70-g09d2