diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-04-24 10:39:27 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-04-24 10:39:27 -0700 |
commit | a5624566431de76b17862383d9ae254d9606cba9 (patch) | |
tree | 3801fe2a8a90f972691af3c5e305bd7e3db615f6 /arch/x86/lib/memset_64.S | |
parent | 487c20b016dc48230367a7be017f40313e53e3bd (diff) | |
parent | 034ff37d34071ff3f48755f728cd229e42a4f15d (diff) | |
download | linux-stable-a5624566431de76b17862383d9ae254d9606cba9.tar.gz linux-stable-a5624566431de76b17862383d9ae254d9606cba9.tar.bz2 linux-stable-a5624566431de76b17862383d9ae254d9606cba9.zip |
Merge branch 'x86-rep-insns': x86 user copy clarifications
Merge my x86 user copy updates branch.
This cleans up a lot of our x86 memory copy code, particularly for user
accesses. I've been pushing for microarchitectural support for good
memory copying and clearing for a long while, and it's been visible in
how the kernel has aggressively used 'rep movs' and 'rep stos' whenever
possible.
And that micro-architectural support has been improving over the years,
to the point where on modern CPU's the best option for a memory copy
that would become a function call (as opposed to being something that
can just be turned into individual 'mov' instructions) is now to inline
the string instruction sequence instead.
However, that only makes sense when we have the modern markers for this:
the x86 FSRM and FSRS capabilities ("Fast Short REP MOVS/STOS").
So this cleans up a lot of our historical code, gets rid of the legacy
marker use ("REP_GOOD" and "ERMS") from the memcpy/memset cases, and
replaces it with that modern reality. Note that REP_GOOD and ERMS end
up still being used by the known large cases (ie page copyin gand
clearing).
The reason much of this ends up being about user memory accesses is that
the normal in-kernel cases are done by the compiler (__builtin_memcpy()
and __builtin_memset()) and getting to the point where we can use our
instruction rewriting to inline those to be string instructions will
need some compiler support.
In contrast, the user accessor functions are all entirely controlled by
the kernel code, so we can change those arbitrarily.
Thanks to Borislav Petkov for feedback on the series, and Jens testing
some of this on micro-architectures I didn't personally have access to.
* x86-rep-insns:
x86: rewrite '__copy_user_nocache' function
x86: remove 'zerorest' argument from __copy_user_nocache()
x86: set FSRS automatically on AMD CPUs that have FSRM
x86: improve on the non-rep 'copy_user' function
x86: improve on the non-rep 'clear_user' function
x86: inline the 'rep movs' in user copies for the FSRM case
x86: move stac/clac from user copy routines into callers
x86: don't use REP_GOOD or ERMS for user memory clearing
x86: don't use REP_GOOD or ERMS for user memory copies
x86: don't use REP_GOOD or ERMS for small memory clearing
x86: don't use REP_GOOD or ERMS for small memory copies
Diffstat (limited to 'arch/x86/lib/memset_64.S')
-rw-r--r-- | arch/x86/lib/memset_64.S | 47 |
1 files changed, 11 insertions, 36 deletions
diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S index 6143b1a6fa2c..7c59a704c458 100644 --- a/arch/x86/lib/memset_64.S +++ b/arch/x86/lib/memset_64.S @@ -18,27 +18,22 @@ * rdx count (bytes) * * rax original destination + * + * The FSRS alternative should be done inline (avoiding the call and + * the disgusting return handling), but that would require some help + * from the compiler for better calling conventions. + * + * The 'rep stosb' itself is small enough to replace the call, but all + * the register moves blow up the code. And two of them are "needed" + * only for the return value that is the same as the source input, + * which the compiler could/should do much better anyway. */ SYM_FUNC_START(__memset) - /* - * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended - * to use it when possible. If not available, use fast string instructions. - * - * Otherwise, use original memset function. - */ - ALTERNATIVE_2 "jmp memset_orig", "", X86_FEATURE_REP_GOOD, \ - "jmp memset_erms", X86_FEATURE_ERMS + ALTERNATIVE "jmp memset_orig", "", X86_FEATURE_FSRS movq %rdi,%r9 + movb %sil,%al movq %rdx,%rcx - andl $7,%edx - shrq $3,%rcx - /* expand byte value */ - movzbl %sil,%esi - movabs $0x0101010101010101,%rax - imulq %rsi,%rax - rep stosq - movl %edx,%ecx rep stosb movq %r9,%rax RET @@ -48,26 +43,6 @@ EXPORT_SYMBOL(__memset) SYM_FUNC_ALIAS(memset, __memset) EXPORT_SYMBOL(memset) -/* - * ISO C memset - set a memory block to a byte value. This function uses - * enhanced rep stosb to override the fast string function. - * The code is simpler and shorter than the fast string function as well. - * - * rdi destination - * rsi value (char) - * rdx count (bytes) - * - * rax original destination - */ -SYM_FUNC_START_LOCAL(memset_erms) - movq %rdi,%r9 - movb %sil,%al - movq %rdx,%rcx - rep stosb - movq %r9,%rax - RET -SYM_FUNC_END(memset_erms) - SYM_FUNC_START_LOCAL(memset_orig) movq %rdi,%r10 |