summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* cpumask: re-introduce constant-sized cpumask optimizationsLinus Torvalds2023-03-051-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'x86-urgent-2023-03-05' of ↵Linus Torvalds2023-03-051-7/+18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Thomas Gleixner: "A small set of updates for x86: - Return -EIO instead of success when the certificate buffer for SEV guests is not large enough - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared on return to userspace for performance reasons, but the leaves user space vulnerable to cross-thread attacks which STIBP prevents. Update the documentation accordingly" * tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt/sev-guest: Return -EIO if certificate buffer is not large enough Documentation/hw-vuln: Document the interaction between IBRS and STIBP x86/speculation: Allow enabling STIBP with legacy IBRS
| * x86/speculation: Allow enabling STIBP with legacy IBRSKP Singh2023-02-271-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When plain IBRS is enabled (not enhanced IBRS), the logic in spectre_v2_user_select_mitigation() determines that STIBP is not needed. The IBRS bit implicitly protects against cross-thread branch target injection. However, with legacy IBRS, the IBRS bit is cleared on returning to userspace for performance reasons which leaves userspace threads vulnerable to cross-thread branch target injection against which STIBP protects. Exclude IBRS from the spectre_v2_in_ibrs_mode() check to allow for enabling STIBP (through seccomp/prctl() by default or always-on, if selected by spectre_v2_user kernel cmdline parameter). [ bp: Massage. ] Fixes: 7c693f54c873 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS") Reported-by: José Oliveira <joseloliveira11@gmail.com> Reported-by: Rodrigo Branco <rodrigo@kernelhacking.com> Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230220120127.1975241-1-kpsingh@kernel.org Link: https://lore.kernel.org/r/20230221184908.2349578-1-kpsingh@kernel.org
* | Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2023-03-0511-11/+48
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VM_FAULT_RETRY fixes from Al Viro: "Some of the page fault handlers do not deal with the following case correctly: - handle_mm_fault() has returned VM_FAULT_RETRY - there is a pending fatal signal - fault had happened in kernel mode Correct action in such case is not "return unconditionally" - fatal signals are handled only upon return to userland and something like copy_to_user() would end up retrying the faulting instruction and triggering the same fault again and again. What we need to do in such case is to make the caller to treat that as failed uaccess attempt - handle exception if there is an exception handler for faulting instruction or oops if there isn't one. Over the years some architectures had been fixed and now are handling that case properly; some still do not. This series should fix the remaining ones. Status: - m68k, riscv, hexagon, parisc: tested/acked by maintainers. - alpha, sparc32, sparc64: tested locally - bug has been reproduced on the unpatched kernel and verified to be fixed by this series. - ia64, microblaze, nios2, openrisc: build, but otherwise completely untested" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: openrisc: fix livelock in uaccess nios2: fix livelock in uaccess microblaze: fix livelock in uaccess ia64: fix livelock in uaccess sparc: fix livelock in uaccess alpha: fix livelock in uaccess parisc: fix livelock in uaccess hexagon: fix livelock in uaccess riscv: fix livelock in uaccess m68k: fix livelock in uaccess
| * | openrisc: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | openrisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | nios2: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nios2 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | microblaze: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | microblaze equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | ia64: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ia64 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | sparc: fix livelock in uaccessAl Viro2023-03-022-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sparc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | alpha: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | parisc: fix livelock in uaccessAl Viro2023-03-021-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | hexagon: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Acked-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | riscv: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | m68k: fix livelock in uaccessAl Viro2023-03-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Tested-by: Finn Thain <fthain@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Remove Intel compiler supportMasahiro Yamada2023-03-053-172/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | include/linux/compiler-intel.h had no update in the past 3 years. We often forget about the third C compiler to build the kernel. For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") only mentioned GCC and Clang. init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC, and nobody has reported any issue. I guess the Intel Compiler support is broken, and nobody is caring about it. Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is deprecated: $ icc -v icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. icc version 2021.7.0 (gcc version 12.1.0 compatibility) Arnd Bergmann provided a link to the article, "Intel C/C++ compilers complete adoption of LLVM". lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept untouched for better sync with https://github.com/facebook/zstd Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of ↵Linus Torvalds2023-03-041-19/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. Eight are for MM and seven are for other parts of the kernel. Seven are cc:stable and eight address post-6.3 issues or were judged unsuitable for -stable backporting" * tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: map Dikshita Agarwal's old address to his current one mailmap: map Vikash Garodia's old address to his current one fs/cramfs/inode.c: initialize file_ra_state fs: hfsplus: fix UAF issue in hfsplus_put_super panic: fix the panic_print NMI backtrace setting lib: parser: update documentation for match_NUMBER functions kasan, x86: don't rename memintrinsics in uninstrumented files kasan: test: fix test for new meminstrinsic instrumentation kasan: treat meminstrinsic as builtins in uninstrumented files kasan: emit different calls for instrumentable memintrinsics ocfs2: fix non-auto defrag path not working issue ocfs2: fix defrag path triggering jbd2 ASSERT mailmap: map Georgi Djakov's old Linaro address to his current one mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH mm/damon/paddr: fix missing folio_put() mm/mremap: fix dup_anon_vma() in vma_merge() case 4
| * | | kasan, x86: don't rename memintrinsics in uninstrumented filesMarco Elver2023-03-021-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that memcpy/memset/memmove are no longer overridden by KASAN, we can just use the normal symbol names in uninstrumented files. Drop the preprocessor redefinitions. Link: https://lkml.kernel.org/r/20230224085942.1791837-4-elver@google.com Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linux Kernel Functional Testing <lkft@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | Merge tag 'powerpc-6.3-2' of ↵Linus Torvalds2023-03-042-1/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Drop orphaned VAS MAINTAINERS entry - Fix build errors with clang and KCSAN - Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together with recordmcount Thanks to Nathan Chancellor. * tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Avoid dead code/data elimination when using recordmcount powerpc/vmlinux.lds: Add .text.asan/tsan sections powerpc: Drop orphaned VAS MAINTAINERS entry
| * | | | powerpc: Avoid dead code/data elimination when using recordmcountMichael Ellerman2023-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although powerpc now has objtool mcount support, it's not enabled in all configurations due to dependencies. On those configurations, with some linkers (binutils 2.37 at least), it's still possible to hit the dreaded "recordmcount bug", eg. errors such as: CC kernel/kexec_file.o Cannot find symbol for section 10: .text.unlikely. kernel/kexec_file.o: failed make[1]: *** [scripts/Makefile.build:287 : kernel/kexec_file.o] Error 1 Those errors are much more prevalent when building with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, because it places every function in a separate section. CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is marked experimental and is not enabled in any powerpc defconfigs or by major distros. Although it does have at least some users on 32-bit where kernel size tends to be more important. Avoid the build errors by blocking CONFIG_LD_DEAD_CODE_DATA_ELIMINATION when the build is using recordmcount, rather than objtool. In practice that means for 64-bit big endian builds, or 64-bit clang builds - both because they lack CONFIG_MPROFILE_KERNEL. On 32-bit objtool is always used, so CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is still available there. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230221130331.2714199-1-mpe@ellerman.id.au
| * | | | powerpc/vmlinux.lds: Add .text.asan/tsan sectionsMichael Ellerman2023-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When KASAN/KCSAN are enabled clang generates .text.asan/tsan sections. Because they are not mentioned in the linker script warnings are generated, and when orphan handling is set to error that becomes a build error, eg: ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' ld.lld: error: vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fix it by adding the sections to our linker script, similar to the generic change made in 848378812e40 ("vmlinux.lds.h: Handle clang's module.{c,d}tor sections"). Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
* | | | | Merge tag 's390-6.3-2' of ↵Linus Torvalds2023-03-0312-99/+128
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Add empty command line parameter handling stubs to kernel for all command line parameters which are handled in the decompressor. This avoids invalid "Unknown kernel command line parameters" messages from the kernel, and also avoids that these will be incorrectly passed to user space. This caused already confusion, therefore add the empty stubs - Add missing phys_to_virt() handling to machine check handler - Introduce and use a union to be used for zcrypt inline assemblies. This makes sure that only a register wide member of the union is passed as input and output parameter to inline assemblies, while usual C code uses other members of the union to access bit fields of it - Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to atomically read a 128-bit value from memory. This replaces the (mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf code. Currently gcc does not generate the used lpq instruction if __READ_ONCE() is used for aligned 128-bit accesses, therefore use this s390 specific helper - Simplify machine check handler code if a task needs to be killed because of e.g. register corruption due to a machine malfunction - Perform CPU reset to clear pending interrupts and TLB entries on an already stopped target CPU before delegating work to it - Generate arch/s390/boot/vmlinux.map link map for the decompressor, when CONFIG_VMLINUX_MAP is enabled for debugging purposes - Fix segment type handling for dcssblk devices. It incorrectly always returned type "READ/WRITE" even for read-only segements, which can result in a kernel panic if somebody tries to write to a read-only device - Sort config S390 select list again - Fix two kprobe reenter bugs revealed by a recently added kprobe kunit test * tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/kprobes: fix current_kprobe never cleared after kprobes reenter s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler s390/Kconfig: sort config S390 select list again s390/extmem: return correct segment type in __segment_load() s390/decompressor: add link map saving s390/smp: perform cpu reset before delegating work to target cpu s390/mcck: cleanup user process termination path s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg s390/rwonce: add READ_ONCE_ALIGNED_128() macro s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union s390/nmi: fix virtual-physical address confusion s390/setup: do not complain about parameters handled in decompressor
| * | | | | s390/kprobes: fix current_kprobe never cleared after kprobes reenterVasily Gorbik2023-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent test_kprobe_missed kprobes kunit test uncovers the following problem. Once kprobe is triggered from another kprobe (kprobe reenter), all future kprobes on this cpu are considered as kprobe reenter, thus pre_handler and post_handler are not being called and kprobes are counted as "missed". Commit b9599798f953 ("[S390] kprobes: activation and deactivation") introduced a simpler scheme for kprobes (de)activation and status tracking by using push_kprobe/pop_kprobe, which supposed to work for both initial kprobe entry as well as kprobe reentry and helps to avoid handling those two cases differently. The problem is that a sequence of calls in case of kprobes reenter: push_kprobe() <- NULL (current_kprobe) push_kprobe() <- kprobe1 (current_kprobe) pop_kprobe() -> kprobe1 (current_kprobe) pop_kprobe() -> kprobe1 (current_kprobe) leaves "kprobe1" as "current_kprobe" on this cpu, instead of setting it to NULL. In fact push_kprobe/pop_kprobe can only store a single state (there is just one prev_kprobe in kprobe_ctlblk). Which is a hack but sufficient, there is no need to have another prev_kprobe just to store NULL. To make a simple and backportable fix simply reset "prev_kprobe" when kprobe is poped from this "stack". No need to worry about "kprobe_status" in this case, because its value is only checked when current_kprobe != NULL. Cc: stable@vger.kernel.org Fixes: b9599798f953 ("[S390] kprobes: activation and deactivation") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handlerVasily Gorbik2023-03-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent test_kprobe_missed kprobes kunit test uncovers the following error (reported when CONFIG_DEBUG_ATOMIC_SLEEP is enabled): BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 662, name: kunit_try_catch preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 no locks held by kunit_try_catch/662. irq event stamp: 280 hardirqs last enabled at (279): [<00000003e60a3d42>] __do_pgm_check+0x17a/0x1c0 hardirqs last disabled at (280): [<00000003e3bd774a>] kprobe_exceptions_notify+0x27a/0x318 softirqs last enabled at (0): [<00000003e3c5c890>] copy_process+0x14a8/0x4c80 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 46 PID: 662 Comm: kunit_try_catch Tainted: G N 6.2.0-173644-g44c18d77f0c0 #2 Hardware name: IBM 3931 A01 704 (LPAR) Call Trace: [<00000003e60a3a00>] dump_stack_lvl+0x120/0x198 [<00000003e3d02e82>] __might_resched+0x60a/0x668 [<00000003e60b9908>] __mutex_lock+0xc0/0x14e0 [<00000003e60bad5a>] mutex_lock_nested+0x32/0x40 [<00000003e3f7b460>] unregister_kprobe+0x30/0xd8 [<00000003e51b2602>] test_kprobe_missed+0xf2/0x268 [<00000003e51b5406>] kunit_try_run_case+0x10e/0x290 [<00000003e51b7dfa>] kunit_generic_run_threadfn_adapter+0x62/0xb8 [<00000003e3ce30f8>] kthread+0x2d0/0x398 [<00000003e3b96afa>] __ret_from_fork+0x8a/0xe8 [<00000003e60ccada>] ret_from_fork+0xa/0x40 The reason for this error report is that kprobes handling code failed to restore irqs. The problem is that when kprobe is triggered from another kprobe post_handler current sequence of enable_singlestep / disable_singlestep is the following: enable_singlestep <- original kprobe (saves kprobe_saved_imask) enable_singlestep <- kprobe triggered from post_handler (clobbers kprobe_saved_imask) disable_singlestep <- kprobe triggered from post_handler (restores kprobe_saved_imask) disable_singlestep <- original kprobe (restores wrong clobbered kprobe_saved_imask) There is just one kprobe_ctlblk per cpu and both calls saves and loads irq mask to kprobe_saved_imask. To fix the problem simply move resume_execution (which calls disable_singlestep) before calling post_handler. This also fixes the problem that post_handler is called with pt_regs which were not yet adjusted after single-stepping. Cc: stable@vger.kernel.org Fixes: 4ba069b802c2 ("[S390] add kprobes support.") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/Kconfig: sort config S390 select list againHeiko Carstens2023-03-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep the config S390 select list sorted. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/extmem: return correct segment type in __segment_load()Gerald Schaefer2023-03-011-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f05f62d04271f ("s390/vmem: get rid of memory segment list") reshuffled the call to vmem_add_mapping() in __segment_load(), which now overwrites rc after it was set to contain the segment type code. As result, __segment_load() will now always return 0 on success, which corresponds to the segment type code SEG_TYPE_SW, i.e. a writeable segment. This results in a kernel crash when loading a read-only segment as dcssblk block device, and trying to write to it. Instead of reshuffling code again, make sure to return the segment type on success, and also describe this rather delicate and unexpected logic in the function comment. Also initialize new segtype variable with invalid value, to prevent possible future confusion. Fixes: f05f62d04271 ("s390/vmem: get rid of memory segment list") Cc: <stable@vger.kernel.org> # 5.9+ Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/decompressor: add link map savingVasily Gorbik2023-02-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Produce arch/s390/boot/vmlinux.map link map for the decompressor, when CONFIG_VMLINUX_MAP option is enabled. Link map is quite useful during making kernel changes related to how the decompressor is composed and debugging linker scripts. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/smp: perform cpu reset before delegating work to target cpuHeiko Carstens2023-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear CPU state (e.g. all TLB entries, prefetched instructions, etc.) of the target CPU, however without clearing register contents before starting any work on it. This puts the target CPU in a more defined state compared to the current Stop + Restart sigp orders. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/mcck: cleanup user process termination pathAlexander Gordeev2023-02-284-32/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a machine check interrupt hits while user process is running __s390_handle_mcck() helper function is called directly from the interrupt handler and terminates the current process by calling make_task_dead() routine. The make_task_dead() is not allowed to be called from interrupt context which forces the machine check handler switch to the kernel stack and enable local interrupts first. The __s390_handle_mcck() could also be called to service pending work, but this time from the external interrupts handler. It is the machine check handler that establishes the work and schedules the external interrupt, therefore the machine check interrupt itself should be disabled while reading out the corresponding variable: local_mcck_disable(); mcck = *this_cpu_ptr(&cpu_mcck); memset(this_cpu_ptr(&cpu_mcck), 0, sizeof(mcck)); local_mcck_enable(); However, local_mcck_disable() does not have effect when __s390_handle_mcck() is called directly form the machine check handler, since the machine check interrupt is still disabled. Therefore, it is not the opening bracket to the following local_mcck_enable() function. Simplify the user process termination flow by scheduling the external interrupt and killing the affected process from the interrupt context. Assume a kernel-generated signal is always delivered and ignore a value returned by do_send_sig_info() funciton. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchgHeiko Carstens2023-02-281-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use READ_ONCE_ALIGNED_128() to read the previous value in front of a 128-bit cmpxchg loop, instead of (mis-)using a 128-bit cmpxchg operation to do the same. This makes the code more readable and is faster. Link: https://lore.kernel.org/r/20230224100237.3247871-3-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/rwonce: add READ_ONCE_ALIGNED_128() macroHeiko Carstens2023-02-281-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for fast block concurrent (atomic) 128-bit accesses. The used lpq instruction requires 128-bit alignment. This is also the reason why the compiler doesn't emit this instruction if __READ_ONCE() is used for 128-bit accesses. Link: https://lore.kernel.org/r/20230224100237.3247871-2-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg unionHarald Freudenberger2023-02-271-48/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new ap queue status register wrapper union to access register wide values. So the inline assembler only sees register wide values but the surrounding code may use a more structured view of the same value and a reader of the code (and the compiler) gets a clear understanding about the mapping between fields and register values. All the changes to access the ap queue status are local to the inline functions within ap.h. However, the struct ap_qirq_ctrl has been replaces by a union for same reason and this needed slight adaptions in the calling code. Suggested-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Andreas Arnez <arnez@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/nmi: fix virtual-physical address confusionNico Boehr2023-02-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a machine check is received while in SIE, it is reinjected into the guest in some cases. The respective code needs to access the sie_block, which is taken from the backed up R14. Since reinjection only occurs while we are in SIE (i.e. between the labels sie_entry and sie_leave in entry.S and thus if CIF_MCCK_GUEST is set), the backed up R14 will always contain a physical address in s390_backup_mcck_info. This currently works, because virtual and physical addresses are the same. Add phys_to_virt() to resolve the virtual-physical confusion. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20230216121208.4390-2-nrb@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | s390/setup: do not complain about parameters handled in decompressorVasily Gorbik2023-02-271-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there are several kernel command line parameters which are only parsed and handled in decompressor and not known to the kernel. This leads to the following error message during kernel boot: Unknown kernel command line parameters "mem=3G nokaslr", will be passed to user space. To avoid confusion, register those parameters with an empty stub so that kernel does not complain about them. Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* | | | | | Merge tag 'riscv-for-linus-6.3-mw2' of ↵Linus Torvalds2023-03-034-17/+27
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Some cleanups and fixes for the Zbb-optimized string routines - Support for custom (vendor or implementation defined) perf events - COMMAND_LINE_SIZE has been increased to 1024 * tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Bump COMMAND_LINE_SIZE value to 1024 drivers/perf: RISC-V: Allow programming custom firmware events riscv, lib: Fix Zbb strncmp RISC-V: improve string-function assembly
| * | | | | | riscv: Bump COMMAND_LINE_SIZE value to 1024Alexandre Ghiti2023-03-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase COMMAND_LINE_SIZE as the current default value is too low for syzbot kernel command line. There has been considerable discussion on this patch that has led to a larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all ports. That's not quite done yet, but it's gotten far enough we're confident this is not a uABI change so this is safe. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr [Palmer: it's not uabi] Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | riscv, lib: Fix Zbb strncmpBjörn Töpel2023-02-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Zbb optimized strncmp has two parts; a fast path that does XLEN/8B per iteration, and a slow that does one byte per iteration. The idea is to compare aligned XLEN chunks for most of strings, and do the remainder tail in the slow path. The Zbb strncmp has two issues in the fast path: Incorrect remainder handling (wrong compare): Assume that the string length is 9. On 64b systems, the fast path should do one iteration, and one iteration in the slow path. Instead, both were done in the fast path, which lead to incorrect results. An example: strncmp("/dev/vda", "/dev/", 5); Correct by changing "bgt" to "bge". Missing NULL checks in the second string: This could lead to incorrect results for: strncmp("/dev/vda", "/dev/vda\0", 8); Correct by adding an additional check. Fixes: b6fcdb191e36 ("RISC-V: add zbb support to string functions") Suggested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230228184211.1585641-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | RISC-V: improve string-function assemblyHeiko Stuebner2023-02-283-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adapt the suggestions for the assembly string functions that Andrew suggested but that I didn't manage to include into the series that got applied. This includes improvements to two comments, removal of unneeded labels and moving one instruction slightly higher to contradict an explanatory comment. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230208225328.1636017-3-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* | | | | | | Merge tag 'arm64-fixes' of ↵Linus Torvalds2023-03-028-13/+22
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - In copy_highpage(), only reset the tag of the destination pointer if KASAN_HW_TAGS is enabled so that user-space MTE does not interfere with KASAN_SW_TAGS (which relies on top-byte-ignore). - Remove warning if SME is detected without SVE, the kernel can cope with such configuration (though none in the field currently). - In cfi_handler(), pass the ESR_EL1 value to die() for consistency with other die() callers. - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte manipulation from the generic vmemmap_remap_pte() does not follow the required ARM break-before-make sequence (clear the pte, flush the TLBs, set the new pte). It may be re-enabled once this sequence is sorted. - Fix possible memory leak in the arm64 ACPI code if the SMCCC version and conduit checks fail. - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores -falign-functions=N with -Os. - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no randomisation would actually take place. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE arm64: acpi: Fix possible memory leak of ffh_ctxt arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP arm64: pass ESR_ELx to die() of cfi_handler arm64/fpsimd: Remove warning for SME without SVE arm64: Reset KASAN tag in copy_highpage with HW tags only
| * | | | | | | arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGNArd Biesheuvel2023-02-283-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our virtual KASLR displacement is a randomly chosen multiple of 2 MiB plus an offset that is equal to the physical placement modulo 2 MiB. This arrangement ensures that we can always use 2 MiB block mappings (or contiguous PTE mappings for 16k or 64k pages) to map the kernel. This means that a KASLR offset of less than 2 MiB is simply the product of this physical displacement, and no randomization has actually taken place. Currently, we use 'kaslr_offset() > 0' to decide whether or not randomization has occurred, and so we misidentify this case. If the kernel image placement is not randomized, modules are allocated from a dedicated region below the kernel mapping, which is only used for modules and not for other vmalloc() or vmap() calls. When randomization is enabled, the kernel image is vmap()'ed randomly inside the vmalloc region, and modules are allocated in the vicinity of this mapping to ensure that relative references are always in range. However, unlike the dedicated module region below the vmalloc region, this region is not reserved exclusively for modules, and so ordinary vmalloc() calls may end up overlapping with it. This should rarely happen, given that vmalloc allocates bottom up, although it cannot be ruled out entirely. The misidentified case results in a placement of the kernel image within 2 MiB of its default address. However, the logic that randomizes the module region is still invoked, and this could result in the module region overlapping with the start of the vmalloc region, instead of using the dedicated region below it. If this happens, a single large vmalloc() or vmap() call will use up the entire region, and leave no space for loading modules after that. Since commit 82046702e288 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check"), this is much more likely to occur on systems that boot via EFI but lack an implementation of the EFI RNG protocol, as in that case, the EFI stub will decide to leave the image where it found it, and the EFI firmware uses 64k alignment only. Fix this, by correctly identifying the case where the virtual displacement is a result of the physical displacement only. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230223204101.1500373-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZEMark Rutland2023-02-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Florian reports that when building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, he sees "Misaligned patch-site" warnings at boot, e.g. | Misaligned patch-site bcm2836_arm_irqchip_handle_irq+0x0/0x88 | WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/ftrace.c:120 ftrace_call_adjust+0x4c/0x70 This is because GCC will silently ignore `-falign-functions=N` when passed `-Os`, resulting in functions not being aligned as we expect. This is a known issue, and to account for this we modified the kernel to avoid `-Os` generally. Unfortunately we forgot to account for CONFIG_CC_OPTIMIZE_FOR_SIZE. Forbid the use of CALL_OPS with CONFIG_CC_OPTIMIZE_FOR_SIZE=y to prevent this issue. All exising ftrace features will work as before, though without the performance benefit of CALL_OPS. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Link: http://lore.kernel.org/linux-arm-kernel/2d9284c3-3805-402b-5423-520ced56d047@gmail.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Stefan Wahren <stefan.wahren@i2se.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230227115819.365630-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: acpi: Fix possible memory leak of ffh_ctxtSudeep Holla2023-02-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocated 'ffh_ctxt' memory leak is possible if the SMCCC version and conduit checks fail and -EOPNOTSUPP is returned without freeing the allocated memory. Fix the same by moving the allocation after the SMCCC version and conduit checks. Fixes: 1d280ce099db ("arm64: Add architecture specific ACPI FFH Opregion callbacks") Cc: <stable@vger.kernel.org> # 6.2.x Cc: Will Deacon <will@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Suggested-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302191417.dAl9NuE8-lkp@intel.com/ Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20230223135742.2952091-1-sudeep.holla@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAPCatalin Marinas2023-02-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert the HUGETLB_PAGE_FREE_VMEMMAP selection from commit 1e63ac088f20 ("arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64") but keep the flush_dcache_page() compound_head() change as it aligns with the corresponding check in the __sync_icache_dcache() function. The original config option was renamed in commit 47010c040dec ("mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*") to HUGETLB_PAGE_OPTIMIZE_VMEMMAP and the flush_dcache_page() check was further simplified by commit 2da1c30929a2 ("mm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled()"). The reason for the revert is that the generic vmemmap_remap_pte() function changes both the permissions (writeable to read-only) and the output address (pfn) of the vmemmap ptes. This is deemed UNPREDICTABLE by the Arm architecture without a break-before-make sequence (make the PTE invalid, TLBI, write the new valid PTE). However, such sequence is not possible since the vmemmap may be concurrently accessed by the kernel. Disable the optimisation until a better solution is found. Fixes: 1e63ac088f20 ("arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64") Cc: <stable@vger.kernel.org> # 5.19.x Cc: Muchun Song <muchun.song@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/Y9pZALdn3pKiJUeQ@arm.com Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20230222175232.540851-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: pass ESR_ELx to die() of cfi_handlerSangmoon Kim2023-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0f2cb928a154 ("arm64: consistently pass ESR_ELx to die()") caused all callers to pass the ESR_ELx value to die(). For consistency, this patch also adds esr to die() call of cfi_handler. Also, when CFI error occurs, die handlers can use ESR_ELx value. Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230220073441.2753-1-sangmoon.kim@samsung.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64/fpsimd: Remove warning for SME without SVEMark Brown2023-02-221-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support for SME without SVE is architecturally valid and has now been tested well enough so let's remove the warning message that is displayed at boot. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230209-arm64-sme-no-sve-v1-1-74eb3df2f878@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | | | | arm64: Reset KASAN tag in copy_highpage with HW tags onlyPeter Collingbourne2023-02-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During page migration, the copy_highpage function is used to copy the page data to the target page. If the source page is a userspace page with MTE tags, the KASAN tag of the target page must have the match-all tag in order to avoid tag check faults during subsequent accesses to the page by the kernel. However, the target page may have been allocated in a number of ways, some of which will use the KASAN allocator and will therefore end up setting the KASAN tag to a non-match-all tag. Therefore, update the target page's KASAN tag to match the source page. We ended up unintentionally fixing this issue as a result of a bad merge conflict resolution between commit e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics") and commit 20794545c146 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags""), which preserved a tag reset for PG_mte_tagged pages which was considered to be unnecessary at the time. Because SW tags KASAN uses separate tag storage, update the code to only reset the tags when HW tags KASAN is enabled. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/If303d8a709438d3ff5af5fd85706505830f52e0c Reported-by: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com> Cc: <stable@vger.kernel.org> # 6.1 Fixes: 20794545c146 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"") Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20230215050911.1433132-1-pcc@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | | | | | | | Merge tag 'mips_6.3_1' of ↵Linus Torvalds2023-03-028-31/+31
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull more MIPS updates from Thomas Bogendoerfer: "A few more cleanups and fixes" * tag 'mips_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Workaround clang inline compat branch issue mips: dts: ralink: mt7621: add phandle to system controller node for watchdog mips: dts: ralink: mt7621: rename watchdog node from 'wdt' into 'watchdog' mips: ralink: make SOC_MT7621 select PINCTRL mips: remove SYS_HAS_CPU_MIPS32_R1 from RALINK MIPS: cevt-r4k: Offset the value used to clear compare interrupt MIPS: smp-cps: Don't rely on CP0_CMGCRBASE MIPS: Remove DMA_PERDEV_COHERENT
| * | | | | | | | MIPS: Workaround clang inline compat branch issueJiaxun Yang2023-02-282-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang is unable to handle the situation that a chunk of inline assembly ends with a compat branch instruction and then compiler generates another control transfer instruction immediately after this compat branch. The later instruction will end up in forbidden slot and cause exception. Workaround by add a option to control the use of compact branch. Currently it's selected by CC_IS_CLANG and hopefully we can change it to a version check in future if clang manages to fix it. Fix boot on boston board. Link: https://github.com/llvm/llvm-project/issues/61045 Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Acked-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | | | | | mips: dts: ralink: mt7621: add phandle to system controller node for watchdogSergio Paracuellos2023-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To allow to access system controller registers from watchdog driver code add a phandle in the watchdog 'wdt' node. This avoid using arch dependent operations in driver code. Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | | | | | mips: dts: ralink: mt7621: rename watchdog node from 'wdt' into 'watchdog'Sergio Paracuellos2023-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Watchdog nodes must use 'watchdog' for node name. When a 'make dtbs_check' is performed the following warning appears: wdt@100: $nodename:0: 'wdt@100' does not match '^watchdog(@.*|-[0-9a-f])?$' Fix this warning up properly renaming the node into 'watchdog'. Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | | | | | mips: ralink: make SOC_MT7621 select PINCTRLArınç ÜNAL2023-02-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, out of every Ralink SoC, only the dt-binding of the MT7621 SoC uses pinctrl. Because of this, PINCTRL is not selected at all. Make SOC_MT7621 select PINCTRL. Remove PINCTRL_MT7621, enabling it for the MT7621 SoC will be handled under the PINCTRL_MT7621 option. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>