summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include/asm
Commit message (Collapse)AuthorAgeFilesLines
* KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs ↵Nicholas Piggin2021-09-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | are live [ Upstream commit 1782663897945a5cf28e564ba5eed730098e9aa4 ] After the L1 saves its PMU SPRs but before loading the L2's PMU SPRs, switch the pmcregs_in_use field in the L1 lppaca to the value advertised by the L2 in its VPA. On the way out of the L2, set it back after saving the L2 PMU registers (if they were in-use). This transfers the PMU liveness indication between the L1 and L2 at the points where the registers are not live. This fixes the nested HV bug for which a workaround was added to the L0 HV by commit 63279eeb7f93a ("KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting"), which explains the problem in detail. That workaround is no longer required for guests that include this bug fix. Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Link: https://lore.kernel.org/r/20210811160134.904987-10-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/ps3: Add dma_mask to ps3_dma_regionGeoff Levand2021-07-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9733862e50fdba55e7f1554e4286fcc5302ff28e ] Commit f959dcd6ddfd29235030e8026471ac1b022ad2b0 (dma-direct: Fix potential NULL pointer dereference) added a null check on the dma_mask pointer of the kernel's device structure. Add a dma_mask variable to the ps3_dma_region structure and set the device structure's dma_mask pointer to point to this new variable. Fixes runtime errors like these: # WARNING: Fixes tag on line 10 doesn't match correct format # WARNING: Fixes tag on line 10 doesn't match correct format ps3_system_bus_match:349: dev=8.0(sb_01), drv=8.0(ps3flash): match WARNING: CPU: 0 PID: 1 at kernel/dma/mapping.c:151 .dma_map_page_attrs+0x34/0x1e0 ps3flash sb_01: ps3stor_setup:193: map DMA region failed Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/562d0c9ea0100a30c3b186bcc7adb34b0bbd2cd7.1622746428.git.geoff@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/barrier: Avoid collision with clang's __lwsync macroNathan Chancellor2021-07-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 015d98149b326e0f1f02e44413112ca8b4330543 upstream. A change in clang 13 results in the __lwsync macro being defined as __builtin_ppc_lwsync, which emits 'lwsync' or 'msync' depending on what the target supports. This breaks the build because of -Werror in arch/powerpc, along with thousands of warnings: In file included from arch/powerpc/kernel/pmc.c:12: In file included from include/linux/bug.h:5: In file included from arch/powerpc/include/asm/bug.h:109: In file included from include/asm-generic/bug.h:20: In file included from include/linux/kernel.h:12: In file included from include/linux/bitops.h:32: In file included from arch/powerpc/include/asm/bitops.h:62: arch/powerpc/include/asm/barrier.h:49:9: error: '__lwsync' macro redefined [-Werror,-Wmacro-redefined] #define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") ^ <built-in>:308:9: note: previous definition is here #define __lwsync __builtin_ppc_lwsync ^ 1 error generated. Undefine this macro so that the runtime patching introduced by commit 2d1b2027626d ("powerpc: Fixup lwsync at runtime") continues to work properly with clang and the build no longer breaks. Cc: stable@vger.kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1386 Link: https://github.com/llvm/llvm-project/commit/62b5df7fe2b3fda1772befeda15598fbef96a614 Link: https://lore.kernel.org/r/20210528182752.1852002-1-nathan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processorsSuraj Jitindar Singh2021-07-141-0/+30
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 77bbbc0cf84834ed130838f7ac1988567f4d0288 ] The POWER9 vCPU TLB management code assumes all threads in a core share a TLB, and that TLBIEL execued by one thread will invalidate TLBs for all threads. This is not the case for SMT8 capable POWER9 and POWER10 (big core) processors, where the TLB is split between groups of threads. This results in TLB multi-hits, random data corruption, etc. Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine which siblings share TLBs, and use that in the guest TLB flushing code. [npiggin@gmail.com: add changelog and comment] Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64s/syscall: Fix ptrace syscall info with scv syscallsNicholas Piggin2021-05-262-35/+52
| | | | | | | | | | | | | | | | | | | | | commit d72500f992849d31ebae8f821a023660ddd0dcc2 upstream. The scv implementation missed updating syscall return value and error value get/set functions to deal with the changed register ABI. This broke ptrace PTRACE_GET_SYSCALL_INFO as well as some kernel auditing and tracing functions. Fix. tools/testing/selftests/ptrace/get_syscall_info now passes when scv is used. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520111931.2597127-2-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocksNicholas Piggin2021-05-262-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2c8c89b95831f46a2fb31a8d0fef4601694023ce ] The paravit queued spinlock slow path adds itself to the queue then calls pv_wait to wait for the lock to become free. This is implemented by calling H_CONFER to donate cycles. When hcall tracing is enabled, this H_CONFER call can lead to a spin lock being taken in the tracing code, which will result in the lock to be taken again, which will also go to the slow path because it queues behind itself and so won't ever make progress. An example trace of a deadlock: __pv_queued_spin_lock_slowpath trace_clock_global ring_buffer_lock_reserve trace_event_buffer_lock_reserve trace_event_buffer_reserve trace_event_raw_event_hcall_exit __trace_hcall_exit plpar_hcall_norets_trace __pv_queued_spin_lock_slowpath trace_clock_global ring_buffer_lock_reserve trace_event_buffer_lock_reserve trace_event_buffer_reserve trace_event_raw_event_rcu_dyntick rcu_irq_exit irq_exit __do_irq call_do_irq do_IRQ hardware_interrupt_common_virt Fix this by introducing plpar_hcall_norets_notrace(), and using that to make SPLPAR virtual processor dispatching hcalls by the paravirt spinlock code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210508101455.1578318-2-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64: Fix the definition of the fixmap areaChristophe Leroy2021-05-143-2/+16
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9ccba66d4d2aff9a3909aa77d57ea8b7cc166f3c ] At the time being, the fixmap area is defined at the top of the address space or just below KASAN. This definition is not valid for PPC64. For PPC64, use the top of the I/O space. Because of circular dependencies, it is not possible to include asm/fixmap.h in asm/book3s/64/pgtable.h , so define a fixed size AREA at the top of the I/O space for fixmap and ensure during build that the size is big enough. Fixes: 265c3491c4bc ("powerpc: Add support for GENERIC_EARLY_IOREMAP") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0d51620eacf036d683d1a3c41328f69adb601dc0.1618925560.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/smp: Reintroduce cpu_core_maskSrikar Dronamraju2021-05-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c47f892d7aa62765bf0689073f75990b4517a4cf ] Daniel reported that with Commit 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask") QEMU was unable to set single NUMA node SMP topologies such as: -smp 8,maxcpus=8,cores=2,threads=2,sockets=2 i.e he expected 2 sockets in one NUMA node. The above commit helped to reduce boot time on Large Systems for example 4096 vCPU single socket QEMU instance. PAPR is silent on having more than one socket within a NUMA node. cpu_core_mask and cpu_cpu_mask for any CPU would be same unless the number of sockets is different from the number of NUMA nodes. One option is to reintroduce cpu_core_mask but use a slightly different method to arrive at the cpu_core_mask. Previously each CPU's chip-id would be compared with all other CPU's chip-id to verify if both the CPUs were related at the chip level. Now if a CPU 'A' is found related / (unrelated) to another CPU 'B', all the thread siblings of 'A' and thread siblings of 'B' are automatically marked as related / (unrelated). Also if a platform doesn't support ibm,chip-id property, i.e its cpu_to_chip_id returns -1, cpu_core_map holds a copy of cpu_cpu_mask(). Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask") Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-2-srikar@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64s: Fix pte update for kernel memory on radixJordan Niethe2021-05-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b8b2f37cf632434456182e9002d63cbc4cccc50c ] When adding a PTE a ptesync is needed to order the update of the PTE with subsequent accesses otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap() which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page it is possible for faults. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching which in usually less then 5min will crash giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c To fix this when updating kernel memory PTEs using ptesync. Fixes: f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processorsNicholas Piggin2021-05-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | commit 49c1d07fd04f54eb588c4a1dfcedc8d22c5ffd50 upstream. Starting with ISA v3.1, LPCR[AIL] no longer controls the interrupt mode for HV=1 interrupts. Instead, a new LPCR[HAIL] bit is defined which behaves like AIL=3 for HV interrupts when set. Set HAIL on bare metal to give us mmu-on interrupts and improve performance. This also fixes an scv bug: we don't implement scv real mode (AIL=0) vectors because they are at an inconvenient location, so we just disable scv support when AIL can not be set. However powernv assumes that LPCR[AIL] will enable AIL mode so it enables scv support despite HV interrupts being AIL=0, which causes scv interrupts to go off into the weeds. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210402024124.545826-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/4xx: Fix build errors from mfdcr()Michael Ellerman2021-03-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit eead089311f4d935ab5d1d8fbb0c42ad44699ada ] lkp reported a build error in fsp2.o: CC arch/powerpc/platforms/44x/fsp2.o {standard input}:577: Error: unsupported relocation against base Which comes from: pr_err("GESR0: 0x%08x\n", mfdcr(base + PLB4OPB_GESR0)); Where our mfdcr() macro is stringifying "base + PLB4OPB_GESR0", and passing that to the assembler, which obviously doesn't work. The mfdcr() macro already checks that the argument is constant using __builtin_constant_p(), and if not calls the out-of-line version of mfdcr(). But in this case GCC is smart enough to notice that "base + PLB4OPB_GESR0" will be constant, even though it's not something we can immediately stringify into a register number. Segher pointed out that passing the register number to the inline asm as a constant would be better, and in fact it fixes the build error, presumably because it gives GCC a chance to resolve the value. While we're at it, change mtdcr() similarly. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210218123058.748882-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc: Force inlining of cpu_has_feature() to avoid build failureChristophe Leroy2021-03-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | commit eed5fae00593ab9d261a0c1ffc1bdb786a87a55a upstream. The code relies on constant folding of cpu_has_feature() based on possible and always true values as defined per CPU_FTRS_ALWAYS and CPU_FTRS_POSSIBLE. Build failure is encountered with for instance book3e_all_defconfig on kisskb in the AMDGPU driver which uses cpu_has_feature(CPU_FTR_VSX_COMP) to decide whether calling kernel_enable_vsx() or not. The failure is due to cpu_has_feature() not being inlined with that configuration with gcc 4.9. In the same way as commit acdad8fb4a15 ("powerpc: Force inlining of mmu_has_feature to fix build failure"), for inlining of cpu_has_feature(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b231dfa040ce4cc37f702f5c3a595fdeabfe0462.1615378209.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()Christophe Leroy2021-03-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bd73758803c2eedc037c2268b65a19542a832594 upstream. Add stub instances of enable_kernel_vsx() and disable_kernel_vsx() when CONFIG_VSX is not set, to avoid following build failure. CC [M] drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services_types.h:29, from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:37, from drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:27: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c: In function 'dcn_bw_apply_registry_override': ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:64:3: error: implicit declaration of function 'enable_kernel_vsx'; did you mean 'enable_kernel_fp'? [-Werror=implicit-function-declaration] 64 | enable_kernel_vsx(); \ | ^~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:640:2: note: in expansion of macro 'DC_FP_START' 640 | DC_FP_START(); | ^~~~~~~~~~~ ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:75:3: error: implicit declaration of function 'disable_kernel_vsx'; did you mean 'disable_kernel_fp'? [-Werror=implicit-function-declaration] 75 | disable_kernel_vsx(); \ | ^~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:676:2: note: in expansion of macro 'DC_FP_END' 676 | DC_FP_END(); | ^~~~~~~~~ cc1: some warnings being treated as errors make[5]: *** [drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o] Error 1 This works because the caller is checking if VSX is available using cpu_has_feature(): #define DC_FP_START() { \ if (cpu_has_feature(CPU_FTR_VSX_COMP)) { \ preempt_disable(); \ enable_kernel_vsx(); \ } else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) { \ preempt_disable(); \ enable_kernel_altivec(); \ } else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) { \ preempt_disable(); \ enable_kernel_fp(); \ } \ When CONFIG_VSX is not selected, cpu_has_feature(CPU_FTR_VSX_COMP) constant folds to 'false' so the call to enable_kernel_vsx() is discarded and the build succeeds. Fixes: 16a9dea110a6 ("amdgpu: Enable initial DCN support on POWER") Cc: stable@vger.kernel.org # v5.6+ Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Incorporate some discussion comments into the change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8d7d285a027e9d21f5ff7f850fa71a2655b0c4af.1615279170.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Fix inverted SET_FULL_REGS bitopNicholas Piggin2021-03-171-2/+2
| | | | | | | | | | | | | | | | | commit 73ac79881804eed2e9d76ecdd1018037f8510cb1 upstream. This bit operation was inverted and set the low bit rather than cleared it, breaking the ability to ptrace non-volatile GPRs after exec. Fix. Only affects 64e and 32-bit. Fixes: feb9df3462e6 ("powerpc/64s: Always has full regs, so remove remnant checks") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210308085530.3191843-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()Naveen N. Rao2021-03-171-1/+1
| | | | | | | | | | | | | | | | | commit cea15316ceee2d4a51dfdecd79e08a438135416c upstream. 'lis r2,N' is 'addis r2,0,N' and the instruction encoding in the macro LIS_R2 is incorrect (it currently maps to 'addis r0,r2,N'). Fix the same. Fixes: c71b7eff426f ("powerpc: Add ABIv2 support to ppc_function_entry") Cc: stable@vger.kernel.org # v3.16+ Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210304020411.16796-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/64: Fix stack trace not displaying final frameMichael Ellerman2021-03-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e3de1e291fa58a1ab0f471a4b458eff2514e4b5f ] In commit bf13718bc57a ("powerpc: show registers when unwinding interrupt frames") we changed our stack dumping logic to show the full registers whenever we find an interrupt frame on the stack. However we didn't notice that on 64-bit this doesn't show the final frame, ie. the interrupt that brought us in from userspace, whereas on 32-bit it does. That is due to confusion about the size of that last frame. The code in show_stack() calls validate_sp(), passing it STACK_INT_FRAME_SIZE to check the sp is at least that far below the top of the stack. However on 64-bit that size is too large for the final frame, because it includes the red zone, but we don't allocate a red zone for the first frame. So add a new define that encodes the correct size for 32-bit and 64-bit, and use it in show_stack(). This results in the full trace being shown on 64-bit, eg: sysrq: Trigger a crash Kernel panic - not syncing: sysrq triggered crash CPU: 0 PID: 83 Comm: sh Not tainted 5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty #649 Call Trace: [c00000000a1c3ac0] [c000000000897b70] dump_stack+0xc4/0x114 (unreliable) [c00000000a1c3b00] [c00000000014334c] panic+0x178/0x41c [c00000000a1c3ba0] [c00000000094e600] sysrq_handle_crash+0x40/0x50 [c00000000a1c3c00] [c00000000094ef98] __handle_sysrq+0xd8/0x210 [c00000000a1c3ca0] [c00000000094f820] write_sysrq_trigger+0x100/0x188 [c00000000a1c3ce0] [c0000000005559dc] proc_reg_write+0x10c/0x1b0 [c00000000a1c3d10] [c000000000479950] vfs_write+0xf0/0x360 [c00000000a1c3d60] [c000000000479d9c] ksys_write+0x7c/0x140 [c00000000a1c3db0] [c00000000002bf5c] system_call_exception+0x19c/0x2c0 [c00000000a1c3e10] [c00000000000d35c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff9fbab428 NIP: 00007fff9fbab428 LR: 000000001000b724 CTR: 0000000000000000 REGS: c00000000a1c3e80 TRAP: 0c00 Not tainted (5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty) MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 22002884 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000004 00007fffc3cb8960 00007fff9fc59900 0000000000000001 GPR04: 000000002a4b32d0 0000000000000002 0000000000000063 0000000000000063 GPR08: 000000002a4b32d0 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff9fcca9a0 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0 GPR20: 000000002a4b3485 00000000100b8f90 0000000000000000 0000000000000000 GPR24: 000000002a4b0440 00000000100e77b8 0000000000000020 000000002a4b32d0 GPR28: 0000000000000001 0000000000000002 000000002a4b32d0 0000000000000001 NIP [00007fff9fbab428] 0x7fff9fbab428 LR [000000001000b724] 0x1000b724 --- interrupt: c00 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210209141627.2898485-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/pci: Add ppc_md.discover_phbs()Oliver O'Halloran2021-03-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5537fcb319d016ce387f818dd774179bc03217f5 ] On many powerpc platforms the discovery and initalisation of pci_controllers (PHBs) happens inside of setup_arch(). This is very early in boot (pre-initcalls) and means that we're initialising the PHB long before many basic kernel services (slab allocator, debugfs, a real ioremap) are available. On PowerNV this causes an additional problem since we map the PHB registers with ioremap(). As of commit d538aadc2718 ("powerpc/ioremap: warn on early use of ioremap()") a warning is printed because we're using the "incorrect" API to setup and MMIO mapping in searly boot. The kernel does provide early_ioremap(), but that is not intended to create long-lived MMIO mappings and a seperate warning is printed by generic code if early_ioremap() mappings are "leaked." This is all fixable with dumb hacks like using early_ioremap() to setup the initial mapping then replacing it with a real ioremap later on in boot, but it does raise the question: Why the hell are we setting up the PHB's this early in boot? The old and wise claim it's due to "hysterical rasins." Aside from amused grapes there doesn't appear to be any real reason to maintain the current behaviour. Already most of the newer embedded platforms perform PHB discovery in an arch_initcall and between the end of setup_arch() and the start of initcalls none of the generic kernel code does anything PCI related. On powerpc scanning PHBs occurs in a subsys_initcall so it should be possible to move the PHB discovery to a core, postcore or arch initcall. This patch adds the ppc_md.discover_phbs hook and a core_initcall stub that calls it. The core_initcalls are the earliest to be called so this will any possibly issues with dependency between initcalls. This isn't just an academic issue either since on pseries and PowerNV EEH init occurs in an arch_initcall and depends on the pci_controllers being available, similarly the creation of pci_dns occurs at core_initcall_sync (i.e. between core and postcore initcalls). These problems need to be addressed seperately. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Make discover_phbs() static] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201103043523.916109-1-oohall@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/kexec_file: fix FDT size estimation for kdump kernelHari Bathini2021-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | commit 2377c92e37fe97bc5b365f55cf60f56dfc4849f5 upstream. On systems with large amount of memory, loading kdump kernel through kexec_file_load syscall may fail with the below error: "Failed to update fdt with linux,drconf-usable-memory property" This happens because the size estimation for kdump kernel's FDT does not account for the additional space needed to setup usable memory properties. Fix it by accounting for the space needed to include linux,usable-memory & linux,drconf-usable-memory properties while estimating kdump kernel's FDT size. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161243826811.119001.14083048209224609814.stgit@hbathini Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/uaccess: Avoid might_fault() when user access is enabledAlexey Kardashevskiy2021-03-041-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7d506ca97b665b95e698a53697dad99fae813c1a ] The amount of code executed with enabled user space access (unlocked KUAP) should be minimal. However with CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP enabled, might_fault() calls into various parts of the kernel, and may even end up replaying interrupts which in turn may access user space and forget to restore the KUAP state. The problem places are: 1. strncpy_from_user (and similar) which unlock KUAP and call unsafe_get_user -> __get_user_allowed -> __get_user_nocheck() with do_allow=false to skip KUAP as the caller took care of it. 2. __unsafe_put_user_goto() which is called with unlocked KUAP. eg: WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 arch_local_irq_restore+0x160/0x190 NIP arch_local_irq_restore+0x160/0x190 LR lock_is_held_type+0x140/0x200 Call Trace: 0xc00000007f392ff8 (unreliable) ___might_sleep+0x180/0x320 __might_fault+0x50/0xe0 filldir64+0x2d0/0x5d0 call_filldir+0xc8/0x180 ext4_readdir+0x948/0xb40 iterate_dir+0x1ec/0x240 sys_getdents64+0x80/0x290 system_call_exception+0x160/0x280 system_call_common+0xf0/0x27c Change __get_user_nocheck() to look at `do_allow` to decide whether to skip might_fault(). Since strncpy_from_user/etc call might_fault() anyway before unlocking KUAP, there should be no visible change. Drop might_fault() in __unsafe_put_user_goto() as it is only called from unsafe_put_user(), which already has KUAP unlocked. Since keeping might_fault() is still desirable for debugging, add calls to it in user_[read|write]_access_begin(). That also allows us to drop the is_kernel_addr() test, because there should be no code using user_[read|write]_access_begin() in order to access a kernel address. Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [mpe: Combine with related patch from myself, merge change logs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210204121612.32721-1-aik@ozlabs.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64s: fix scv entry fallback flush vs interruptNicholas Piggin2021-01-272-0/+23
| | | | | | | | | | | | | | | | | | | | | commit 08685be7761d69914f08c3d6211c543a385a5b9c upstream. The L1D flush fallback functions are not recoverable vs interrupts, yet the scv entry flush runs with MSR[EE]=1. This can result in a timer (soft-NMI) or MCE or SRESET interrupt hitting here and overwriting the EXRFI save area, which ends up corrupting userspace registers for scv return. Fix this by disabling RI and EE for the scv entry fallback flush. Fixes: f79643787e0a0 ("powerpc/64s: flush L1D on kernel entry") Cc: stable@vger.kernel.org # 5.9+ which also have flush L1D patch backport Reported-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210111062408.287092-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* local64.h: make <asm/local64.h> mandatoryRandy Dunlap2021-01-121-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ] Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and remove all arch/*/include/asm/local64.h arch-specific files since they only #include <asm-generic/local64.h>. This fixes build errors on arch/c6x/ and arch/nios2/ for block/blk-iocost.c. Build-tested on 21 of 25 arch-es. (tools problems on the others) Yes, we could even rename <asm-generic/local64.h> to <linux/local64.h> and change all #includes to use <linux/local64.h> instead. Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/8xx: Fix early debug when SMC1 is relocatedChristophe Leroy2020-12-301-0/+1
| | | | | | | | | | | | | | | | | | | | commit 1e78f723d6a52966bfe3804209dbf404fdc9d3bb upstream. When SMC1 is relocated and early debug is selected, the board hangs is ppc_md.setup_arch(). This is because ones the microcode has been loaded and SMC1 relocated, early debug writes in the weed. To allow smooth continuation, the SMC1 parameter RAM set up by the bootloader have to be copied into the new location. Fixes: 43db76f41824 ("powerpc/8xx: Add microcode patch to move SMC parameter RAM.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b2f71f39eca543f1e4ec06596f09a8b12235c701.1607076683.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/feature: Add CPU_FTR_NOEXECUTE to G2_LEChristophe Leroy2020-12-301-1/+1
| | | | | | | | | | | | | | commit 197493af414ee22427be3343637ac290a791925a upstream. G2_LE has a 603 core, add CPU_FTR_NOEXECUTE. Fixes: 385e89d5b20f ("powerpc/mm: add exec protection on powerpc 603") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/39a530ee41d83f49747ab3af8e39c056450b9b4d.1602489653.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()Christophe Leroy2020-12-301-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1891ef21d92c4801ea082ee8ed478e304ddc6749 upstream. fls() and fls64() are using __builtin_ctz() and _builtin_ctzll(). On powerpc, those builtins trivially use ctlzw and ctlzd power instructions. Allthough those instructions provide the expected result with input argument 0, __builtin_ctz() and __builtin_ctzll() are documented as undefined for value 0. The easiest fix would be to use fls() and fls64() functions defined in include/asm-generic/bitops/builtin-fls.h and include/asm-generic/bitops/fls64.h, but GCC output is not optimal: 00000388 <testfls>: 388: 2c 03 00 00 cmpwi r3,0 38c: 41 82 00 10 beq 39c <testfls+0x14> 390: 7c 63 00 34 cntlzw r3,r3 394: 20 63 00 20 subfic r3,r3,32 398: 4e 80 00 20 blr 39c: 38 60 00 00 li r3,0 3a0: 4e 80 00 20 blr 000003b0 <testfls64>: 3b0: 2c 03 00 00 cmpwi r3,0 3b4: 40 82 00 1c bne 3d0 <testfls64+0x20> 3b8: 2f 84 00 00 cmpwi cr7,r4,0 3bc: 38 60 00 00 li r3,0 3c0: 4d 9e 00 20 beqlr cr7 3c4: 7c 83 00 34 cntlzw r3,r4 3c8: 20 63 00 20 subfic r3,r3,32 3cc: 4e 80 00 20 blr 3d0: 7c 63 00 34 cntlzw r3,r3 3d4: 20 63 00 40 subfic r3,r3,64 3d8: 4e 80 00 20 blr When the input of fls(x) is a constant, just check x for nullity and return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction directly. For fls64() on PPC64, do the same but with __builtin_clzll() and cntlzd instruction. On PPC32, lets take the generic fls64() which will use our fls(). The result is as expected: 00000388 <testfls>: 388: 7c 63 00 34 cntlzw r3,r3 38c: 20 63 00 20 subfic r3,r3,32 390: 4e 80 00 20 blr 000003a0 <testfls64>: 3a0: 2c 03 00 00 cmpwi r3,0 3a4: 40 82 00 10 bne 3b4 <testfls64+0x14> 3a8: 7c 83 00 34 cntlzw r3,r4 3ac: 20 63 00 20 subfic r3,r3,32 3b0: 4e 80 00 20 blr 3b4: 7c 63 00 34 cntlzw r3,r3 3b8: 20 63 00 40 subfic r3,r3,64 3bc: 4e 80 00 20 blr Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_atMathieu Desnoyers2020-12-302-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | commit d85be8a49e733dcd23674aa6202870d54bf5600d upstream. The placeholder for instruction selection should use the second argument's operand, which is %1, not %0. This could generate incorrect assembly code if the memory addressing of operand %0 is a different form from that of operand %1. Also remove the %Un placeholder because having %Un placeholders for two operands which are based on the same local var (ptep) doesn't make much sense. By the way, it doesn't change the current behaviour because "<>" constraint is missing for the associated "=m". [chleroy: revised commit log iaw segher's comments and removed %U0] Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support") Cc: <stable@vger.kernel.org> # v2.6.28+ Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/96354bd77977a6a933fe9020da57629007fdb920.1603358942.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/32s: Fix cleanup_cpu_mmu_context() compile bugMichael Ellerman2020-12-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c1bea0a840ac75dca19bc6aa05575a33eb9fd058 ] Currently pmac32_defconfig with SMP=y doesn't build: arch/powerpc/platforms/powermac/smp.c: error: implicit declaration of function 'cleanup_cpu_mmu_context' It would be nice for consistency if all platforms clear mm_cpumask and flush TLBs on unplug, but the TLB invalidation bug described in commit 01b0f0eae081 ("powerpc/64s: Trim offlined CPUs from mm_cpumasks") only applies to 64s and for now we only have the TLB flush code for that platform. So just add an empty version for 32-bit Book3S. Fixes: 01b0f0eae081 ("powerpc/64s: Trim offlined CPUs from mm_cpumasks") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log based on comments from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32Christophe Leroy2020-12-301-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 78665179e569c7e1fe102fb6c21d0f5b6951f084 ] On 8xx, we get the following features: [ 0.000000] cpu_features = 0x0000000000000100 [ 0.000000] possible = 0x0000000000000120 [ 0.000000] always = 0x0000000000000000 This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all other configurations, the three lines should be equal. The problem is due to CPU_FTRS_GENERIC_32 which is taken when CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is pointless because there is no generic configuration supporting all 32 bits but book3s/32. Remove this pointless generic features definition to unbreak the calculation of 'possible' features and 'always' features. Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
* Merge tag 'powerpc-5.10-5' of ↵Linus Torvalds2020-12-051-0/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.10: - Three commits fixing possible missed TLB invalidations for multi-threaded processes when CPUs are hotplugged in and out. - A fix for a host crash triggerable by host userspace (qemu) in KVM on Power9. - A fix for a host crash in machine check handling when running HPT guests on a HPT host. - One commit fixing potential missed TLB invalidations when using the hash MMU on Power9 or later. - A regression fix for machines with CPUs on node 0 but no memory. Thanks to Aneesh Kumar K.V, Cédric Le Goater, Greg Kurz, Milan Mohanty, Milton Miller, Nicholas Piggin, Paul Mackerras, and Srikar Dronamraju" * tag 'powerpc-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCE KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check powerpc/numa: Fix a regression on memoryless node 0 powerpc/64s: Trim offlined CPUs from mm_cpumasks kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling powerpc/64s/pseries: Fix hash tlbiel_all_isa300 for guest kernels powerpc/64s: Fix hash ISA v3.0 TLBIEL instruction generation
| * powerpc/64s: Trim offlined CPUs from mm_cpumasksNicholas Piggin2020-11-271-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When offlining a CPU, powerpc/64s does not flush TLBs, rather it just leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs to manage its TLBs. However the exit_flush_lazy_tlbs() function expects that after returning, all CPUs (except self) have flushed TLBs for that mm, in which case TLBIEL can be used for this flush. This breaks for offline CPUs because they don't get the IPI to flush their TLB. This can lead to stale translations. Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs before going offline. These offlined CPU bits stuck in the cpumask also prevents the cpumask from being trimmed back to local mode, which means continual broadcast IPIs or TLBIEs are needed for TLB flushing. This patch prevents that situation too. A cast of many were involved in working this out, but in particular Milton, Aneesh, Paul made key discoveries. Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Milton Miller <miltonm@us.ibm.com> Debugged-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-5-npiggin@gmail.com
* | Merge tag 'asm-generic-fixes-5.10-2' of ↵Linus Torvalds2020-11-272-0/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fix from Arnd Bergmann: "Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic. This is a single bugfix for a bug that Stefan Agner found on 32-bit Arm, but that exists on several other architectures" * tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
| * | arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where neededArnd Bergmann2020-11-162-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Stefan Agner <stefan@agner.ch> Tested-by: Stefan Agner <stefan@agner.ch> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | | Merge tag 'powerpc-5.10-4' of ↵Linus Torvalds2020-11-271-0/+2
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.10: - regression fix for a boot failure on some 32-bit machines. - fix for host crashes in the KVM system reset handling. - fix for a possible oops in the KVM XIVE interrupt handling on Power9. - fix for host crashes triggerable via the KVM emulated MMIO handling when running HPT guests. - a couple of small build fixes. Thanks to Andreas Schwab, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Greg Kurz, Greg Kurz, Németh Márton, Nicholas Piggin, Nick Desaulniers, Serge Belyshev, and Stephen Rothwell" * tag 'powerpc-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix allnoconfig build since uaccess flush powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context powerpc: Drop -me200 addition to build flags KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y powerpc/32s: Use relocation offset when setting early hash table
| * | powerpc/64s: Fix allnoconfig build since uaccess flushStephen Rothwell2020-11-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h. Otherwise the build fails with eg: arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class 66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [mpe: Massage change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
| * | Merge tag 'powerpc-cve-2020-4788' into fixesMichael Ellerman2020-11-236-31/+103
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups.
* | | mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exportsDan Williams2020-11-222-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'powerpc-cve-2020-4788' of ↵Linus Torvalds2020-11-196-31/+103
|\ \ \ | |/ / |/| / | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for CVE-2020-4788. From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups" * tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations selftests/powerpc: refactor entry and rfi_flush tests selftests/powerpc: entry flush test powerpc: Only include kup-radix.h for 64-bit Book3S powerpc/64s: flush L1D after user accesses powerpc/64s: flush L1D on kernel entry selftests/powerpc: rfi_flush: disable entry flush if present
| * powerpc: Only include kup-radix.h for 64-bit Book3SMichael Ellerman2020-11-192-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64s: flush L1D after user accessesNicholas Piggin2020-11-196-29/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64s: flush L1D on kernel entryNicholas Piggin2020-11-194-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/numa: Fix build when CONFIG_NUMA=nScott Cheloha2020-11-061-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h so we have one even if powerpc/mm/numa.c is not compiled. On a non-NUMA kernel the appropriate node id is always first_online_node. Fixes: 72cdd117c449 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com
* | powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entryChristophe Leroy2020-11-053-37/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When _PAGE_ACCESSED is not set, a minor fault is expected. To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED into the L2 entry valid bit. To simplify the processing and reduce the number of instructions in TLB miss exceptions, manage it as an APG bit and get it next to _PAGE_GUARDED bit to allow a copy in one go. Then declare the corresponding groups as handling all accesses as user accesses. As the PP bits always define user as No Access, it will generate a fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
* | powerpc: Use asm_goto_volatile for put_user()Michael Ellerman2020-11-051-2/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andreas reported that commit ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") broke CLONE_CHILD_SETTID. Further inspection showed that the put_user() in schedule_tail() was missing entirely, the store not emitted by the compiler. <.schedule_tail>: mflr r0 std r0,16(r1) stdu r1,-112(r1) bl <.finish_task_switch> ld r9,2496(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0x60> ld r3,392(r13) ld r9,1392(r3) cmpdi cr7,r9,0 beq cr7,<.schedule_tail+0x3c> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop bl <.calculate_sigpending> nop addi r1,r1,112 ld r0,16(r1) mtlr r0 blr nop nop nop bl <.__balance_callback> b <.schedule_tail+0x1c> Notice there are no stores other than to the stack. There should be a stw in there for the store to current->set_child_tid. This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and 4.9.4), and only when CONFIG_PPC_KUAP is disabled. When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync() and mtspr() inlined via allow_user_access() seems to be enough to avoid the bug. We already have a macro to work around this (or a similar bug), called asm_volatile_goto which includes an empty asm block to tickle the compiler into generating the right code. So use that. With this applied the code generation looks more like it will work: <.schedule_tail>: mflr r0 std r31,-8(r1) std r0,16(r1) stdu r1,-144(r1) std r3,112(r1) bl <._mcount> nop ld r3,112(r1) bl <.finish_task_switch> ld r9,2624(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0xa0> ld r3,2408(r13) ld r31,1856(r3) cmpdi cr7,r31,0 beq cr7,<.schedule_tail+0x80> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop li r9,-1 clrldi r9,r9,12 cmpld cr7,r31,r9 bgt cr7,<.schedule_tail+0x80> lis r9,16 rldicr r9,r9,32,31 subf r9,r31,r9 cmpldi cr7,r9,3 ble cr7,<.schedule_tail+0x80> li r9,0 stw r3,0(r31) <-- stw nop bl <.calculate_sigpending> nop addi r1,r1,144 ld r0,16(r1) ld r31,-8(r1) mtlr r0 blr nop bl <.__balance_callback> b <.schedule_tail+0x30> Fixes: ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") Reported-by: Andreas Schwab <schwab@linux-m68k.org> Tested-by: Andreas Schwab <schwab@linux-m68k.org> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au
* treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches2020-10-252-2/+2
| | | | | | | | | | | | | | | | | | | | Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'powerpc-5.10-2' of ↵Linus Torvalds2020-10-243-3/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - A fix for undetected data corruption on Power9 Nimbus <= DD2.1 in the emulation of VSX loads. The affected CPUs were not widely available. - Two fixes for machine check handling in guests under PowerVM. - A fix for our recent changes to SMP setup, when CONFIG_CPUMASK_OFFSTACK=y. - Three fixes for races in the handling of some of our powernv sysfs attributes. - One change to remove TM from the set of Power10 CPU features. - A couple of other minor fixes. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Ganesh Goudar, Jordan Niethe, Mahesh Salgaonkar, Michael Neuling, Oliver O'Halloran, Qian Cai, Srikar Dronamraju, Vasant Hegde. * tag 'powerpc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Avoid using addr_to_pfn in real mode powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9 powerpc/eeh: Fix eeh_dev_check_failure() for PE#0 powerpc/64s: Remove TM from Power10 features selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation powerpc/powernv/dump: Handle multiple writes to ack attribute powerpc/powernv/dump: Fix race while processing OPAL dump powerpc/smp: Use GFP_ATOMIC while allocating tmp mask powerpc/smp: Remove unnecessary variable powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash powerpc/opal_elog: Handle multiple writes to ack attribute
| * powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9Christophe Leroy2020-10-222-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 4.9 sometimes fails to build with "m<>" constraint in inline assembly. CC lib/iov_iter.o In file included from ./arch/powerpc/include/asm/cmpxchg.h:6:0, from ./arch/powerpc/include/asm/atomic.h:11, from ./include/linux/atomic.h:7, from ./include/linux/crypto.h:15, from ./include/crypto/hash.h:11, from lib/iov_iter.c:2: lib/iov_iter.c: In function 'iovec_from_user.part.30': ./arch/powerpc/include/asm/uaccess.h:287:2: error: 'asm' operand has impossible constraints __asm__ __volatile__( \ ^ ./include/linux/compiler.h:78:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ ./arch/powerpc/include/asm/uaccess.h:583:34: note: in expansion of macro 'unsafe_op_wrap' #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e) ^ ./arch/powerpc/include/asm/uaccess.h:329:10: note: in expansion of macro '__get_user_asm' case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break; \ ^ ./arch/powerpc/include/asm/uaccess.h:363:3: note: in expansion of macro '__get_user_size_allowed' __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err); \ ^ ./arch/powerpc/include/asm/uaccess.h:100:2: note: in expansion of macro '__get_user_nocheck' __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false) ^ ./arch/powerpc/include/asm/uaccess.h:583:49: note: in expansion of macro '__get_user_allowed' #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e) ^ lib/iov_iter.c:1663:3: note: in expansion of macro 'unsafe_get_user' unsafe_get_user(len, &uiov[i].iov_len, uaccess_end); ^ make[1]: *** [scripts/Makefile.build:283: lib/iov_iter.o] Error 1 Define a UPD_CONSTR macro that is "<>" by default and only "" with GCC prior to GCC 5. Fixes: fcf1f26895a4 ("powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto()") Fixes: 2f279eeb68b8 ("powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/212d3bc4a52ca71523759517bb9c61f7e477c46a.1603179582.git.christophe.leroy@csgroup.eu
| * powerpc/64s: Remove TM from Power10 featuresJordan Niethe2020-10-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | ISA v3.1 removes transactional memory and hence it should not be present in cpu_features or cpu_user_features2. Remove CPU_FTR_TM_COMP from CPU_FTRS_POWER10. Remove PPC_FEATURE2_HTM_COMP and PPC_FEATURE2_HTM_NOSC_COMP from COMMON_USER2_POWER10. Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200827035529.900-1-jniethe5@gmail.com
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2020-10-231-0/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
| * \ Merge tag 'kvm-ppc-next-5.10-1' of ↵Paolo Bonzini2020-09-221-0/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.10 - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes
| | * | KVM: PPC: Book3S HV: XICS: Replace the 'destroy' method by a 'release' methodGreg Kurz2020-09-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarly to what was done with XICS-on-XIVE and XIVE native KVM devices with commit 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method"), convert the historical XICS KVM device to implement the 'release' method. This is needed to run nested guests with an in-kernel IRQ chip. A typical POWER9 guest can select XICS or XIVE during boot, which requires to be able to destroy and to re-create the KVM device. Only the historical XICS KVM device is available under pseries at the current time and it still uses the legacy 'destroy' method. Switching to 'release' means that vCPUs might still be running when the device is destroyed. In order to avoid potential use-after-free, the kvmppc_xics structure is allocated on first usage and kept around until the VM exits. The same pointer is used each time a KVM XICS device is being created, but this is okay since we only have one per VM. Clear the ICP of each vCPU with vcpu->mutex held. This ensures that the next time the vCPU resumes execution, it won't be going into the XICS code anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | | | Merge tag 'kbuild-v5.10' of ↵Linus Torvalds2020-10-221-0/+8
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support 'make compile_commands.json' to generate the compilation database more easily, avoiding stale entries - Support 'make clang-analyzer' and 'make clang-tidy' for static checks using clang-tidy - Preprocess scripts/modules.lds.S to allow CONFIG options in the module linker script - Drop cc-option tests from compiler flags supported by our minimal GCC/Clang versions - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y - Use sha1 build id for both BFD linker and LLD - Improve deb-pkg for reproducible builds and rootless builds - Remove stale, useless scripts/namespace.pl - Turn -Wreturn-type warning into error - Fix build error of deb-pkg when CONFIG_MODULES=n - Replace 'hostname' command with more portable 'uname -n' - Various Makefile cleanups * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: Use uname for LINUX_COMPILE_HOST detection kbuild: Only add -fno-var-tracking-assignments for old GCC versions kbuild: remove leftover comment for filechk utility treewide: remove DISABLE_LTO kbuild: deb-pkg: clean up package name variables kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n kbuild: enforce -Werror=return-type scripts: remove namespace.pl builddeb: Add support for all required debian/rules targets builddeb: Enable rootless builds builddeb: Pass -n to gzip for reproducible packages kbuild: split the build log of kallsyms kbuild: explicitly specify the build id style scripts/setlocalversion: make git describe output more reliable kbuild: remove cc-option test of -Werror=date-time kbuild: remove cc-option test of -fno-stack-check kbuild: remove cc-option test of -fno-strict-overflow kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan kbuild: do not create built-in objects for external module builds ...