path: root/arch/powerpc/include/asm
Commit (subject — author, date, files changed, lines -/+)
* powerpc/64s: Prevent fallthrough to hash TLB flush when using radix — Benjamin Gray (2023-02-17, 1 file, -2/+2)

  In the fix reconnecting hash__tlb_flush() to tlb_flush(), the void return on radix__tlb_flush() was not restored, so execution subsequently falls through to the restored hash__tlb_flush(). Guard hash__tlb_flush() under an else to prevent this.

  Fixes: 1665c027afb2 ("powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()")
  Reported-by: "Erhard F." <erhard_f@mailbox.org>
  Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20230217011434.115554-1-bgray@linux.ibm.com
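  For illustration, the fixed dispatch has roughly this shape (a minimal sketch; the real tlb_flush() lives in arch/powerpc/include/asm/book3s/64/tlbflush.h):

    static inline void tlb_flush(struct mmu_gather *tlb)
    {
            if (radix_enabled())
                    radix__tlb_flush(tlb);  /* returns void, so without an else... */
            else
                    hash__tlb_flush(tlb);   /* ...control would fall through to here */
    }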
* powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush() — Michael Ellerman (2023-02-02, 1 file, -0/+2)

  Commit baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions") removed some empty hash MMU flushing routines, but got a bit overeager and also removed the call to hash__tlb_flush() from tlb_flush().

  In regular use this doesn't lead to any noticeable breakage, which is a little concerning. Presumably there are flushes happening via other paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck.

  Fix it by reinstating the call to hash__tlb_flush().

  Fixes: baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions")
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20230131111407.806770-1-mpe@ellerman.id.au
* powerpc/64: Fix perf profiling asynchronous interrupt handlers — Nicholas Piggin (2023-01-30, 1 file, -12/+29)

  Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the hard irq disabled state. So when should_hard_irq_enable() returns true because we want PMI interrupts in irq handlers, MSR[EE] is enabled but PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before enabling MSR[EE].

  This also tidies some of the warnings; no need to duplicate them in both should_hard_irq_enable() and do_hard_irq_enable().

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
* powerpc/64s: Fix local irq disable when PMIs are disabled — Nicholas Piggin (2023-01-30, 1 file, -1/+1)

  When PMI interrupts are soft-masked, local_irq_save() will clear the PMI mask bit, allowing PMIs in and causing a race condition. This causes a deadlock in native_hpte_insert via hash_preload, which depends on PMIs being disabled since commit 8b91cee5eadd ("powerpc/64s/hash: Make hash faults work in NMI context"). native_hpte_insert calls local_irq_save(). It's possible the lpar hash code is also affected when tracing is enabled, because __trace_hcall_entry() calls local_irq_save().

  Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit into the mask.

  This was found with the stress_hpt option, with a kbuild workload running together with `perf record -g`.

  Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
  Fixes: 8b91cee5eadd ("powerpc/64s/hash: Make hash faults work in NMI context")
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  [mpe: Just take the fix without the new warning]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20230121095352.2823517-1-npiggin@gmail.com
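  The essence of the fix, as a hedged sketch (the real helpers live in arch/powerpc/include/asm/hw_irq.h):

    static inline unsigned long arch_local_irq_save(void)
    {
            /* OR IRQS_DISABLED into the soft mask instead of overwriting it,
             * so an already-set IRQS_PMI_DISABLED bit is preserved */
            return irq_soft_mask_or_return(IRQS_DISABLED);
    }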
* powerpc/imc-pmu: Fix use of mutex in IRQs disabled section — Kajol Jain (2023-01-11, 1 file, -1/+1)

  Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

  Command to trigger the warning:
    # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

    Performance counter stats for 'sleep 5':
                 0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/
       5.002117947 seconds time elapsed
       0.000131000 seconds user
       0.001063000 seconds sys

  Below is a snippet of the warning in dmesg:
    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
    in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
    preempt_count: 2, expected: 0
    4 locks held by perf-exec/2869:
     #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
     #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
     #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
     #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
    irq event stamp: 4806
    hardirqs last enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
    hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
    softirqs last enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
    softirqs last disabled at (0): [<0000000000000000>] 0x0
    CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
    Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
    Call Trace:
      dump_stack_lvl+0x98/0xe0 (unreliable)
      __might_resched+0x2f8/0x310
      __mutex_lock+0x6c/0x13f0
      thread_imc_event_add+0xf4/0x1b0
      event_sched_in+0xe0/0x210
      merge_sched_in+0x1f0/0x600
      visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
      ctx_flexible_sched_in+0xcc/0x140
      ctx_sched_in+0x20c/0x2a0
      ctx_resched+0x104/0x1c0
      perf_event_exec+0x340/0x510
      begin_new_exec+0x730/0xef0
      load_elf_binary+0x3f8/0x1e10
      ...
    do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
    WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
    CPU: 36 PID: 2869 Comm: sleep Tainted: G W 6.2.0-rc2-00011-g1247637727f2 #61
    Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
    NIP: c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
    REGS: c00000004d2134e0 TRAP: 0700 Tainted: G W (6.2.0-rc2-00011-g1247637727f2)
    MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 48002824 XER: 00000000
    CFAR: c00000000013fb64 IRQMASK: 1

  The above warning triggered because the current imc-pmu code uses a mutex lock in interrupt-disabled sections. mutex_lock() internally calls __might_resched(), which checks whether IRQs are disabled and, if so, triggers the warning. Fix the issue by changing the mutex lock to a spinlock.

  Fixes: 8f95faaac56c ("powerpc/powernv: Detect and create IMC device")
  Reported-by: Michael Petlan <mpetlan@redhat.com>
  Reported-by: Peter Zijlstra <peterz@infradead.org>
  Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
  [mpe: Fix comments, trim oops in change log, add reported-by tags]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
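  As a rough illustration of the conversion (hypothetical excerpt with simplified names; the real patch touches arch/powerpc/perf/imc-pmu.c):

    static DEFINE_SPINLOCK(imc_global_refc_lock);   /* was a mutex */

    static int thread_imc_event_add(struct perf_event *event, int flags)
    {
            /* called with IRQs disabled: spin_lock() is legal here,
             * while mutex_lock() may sleep and is not */
            spin_lock(&imc_global_refc_lock);
            /* ... take/track the IMC reference ... */
            spin_unlock(&imc_global_refc_lock);
            return 0;
    }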
* Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux — Linus Torvalds (2022-12-19, 31 files, -296/+578)

  Pull powerpc updates from Michael Ellerman:

   - Add powerpc qspinlock implementation optimised for large system scalability and paravirt. See the merge message for more details
   - Enable objtool to be built on powerpc to generate mcount locations
   - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is restricted to the patching CPU
   - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI
   - Sanitise user registers on interrupt entry on 64-bit Book3S
   - Many other small features and fixes

  Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang.

  * tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
    powerpc/code-patching: Fix oops with DEBUG_VM enabled
    powerpc/qspinlock: Fix 32-bit build
    powerpc/prom: Fix 32-bit build
    powerpc/rtas: mandate RTAS syscall filtering
    powerpc/rtas: define pr_fmt and convert printk call sites
    powerpc/rtas: clean up includes
    powerpc/rtas: clean up rtas_error_log_max initialization
    powerpc/pseries/eeh: use correct API for error log size
    powerpc/rtas: avoid scheduling in rtas_os_term()
    powerpc/rtas: avoid device tree lookups in rtas_os_term()
    powerpc/rtasd: use correct OF API for event scan rate
    powerpc/rtas: document rtas_call()
    powerpc/pseries: unregister VPA when hot unplugging a CPU
    powerpc/pseries: reset the RCU watchdogs after a LPM
    powerpc: Take in account addition CPU node when building kexec FDT
    powerpc: export the CPU node count
    powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
    powerpc/dts/fsl: Fix pca954x i2c-mux node names
    cxl: Remove unnecessary cxl_pci_window_alignment()
    selftests/powerpc: Fix resource leaks
    ...
| * Merge branch 'topic/objtool' into next — Michael Ellerman (2022-12-08, 3 files, -1/+12)

  Merge the powerpc objtool support, which we were keeping in a topic branch in case of any merge conflicts.
| | * objtool/powerpc: Enable objtool to be built on ppc — Sathvika Vasireddy (2022-11-18, 1 file, -0/+7)

  This patch adds [stub] implementations for required functions, in order to enable the objtool build on powerpc.

  [Christophe Leroy: powerpc: Add missing asm/asm.h for objtool, Use local variables for type and imm in arch_decode_instruction(), Adapt len for prefixed instructions.]

  Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
  Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221114175754.1131267-16-sv@linux.ibm.com
| | * powerpc: Override __ALIGN and __ALIGN_STR macros — Sathvika Vasireddy (2022-11-15, 1 file, -0/+3)

  In a subsequent patch, we would want to annotate powerpc assembly functions with the SYM_FUNC_START_LOCAL macro. This macro depends on the __ALIGN macro. The default expansion of the __ALIGN macro is:

    #define __ALIGN .align 4,0x90

  So, override the __ALIGN and __ALIGN_STR macros to use the same alignment as that of the existing _GLOBAL macro. Also, do not pad with 0x90, because repeated 0x90s are not a nop or trap on powerpc.

  Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
  Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221114175754.1131267-3-sv@linux.ibm.com
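  The override might look like this (sketch; _GLOBAL aligns to a 4-byte boundary via .align 2):

    #define __ALIGN         .align 2
    #define __ALIGN_STR     ".align 2"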
| | * powerpc: Fix __WARN_FLAGS() for use with Objtool — Sathvika Vasireddy (2022-11-15, 1 file, -1/+2)

  Commit 1e688dd2a3d675 ("powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto") updated __WARN_FLAGS() to use asm goto, and added a call to 'unreachable()' after the asm goto for optimal code generation. With CONFIG_OBJTOOL enabled, the 'annotate_unreachable()' statement in 'unreachable()' tries to note down the location of the subsequent instruction in a separate elf section to aid code flow analysis. However, on powerpc, this results in gcc emitting a call to a symbol of size 0. This results in objtool complaining of an "unannotated intra-function call" since the target symbol is not a valid function call destination.

  Objtool wants this annotation for code flow analysis, which we are not yet enabling on powerpc. As such, expand the call to 'unreachable()' in __WARN_FLAGS() without annotate_unreachable():

    barrier_before_unreachable();
    __builtin_unreachable();

  This still results in optimal code generation for __WARN_FLAGS(), while getting rid of the objtool warning.

  We still need barrier_before_unreachable() to work around gcc bugs 82365 and 106751:
  - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
  - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106751

  Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
  Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221114175754.1131267-2-sv@linux.ibm.com
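  The resulting macro has roughly this shape (simplified sketch; the asm-goto bug-table plumbing behind WARN_ENTRY is elided):

    #define __WARN_FLAGS(flags) do {                                \
            __label__ __label_warn_on;                              \
                                                                    \
            WARN_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags),   \
                       __label_warn_on);                            \
            barrier_before_unreachable();                           \
            __builtin_unreachable();                                \
                                                                    \
    __label_warn_on:                                                \
            break;                                                  \
    } while (0)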
| * | powerpc/rtas: document rtas_call() — Nathan Lynch (2022-12-07, 1 file, -15/+0)

  rtas_call() has a complex calling convention, non-standard return values, and many users. Add kernel-doc for it and remove the less structured commentary from rtas.h.

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-2-nathanl@linux.ibm.com
| * | powerpc: export the CPU node count — Laurent Dufour (2022-12-07, 1 file, -0/+1)

  At boot time, the FDT is parsed to compute the number of CPUs. In addition, count the number of CPU nodes and export the count. This is useful when building the FDT for a kexeced kernel, since we need to take into account any CPU nodes added through CPU hotplug operations since boot.

  Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221110180619.15796-2-ldufour@linux.ibm.com
| * | powerpc/code-patching: Remove protection against patching init addresses after init — Christophe Leroy (2022-12-02, 1 file, -2/+0)

  Once the init section is freed, attempting to patch init code ends up in the weeds. Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") protected patch_instruction() against that, but it is the responsibility of the caller to ensure that the patched memory is valid.

  All callers have now been verified and fixed, so the check can be removed. This improves ftrace activation by about 2% on 8xx.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu
| * | powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1 — Michael Jeanson (2022-12-02, 1 file, -12/+0)

  In v5.7 the powerpc syscall entry/exit logic was rewritten in C. On PPC64_ELF_ABI_V1 this resulted in the symbols in the syscall table changing from their dot-prefixed variant to the non-prefixed ones. Since ftrace prefixes a dot to the syscall names when matching them to build its syscall event list, this resulted in no syscall events being available.

  Remove the PPC64_ELF_ABI_V1-specific version of arch_syscall_match_sym_name to have the same behavior across all powerpc variants.

  Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
  Cc: stable@vger.kernel.org # v5.7+
  Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
  Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201161442.2127231-1-mjeanson@efficios.com
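  For reference, the removed ABI-v1 variant followed roughly this pattern (a sketch, not the verbatim code): it skipped a leading dot before comparing, which no longer matches the undotted symbols now in the syscall table.

    static inline bool arch_syscall_match_sym_name(const char *sym, const char *name)
    {
            /* skip the initial dot of an ELFv1 dot-symbol */
            return !strcmp(sym + 1, name);
    }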
| * | powerpc/64: Add interrupt register sanitisation macros — Rohan McLure (2022-12-02, 1 file, -0/+19)

  Include in asm/ppc_asm.h macros to be used in multiple successive patches to implement zeroising architected registers in interrupt handlers. Registers will be sanitised in this fashion in future patches to reduce the speculation influence of user-controlled register values. These mitigations will be configurable through the CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option.

  Included are macros for conditionally zeroising registers and restoring as required with the mitigation enabled. With the mitigation disabled, non-volatiles must be restored on demand at separate locations to those required by the mitigation.

  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-2-rmclure@linux.ibm.com
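  A conditionally-zeroising macro of the kind described might look like this (hypothetical shape and name, for illustration only):

    #ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
    #define SANITIZE_GPR(n)         li      n, 0    /* zero a user-controlled GPR */
    #else
    #define SANITIZE_GPR(n)                         /* mitigation off: no-op */
    #endif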
| * | Merge branch 'topic/qspinlock' into next — Michael Ellerman (2022-12-02, 5 files, -60/+215)

  Merge Nick's powerpc qspinlock implementation. From his cover letter:

  This replaces the generic queued spinlock code (like s390 does) with our own implementation. Generic PV qspinlock code is causing latency / starvation regressions on large systems that are resulting in hard lockups reported (mostly in pathological cases). The generic qspinlock code has a number of issues important for powerpc hardware and hypervisors that aren't easily solved without changing code that would impact other architectures. Follow s390's lead and implement our own for now.

  Issues for powerpc using generic qspinlocks:

  - The previous lock value should not be loaded with simple loads, and need not be passed around from previous loads or cmpxchg results, because powerpc uses ll/sc-style atomics which can perform more complex operations that do not require this. powerpc implementations tend to prefer that loads use larx for improved coherency performance.
  - The queueing process should absolutely minimise the number of stores to the lock word to reduce exclusive coherency probes, important for large system scalability. The pending logic is counter-productive here.
  - Non-atomic unlock for paravirt locks is important (atomic instructions tend to still be more expensive than on x86 CPUs).
  - Yielding to the lock owner is important in the oversubscribed paravirt case, which requires storing the owner CPU in the lock word.
  - More control of lock stealing for the paravirt case is important to keep latency down on large systems.
  - The lock acquisition operation should always be made with a special variant of atomic instructions with the lock hint bit set, including (especially) in the queueing paths. This is more a matter of adding more arch lock helpers, so not an insurmountable problem for generic code.
| | * | powerpc/qspinlock: add compile-time tuning adjustments — Nicholas Piggin (2022-12-02, 1 file, -3/+58)

  This adds compile-time options that allow the EH lock hint bit to be enabled or disabled, and adds some new options that may or may not help matters, to help with experimentation and tuning.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-18-npiggin@gmail.com
| | * | powerpc/qspinlock: provide accounting and options for sleepy locks — Nicholas Piggin (2022-12-02, 1 file, -1/+6)

  Finding the owner or a queued waiter on a lock with a preempted vcpu is indicative of an oversubscribed guest causing the lock to get into trouble. Provide some options to detect this situation and have new CPUs avoid queueing for a longer time (more steal iterations) to minimise the problems caused by vcpu preemption on the queue.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-17-npiggin@gmail.com
| | * | powerpc/qspinlock: allow lock stealing in trylock and lock fastpath — Nicholas Piggin (2022-12-02, 1 file, -2/+20)

  This change allows trylock to steal the lock. It also allows the initial lock attempt to steal the lock rather than bailing out and going to the slow path.

  This gives trylock more strength: without this, a continually-contended lock will never permit a trylock to succeed. With this change, the trylock has a small but non-zero chance.

  It also gives the lock fastpath most of the benefit of passing the reservation back through to the steal loop in the slow path, without the complexity.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-13-npiggin@gmail.com
| | * | powerpc/qspinlock: store owner CPU in lock word — Nicholas Piggin (2022-12-02, 2 files, -3/+21)

  Store the owner CPU number in the lock word so it may be yielded to, as powerpc's paravirtualised simple spinlocks do.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-7-npiggin@gmail.com
| | * | powerpc/qspinlock: theft prevention to control latency — Nicholas Piggin (2022-12-02, 1 file, -1/+7)

  Give the queue head the ability to stop stealers. After a number of spins without successfully acquiring the lock, the queue head sets this, which halts stealing and will assure it is the next owner.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-6-npiggin@gmail.com
| | * | powerpc/qspinlock: allow new waiters to steal the lock before queueing — Nicholas Piggin (2022-12-02, 1 file, -0/+23)

  Allow new waiters to "steal" the lock before queueing. That is, to acquire it while other CPUs have queued. This particularly helps paravirt performance when physical CPUs are oversubscribed, by keeping the lock from becoming a strict FIFO and vCPU preemption causing queue train wrecks.

  The new __queued_spin_trylock_steal() function is put in qspinlock.h to save having to move it, because it will be used there by a later change.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-5-npiggin@gmail.com
| | * | powerpc/qspinlock: convert atomic operations to assembly — Nicholas Piggin (2022-12-02, 2 files, -7/+21)

  This uses more optimal ll/sc style access patterns (rather than cmpxchg), and also sets the EH=1 lock hint on those operations which acquire ownership of the lock.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-4-npiggin@gmail.com
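  The ll/sc pattern with the lock hint looks roughly like this (simplified sketch of an acquire-trylock; the real code parameterises the EH operand and barrier):

    static __always_inline int __queued_spin_trylock(struct qspinlock *lock)
    {
            u32 prev, new = queued_spin_encode_locked_val();

            asm volatile(
    "1:     lwarx   %0,0,%1,1       # load-reserve, EH=1 lock hint  \n"
    "       cmpwi   0,%0,0                                          \n"
    "       bne-    2f              # already held                  \n"
    "       stwcx.  %2,0,%1         # store-conditional             \n"
    "       bne-    1b              # lost reservation: retry       \n"
            PPC_ACQUIRE_BARRIER
    "2:                                                             \n"
            : "=&r" (prev)
            : "r" (&lock->val), "r" (new)
            : "cr0", "memory");

            return likely(prev == 0);
    }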
| | * | powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx. — Nicholas Piggin (2022-12-02, 2 files, -7/+18)

  The first 16 bits of the lock are only modified by the owner, and other modifications always use atomic operations on the entire 32 bits, so unlocks can use plain stores on the 16 bits. This is the same kind of optimisation done by the core qspinlock code.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-3-npiggin@gmail.com
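  The unlock then reduces to a plain release store (sketch; the half-word's position within the lock word is endian-dependent):

    static inline void queued_spin_unlock(struct qspinlock *lock)
    {
            /* only the owner writes the locked half-word, so no larx/stcx. */
            smp_store_release(&lock->locked, 0);
    }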
| | * | powerpc/qspinlock: add mcs queueing for contended waiters — Nicholas Piggin (2022-12-02, 2 files, -3/+30)

  This forms the basis of the qspinlock slow path.

  Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock owner does not participate in the queue, only waiters. The first waiter spins on the lock word, then when the lock is released it takes ownership and unqueues the next waiter. This is how qspinlocks can be implemented with the spinlock API -- lock owners don't need a node, only waiters do.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126095932.1234527-2-npiggin@gmail.com
| | * | powerpc/qspinlock: powerpc qspinlock implementation — Nicholas Piggin (2022-12-02, 5 files, -66/+44)

  Add a powerpc specific implementation of queued spinlocks. This is the build framework with a very simple (non-queued) spinlock implementation to begin with. Later changes add queueing, and other features and optimisations one-at-a-time. It is done this way to more easily see how the queued spinlocks are built, and to make performance and correctness bisects more useful.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  [mpe: Drop paravirt.h & processor.h changes to fix 32-bit build]
  [mpe: Fix 32-bit build of qspinlock.o & disallow GENERIC_LOCKBREAK per Nick]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/CONLLQB6DCJU.2ZPOS7T6S5GRR@bobo
| * | powerpc: remove STACK_FRAME_OVERHEAD — Nicholas Piggin (2022-12-02, 1 file, -13/+11)

  This is equal to STACK_FRAME_MIN_SIZE on 32-bit and 64-bit ELFv1, and no longer used in 64-bit ELFv2, so replace STACK_FRAME_OVERHEAD occurrences with STACK_FRAME_MIN_SIZE.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-18-npiggin@gmail.com
| * | powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes — Nicholas Piggin (2022-12-02, 1 file, -5/+15)

  Adjust the ELFv2 interrupt and switch frames to the minimum C ABI size, plus pt_regs, plus 16 bytes for the aligned regs marker for the int frame (and the switch frame needs to match that because it uses the same regs offset as the int frame).

  This saves 80 bytes of kernel stack per interrupt. It's the principle of getting our accounting right that's more important than the practical saving.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-17-npiggin@gmail.com
| * | powerpc: split validate_sp into two functions — Nicholas Piggin (2022-12-02, 1 file, -3/+12)

  Most callers just want to validate an arbitrary kernel stack pointer; some need a particular size. Make the size case the exceptional one, with an extra function.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
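  The resulting split is roughly (sketch of the pair described above):

    /* common case: validate an arbitrary kernel stack pointer */
    int validate_sp(unsigned long sp, struct task_struct *p);

    /* exceptional case: the caller needs a particular frame size */
    int validate_sp_size(unsigned long sp, struct task_struct *p,
                         unsigned long nbytes);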
| * | powerpc: add a define for the switch frame size and regs offset — Nicholas Piggin (2022-12-02, 1 file, -2/+4)

  This is open-coded in process.c; ppc32 uses a different define with the same value, and the C definition is named differently, which makes it an extra indirection to grep for.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
| * | powerpc: add a define for the user interrupt frame size — Nicholas Piggin (2022-12-02, 1 file, -3/+3)

  The user interrupt frame is a different size from the kernel frame, so give it its own name.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
| * | powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset — Nicholas Piggin (2022-12-02, 1 file, -2/+2)

  This is a count of longs from the stack pointer to the regs marker. Rename it to make it more distinct from the other byte offsets. It can be derived from the byte offset definitions just added.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
| * | powerpc: add a definition for the marker offset within the interrupt frame — Nicholas Piggin (2022-12-02, 1 file, -0/+2)

  Define a constant rather than open-code the offset for the "regs" marker.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-9-npiggin@gmail.com
| * | powerpc: add definition for pt_regs offset within an interrupt frame — Nicholas Piggin (2022-12-02, 1 file, -0/+2)

  This is a common offset that currently uses the overloaded STACK_FRAME_OVERHEAD constant. It's easier to read and more flexible to use a specific regs offset for this.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
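  Taken together, the interrupt-frame offsets from this series fit a pattern like this (hypothetical arrangement for illustration; the real per-ABI values differ):

    /* pt_regs sits immediately above the minimal C ABI frame */
    #define STACK_INT_FRAME_REGS    STACK_FRAME_MIN_SIZE
    /* the "regs" marker sits above pt_regs */
    #define STACK_INT_FRAME_MARKER  (STACK_INT_FRAME_REGS + sizeof(struct pt_regs))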
| * | powerpc/64: Remove asm interrupt tracing call helpers — Nicholas Piggin (2022-12-02, 1 file, -58/+0)

  These are now unused. Remove.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-3-npiggin@gmail.com
| * | powerpc/tlb: Add local flush for page given mm_struct and psize — Benjamin Gray (2022-11-30, 3 files, -0/+23)

  Adds a local TLB flush operation that works given an mm_struct, VA to flush, and page size representation. Most implementations mirror the surrounding code. The book3s/32/tlbflush.h implementation is left as a BUILD_BUG because it is more complicated and not required for anything as yet.

  This removes the need to create a vm_area_struct, which the temporary patching mm work does not need.

  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221109045112.187069-8-bgray@linux.ibm.com
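  The new hook's signature is roughly (sketch; each MMU family provides its own implementation):

    /* flush the translation for one page on the local CPU only */
    void local_flush_tlb_page_psize(struct mm_struct *mm,
                                    unsigned long vmaddr, int psize);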
| * | powerpc/mm: Remove flush_all_mm, local_flush_all_mm — Benjamin Gray (2022-11-30, 2 files, -37/+0)

  These functions were introduced for "cxl: Enable global TLBIs for cxl contexts" [1], which ended up using them for Radix only. They were never implemented on Hash (and creating an implementation appears to be difficult), so nothing can actually rely on them.

  They behave differently to the existing surrounding functions too, in that they actually need to do something on Hash. The other functions are primarily for use in generic code that expects their definitions, but Hash updates the TLB during PTE updates.

  After replacing the only usage with the Radix specific version, there are no more users of these functions, and given they are not implemented anyway it is safe to delete them.

  [1]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20170903181513.29635-1-fbarrat@linux.vnet.ibm.com/

  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221109045112.187069-7-bgray@linux.ibm.com
| * | cxl: Use radix__flush_all_mm instead of generic flush_all_mm — Benjamin Gray (2022-11-30, 1 file, -3/+3)

  The generic implementation of this function isn't really generic (Hash is not implemented). Unfortunately, the runtime warnings cannot be replaced with BUILD_BUGs, so it seems safer not to provide a stub in the first place.

  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221109045112.187069-6-bgray@linux.ibm.com
| * | powerpc/mm: Remove empty hash__ functions — Benjamin Gray (2022-11-30, 2 files, -46/+9)

  The empty hash__* functions are unnecessary. The empty definitions were introduced when 64-bit Hash support was added, as the functions were still used in generic code. These empty definitions were prefixed with hash__ when Radix support was added, and new wrappers with the original names were added that selected the Radix or Hash version based on radix_enabled().

  But the hash__ prefixed functions were not part of a public interface, so there is no need to include them for compatibility with anything. Generic code will use the non-prefixed wrappers, and Hash specific code will know that there is no point in calling them (or even worse, call them and expect them to do something).

  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221109045112.187069-5-bgray@linux.ibm.com
| * | powerpc: Allow clearing and restoring registers independent of saved breakpoint state — Jordan Niethe (2022-11-30, 1 file, -0/+2)

  For the coming temporary mm used for instruction patching, the breakpoint registers need to be cleared to prevent them from accidentally being triggered. As soon as the patching is done, the breakpoints will be restored.

  The breakpoint state is stored in the per-cpu variable current_brk[]. Add a suspend_breakpoints() function which will clear the breakpoint registers without touching the state in current_brk[]. Add a paired function restore_breakpoints() which will move the state in current_brk[] back to the registers.

  Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
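  Sketch of the pair and its intended use around patching (signatures from the change log; the call site below is illustrative only):

    void suspend_breakpoints(void);  /* clear HW breakpoint regs, leave current_brk[] intact */
    void restore_breakpoints(void);  /* write current_brk[] back to the registers */

    suspend_breakpoints();
    patch_instruction(addr, insn);   /* patching cannot trip a stale breakpoint */
    restore_breakpoints();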
| * | powerpc/ps3: mark ps3_system_bus_type static — Christoph Hellwig (2022-11-30, 1 file, -4/+0)

  ps3_system_bus_type is only used inside of system-bus.c, so remove the external declaration and the very outdated comment next to it.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Acked-by: Geoff Levand <geoff@infradead.org>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221122072225.423432-1-hch@lst.de
| * | Merge branch 'fixes' into next — Michael Ellerman (2022-11-30, 3 files, -0/+14)

  Merge our fixes branch to bring in some changes that are prerequisites for work in next.
| * | Merge branch 'topic/ppc-kvm' into next — Michael Ellerman (2022-11-30, 1 file, -0/+12)

  Merge our KVM topic branch.
| | * | KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support — Nicholas Piggin (2022-11-30, 1 file, -0/+12)

  32-bit code does not call trace_irqs_off() to match the trace_irqs_on() call in kvmppc_fix_ee_before_entry(). This can lead to irqs being enabled twice in the trace, and the irqs-off region between guest exit and the host enabling local irqs again is not properly traced.

  64-bit code does call this, but from asm code where volatiles are live and so incorrectly get clobbered.

  Move the irq reconcile into C to fix both problems.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221127124942.1665522-2-npiggin@gmail.com
| * | | powerpc/pseries: Fix the H_CALL error code in PLPKS driver — Nayna Jain (2022-11-24, 1 file, -2/+1)

  The PAPR spec actually defines H_P1 as H_PARAMETER, and maps H_ABORTED to a different numerical value. Fix the error codes as per the PAPR specification.

  Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
  Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221106205839.600442-3-nayna@linux.ibm.com
| * | | powerpc: remove the last remnants of cputime_t — Nicholas Piggin (2022-11-24, 1 file, -16/+1)

  cputime_t was a core kernel type, removed by commits ed5c8c854f2b..b672592f0221. As explained in commit b672592f0221 ("sched/cputime: Remove generic asm headers"), the final cleanup is for the arch to provide cputime_to_nsec[s](). Commit ade7667a981b ("powerpc: Add cputime_to_nsecs()") did that, but just didn't remove the then-unused cputime_to_usecs(), the cputime_t type, and associated remnants.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221006105653.115829-1-npiggin@gmail.com
| * | | powerpc/8xx: Reverse order entries are written by __set_pte_at() — Christophe Leroy (2022-11-24, 1 file, -1/+1)

  At the time being, with 16k pages __set_pte_at() writes table entries in reverse order:

    294:  91 49 00 0c   stw     r10,12(r9)
    298:  91 49 00 08   stw     r10,8(r9)
    29c:  91 49 00 04   stw     r10,4(r9)
    2a0:  91 49 00 00   stw     r10,0(r9)

  Although there should be no impact at all as it stays in a single cacheline, reverse the writing into a more natural order:

    288:  91 49 00 00   stw     r10,0(r9)
    28c:  91 49 00 04   stw     r10,4(r9)
    290:  91 49 00 08   stw     r10,8(r9)
    294:  91 49 00 0c   stw     r10,12(r9)

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/67c3b5d44edfec054234ea9b4d05fc4b4f7f8a0e.1664346554.git.christophe.leroy@csgroup.eu
| * | | powerpc/8xx: Simplify pte_update() with 16k pages — Christophe Leroy (2022-11-24, 1 file, -2/+8)

  While looking at code generated for code patching, I saw that pte_clear generated:

    2d8:  38 a0 00 00   li      r5,0
    2dc:  38 e0 10 00   li      r7,4096
    2e0:  39 00 20 00   li      r8,8192
    2e4:  39 40 30 00   li      r10,12288
    2e8:  90 a9 00 00   stw     r5,0(r9)
    2ec:  90 e9 00 04   stw     r7,4(r9)
    2f0:  91 09 00 08   stw     r8,8(r9)
    2f4:  91 49 00 0c   stw     r10,12(r9)

  With 16k pages, only the first entry is used by the kernel, so there is no need to adapt the addresses of the other entries; only duplicate the first entry for the hardware. Now it is:

    2cc:  39 40 00 00   li      r10,0
    2d0:  91 49 00 00   stw     r10,0(r9)
    2d4:  91 49 00 04   stw     r10,4(r9)
    2d8:  91 49 00 08   stw     r10,8(r9)
    2dc:  91 49 00 0c   stw     r10,12(r9)

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/65f76300de07091a59a042a3db2d0ce9b939a05c.1664346532.git.christophe.leroy@csgroup.eu
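  The simplification boils down to something like this (hypothetical condensed loop; the real pte_update() also handles clr/set masks and huge pages):

    int i, num = IS_ENABLED(CONFIG_PPC_16K_PAGES) ? 4 : 1;

    /* the hardware walks four identical 4k entries per 16k page, so
     * duplicate the first entry rather than offsetting each by 4096 */
    for (i = 0; i < num; i++)
            p[i] = new;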
| * | | powerpc/kvm: Remove unused references for MMCR3/SIER2/SIER3 registers — Kajol Jain (2022-11-24, 1 file, -1/+1)

  Commit 57dc0eed73ca ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C") removed the PMU save/restore functions from assembly code and implemented these functions in C, for power9 and later platforms. After the code refactoring, Performance Monitoring Unit (PMU) registers became part of the "p9_host_os_sprs" structure, and now this structure is used to save/restore PMU host registers for power9 and later platforms. But we still have old, unused register references.

  This patch removes the unused host_mmcr references for Monitor Mode Control Register 3 (MMCR3), Sampled Instruction Event Register 2 (SIER2), and SIER3 registers from "struct kvmppc_host_state".

  Fixes: 57dc0eed73ca ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C")
  Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220916105736.268153-3-disgoel@linux.vnet.ibm.com
| * | | powerpc: add compile-time support for lbarx, lharx — Nicholas Piggin (2022-11-24, 1 file, -1/+230)

  ISA v2.06 (POWER7 and up) as well as e6500 support lbarx and lharx. Add a compile option that allows code to use them, and add support for cmpxchg and xchg of 8- and 16-bit values without shifting and masking.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220909052312.63916-1-npiggin@gmail.com
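  With lbarx/stbcx. available, an 8-bit exchange needs no word-sized shift/mask dance. A hedged sketch of the pattern:

    static inline unsigned long __xchg_u8(volatile void *p, unsigned long val)
    {
            unsigned long prev;

            asm volatile(
    "1:     lbarx   %0,0,%2         # load byte and reserve         \n"
    "       stbcx.  %3,0,%2         # store-conditional             \n"
    "       bne-    1b              # reservation lost: retry       \n"
            : "=&r" (prev), "+m" (*(volatile unsigned char *)p)
            : "r" (p), "r" (val)
            : "cc", "memory");

            return prev;
    }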