path: root/arch/powerpc
Commit message  Author  Age  Files  Lines
* powerpc/tm: P9 disable transactionally suspended sigcontextsMichael Neuling2017-10-213-0/+11
Unfortunately userspace can construct a sigcontext which enables suspend. Thus userspace can force Linux into a path where trechkpt is executed. This patch blocks this from happening on POWER9 by sanity checking sigcontexts passed in. ptrace doesn't have this problem as only MSR SE and BE can be changed via ptrace. This patch also adds a number of WARN_ON()s in case we ever enter suspend when we shouldn't. This should not happen, but if it does the symptoms are soft lockup warnings which are not obviously TM related, so the WARN_ON()s should make it obvious what's happening. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
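The sanity check described above can be sketched in plain C. This is a minimal user-space model, not the code from the patch: the helper name `sigcontext_msr_ok()` and the `tm_suspend_disabled` flag are hypothetical, and the MSR[TS] bit positions (suspended = bit 33, transactional = bit 34, counted from the least significant bit) are an assumption that should be checked against the kernel's register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR[TS] encoding: bit 33 = suspended, bit 34 = transactional. */
#define MSR_TS_S    (UINT64_C(1) << 33)
#define MSR_TS_T    (UINT64_C(1) << 34)
#define MSR_TS_MASK (MSR_TS_S | MSR_TS_T)

/*
 * Hypothetical helper: returns true if an MSR value taken from a
 * user-supplied sigcontext may be restored. When suspended state is
 * unavailable, a sigcontext that asks for it is rejected.
 */
static bool sigcontext_msr_ok(uint64_t user_msr, bool tm_suspend_disabled)
{
    bool wants_suspend = (user_msr & MSR_TS_MASK) == MSR_TS_S;

    if (tm_suspend_disabled && wants_suspend)
        return false;   /* sigreturn would fail with an error */
    return true;
}
```

The sketch only models the "suspended" encoding; how the real code treats the reserved encoding with both bits set is not covered here.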
* powerpc/powernv: Enable TM without suspend if possibleMichael Ellerman2017-10-216-0/+42
Some Power9 revisions can run in a mode where TM operates without suspended state. If we find ourselves on a CPU that might be in this mode, we query OPAL to check, and if so we re-enable TM in CPU features, and enable a new user feature to signal to userspace that we are in this mode. We do not enable the "normal" user feature, PPC_FEATURE2_HTM, but we do enable PPC_FEATURE2_HTM_NOSC because that indicates to userspace that the kernel will abort transactions on syscall entry, which is true regardless of the suspend mode. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add PPC_FEATURE2_HTM_NO_SUSPENDMichael Ellerman2017-10-201-0/+1
Some CPUs can operate in a mode where TM (Transactional Memory) is enabled but the suspended state of TM is disabled. In this mode tsuspend does not enter suspended state; instead the transaction is aborted. Similarly, any other event that would lead to suspended state instead aborts the transaction. There is also an ABI change: in this mode processes are not allowed to sigreturn with an MSR that would lead to suspended state; Linux will instead return an error to the sigreturn syscall. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
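From the userspace side, the feature bits discussed in the commits above could be interpreted roughly as follows. This is a sketch under assumptions: the numeric values of the three PPC_FEATURE2 bits are written from memory and must be verified against the kernel's uapi cputable header, and `classify_tm()` and `aborts_on_syscall()` are hypothetical helpers. On a real system the `hwcap2` value would typically come from the AT_HWCAP2 auxiliary vector entry.

```c
#include <stdbool.h>

/* Assumed bit values; verify against the kernel's uapi headers. */
#define PPC_FEATURE2_HTM            0x40000000UL
#define PPC_FEATURE2_HTM_NOSC       0x01000000UL
#define PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000UL

enum tm_mode { TM_NONE, TM_FULL, TM_NO_SUSPEND };

/* Hypothetical helper: decide which flavour of TM the kernel advertises. */
static enum tm_mode classify_tm(unsigned long hwcap2)
{
    if (hwcap2 & PPC_FEATURE2_HTM)
        return TM_FULL;          /* TM with suspended state */
    if (hwcap2 & PPC_FEATURE2_HTM_NO_SUSPEND)
        return TM_NO_SUSPEND;    /* TM, but tsuspend aborts the transaction */
    return TM_NONE;
}

/* Hypothetical helper: true if transactions are aborted on syscall entry. */
static bool aborts_on_syscall(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_HTM_NOSC) != 0;
}
```

Checking the "normal" bit first keeps existing programs, which only know PPC_FEATURE2_HTM, from treating the no-suspend mode as full TM.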
* powerpc/tm: Add commandline option to disable hardware transactional memoryCyril Bur2017-10-201-0/+31
Currently the kernel relies on firmware to inform it whether or not the CPU supports HTM, and as long as the kernel was built with CONFIG_PPC_TRANSACTIONAL_MEM=y it will allow userspace to make use of the facility. There may be situations where it would be advantageous for the kernel to not allow userspace to use HTM; currently the only way to achieve this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n. This patch adds a simple commandline option so that HTM can be disabled at boot time. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Simplify to a bool, move to prom.c, put doco in the right place. Always disable, regardless of initial state, to avoid user confusion.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge branch 'topic/ppc-kvm' into nextMichael Ellerman2017-10-205-50/+14
|\ | | | | | | Bring in some KVM commits we need (the TM one in particular).
| * KVM: PPC: Tie KVM_CAP_PPC_HTM to the user-visible TM featureMichael Ellerman2017-10-201-2/+2
Currently we use CPU_FTR_TM to decide if the CPU/kernel can support TM (Transactional Memory), and if it's true we advertise that to Qemu (or similar) via KVM_CAP_PPC_HTM. PPC_FEATURE2_HTM is the user-visible feature bit, which indicates that the CPU and kernel can support TM. Currently CPU_FTR_TM and PPC_FEATURE2_HTM always have the same value, either true or false, so using the former for KVM_CAP_PPC_HTM is correct. However some Power9 CPUs can operate in a mode where TM is enabled but TM suspended state is disabled. In this mode CPU_FTR_TM is true, but PPC_FEATURE2_HTM is false. Instead a different PPC_FEATURE2 bit is set, to indicate that this different mode of TM is available. It is not safe to let guests use TM as-is, when the CPU is in this mode. So to prevent that from happening, use PPC_FEATURE2_HTM to determine the value of KVM_CAP_PPC_HTM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * Revert "KVM: PPC: Book3S HV: POWER9 does not require secondary thread management"Paul Mackerras2017-10-194-48/+12
This reverts commit 94a04bc25a2c6296bd0c5e82c10e8231c2b11f77. In order to run HPT guests on a radix POWER9 host, we will have to run the host in single-threaded mode, because POWER9 processors do not currently support running some threads of a core in HPT mode while others are in radix mode ("mixed mode"). That means that we will need the same mechanisms that are used on POWER8 to make the secondary threads available to KVM, which were disabled on POWER9 by commit 94a04bc25a2c. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/vphn: Fix numa update end-loop bugMichael Bringmann2017-10-161-2/+8
powerpc/vphn: On Power systems with shared configurations of CPUs and memory, there are some issues with the association of additional CPUs and memory to nodes when hot-adding resources. This patch fixes an end-of-updates processing problem observed occasionally in numa_update_cpu_topology(). Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/hotplug: Improve responsiveness of hotplug changeMichael Bringmann2017-10-163-1/+32
powerpc/hotplug: On Power systems with shared configurations of CPUs and memory, there are some issues with the association of additional CPUs and memory to nodes when hot-adding resources. During hotplug CPU operations, this patch resets the timer on the topology update work function to a small value so that the CPU topology is detected and configured sooner. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/vphn: Improve recognition of PRRN/VPHNMichael Bringmann2017-10-161-4/+4
powerpc/vphn: On Power systems with shared configurations of CPUs and memory, there are some issues with the association of additional CPUs and memory to nodes when hot-adding resources. This patch updates the initialization checks to independently recognize PRRN or VPHN support. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/vphn: Update CPU topology when VPHN enabledMichael Bringmann2017-10-161-1/+21
powerpc/vphn: On Power systems with shared configurations of CPUs and memory, there are some issues with the association of additional CPUs and memory to nodes when hot-adding resources. This patch corrects the currently broken capability to set the topology for shared CPUs in LPARs. At boot time for shared CPU LPARs, the topology for each CPU was being set to node zero. Now when numa_update_cpu_topology() is called appropriately, the Virtual Processor Home Node (VPHN) capabilities information provided by the pHyp allows the appropriate node in the shared configuration to be selected for the CPU. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mce: hookup memory_failure for UE errorsBalbir Singh2017-10-161-3/+67
If we are in user space and hit a UE error, we now have the basic infrastructure to walk the page tables and find out the effective address that was accessed, since the DAR is not valid. We use a work_queue context to hook up the bad pfn; any other context causes problems, since memory_failure() itself can call into schedule() via lru_drain_ bits. We could probably poison the struct page to avoid a race between detection and taking corrective action. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mce: Hookup ierror (instruction) UE errorsBalbir Singh2017-10-161-3/+19
Hook up instruction errors (UE) for memory offlining via memory_failure() in a manner similar to load/store errors (derror). Since we have access to the NIP, the conversion is a one-step process in this case. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mce: Hookup derror (load/store) UE errorsBalbir Singh2017-10-165-8/+94
Extract physical_address for UE errors by walking the page tables for the mm and address at the NIP, to extract the instruction. Then use the instruction to find the effective address via analyse_instr(). We might have page table walking races, but we expect them to be rare; the physical address extraction is best effort. The idea is to then hook up this infrastructure to memory failure eventually. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mce: Align the print of physical address betterBalbir Singh2017-10-161-1/+1
Use the same alignment as Effective address. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mce: Remove unused function get_mce_fault_addr()Balbir Singh2017-10-162-41/+0
There are no users of get_mce_fault_addr() since commit 1363875bdb63 ("powerpc/64s: fix handling of non-synchronous machine checks") removed the last usage. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/modules: Use WARN_ON() in stub_for_addr()Kamalesh Babulal2017-10-131-1/+2
When stub_for_addr() runs out of stubs, use WARN_ON() and abort loading of the module instead of calling BUG_ON(). Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: get_wchan(): solve possible race scenario due to parallel wakeupKautuk Consul2017-10-061-1/+2
Add a check for p->state == TASK_RUNNING so that any wake-ups on task_struct p in the interim lead to 0 being returned by get_wchan(). Signed-off-by: Kautuk Consul <kautuk.consul.1980@gmail.com> [mpe: Confirmed other architectures do similar] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Always initialize input array when calling epapr_hypercall()Seth Forshee2017-10-061-6/+6
Several callers to epapr_hypercall() pass an uninitialized stack allocated array for the input arguments, presumably because they have no input arguments. However this can produce errors like this one: arch/powerpc/include/asm/epapr_hcalls.h:470:42: error: 'in' may be used uninitialized in this function [-Werror=maybe-uninitialized] unsigned long register r3 asm("r3") = in[0]; ~~^~~ Fix callers to this function to always zero-initialize the input arguments array to prevent this. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
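The pattern behind this fix can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `fake_hypercall()` only stands in for a callee that unconditionally reads every input slot, as the real wrapper loads its argument registers from the array, and the array size of 8 is an assumption.

```c
#include <stddef.h>

#define HCALL_ARGS 8   /* assumed size of the argument arrays */

/*
 * Hypothetical stand-in for the hypercall wrapper: it reads every
 * input slot, whether or not the caller "has" input arguments.
 */
static unsigned long fake_hypercall(const unsigned long in[HCALL_ARGS],
                                    unsigned long out[HCALL_ARGS],
                                    unsigned long nr)
{
    unsigned long sum = nr;

    for (size_t i = 0; i < HCALL_ARGS; i++) {
        out[i] = in[i];
        sum += in[i];
    }
    return sum;
}

/* A caller with no input arguments still zero-initializes the array. */
static unsigned long call_without_inputs(unsigned long nr)
{
    unsigned long in[HCALL_ARGS] = {0};   /* the fix: no indeterminate reads */
    unsigned long out[HCALL_ARGS];

    return fake_hypercall(in, out, nr);
}
```

With `in` left uninitialized, the reads inside the callee would be of indeterminate values, which is what the compiler warning quoted above points at.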
* | powerpc: Add PPC_EMULATED_STATS to powernv_defconfigMichael Neuling2017-10-061-0/+1
This is useful, especially for developers. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/xmon: Add option to show uptime informationGuilherme G. Piccoli2017-10-061-0/+24
It might be useful to quickly get the uptime of a running system on xmon, without needing to grab data from memory and do math on struct addresses. For example, it'd be useful to check how long a system has been in the xmon shell after a crash, or whether some test was started after the first test crashed (and this second test crashed into xmon too). This small patch adds the 'U' command to accomplish this. Suggested-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> [mpe: Display units (seconds), add sync()/__delay() sequence] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/powernv: Make opal_event_shutdown() callable from IRQ contextMichael Ellerman2017-10-061-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In opal_event_shutdown() we free all the IRQs hanging off the opal_event_irqchip. However it's not safe to do so if we're called from IRQ context, because free_irq() wants to synchronise versus IRQ context. This can lead to warnings and a stuck system. For example from sysrq-b: Trying to free IRQ 17 from IRQ context! ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1461 __free_irq+0x398/0x8d0 ... NIP __free_irq+0x398/0x8d0 LR __free_irq+0x394/0x8d0 Call Trace: __free_irq+0x394/0x8d0 (unreliable) free_irq+0xa4/0x140 opal_event_shutdown+0x128/0x180 opal_shutdown+0x1c/0xb0 pnv_shutdown+0x20/0x40 machine_restart+0x38/0x90 emergency_restart+0x28/0x40 sysrq_handle_reboot+0x24/0x40 __handle_sysrq+0x198/0x590 hvc_poll+0x48c/0x8c0 hvc_handle_interrupt+0x1c/0x50 __handle_irq_event_percpu+0xe8/0x6e0 handle_irq_event_percpu+0x34/0xe0 handle_irq_event+0xc4/0x210 handle_level_irq+0x250/0x770 generic_handle_irq+0x5c/0xa0 opal_handle_events+0x11c/0x240 opal_interrupt+0x38/0x50 __handle_irq_event_percpu+0xe8/0x6e0 handle_irq_event_percpu+0x34/0xe0 handle_irq_event+0xc4/0x210 handle_fasteoi_irq+0x174/0xa10 generic_handle_irq+0x5c/0xa0 __do_irq+0xbc/0x4e0 call_do_irq+0x14/0x24 do_IRQ+0x18c/0x540 hardware_interrupt_common+0x158/0x180 We can avoid that by using disable_irq_nosync() rather than free_irq(). Although it doesn't fully free the IRQ, it should be sufficient when we're shutting down, particularly in an emergency. Add an in_interrupt() check and use free_irq() when we're shutting down normally. It's probably OK to use disable_irq_nosync() in that case too, but for now it's safer to leave that behaviour as-is. 
Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
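The decision this commit describes, releasing the IRQs normally but only disabling them when called from interrupt context, can be modelled as below. This is a user-space sketch, not the kernel function: the counters stand in for the real free_irq() and disable_irq_nosync() calls, the boolean parameter stands in for the in_interrupt() check, and skipping entries equal to 0 as "unmapped" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counters standing in for the real free_irq()/disable_irq_nosync() calls. */
struct shutdown_stats {
    size_t freed;
    size_t disabled;
};

/*
 * Hypothetical model of the shutdown loop: walk the event IRQs and pick
 * the teardown that is safe for the calling context.
 */
static struct shutdown_stats event_shutdown(const unsigned int *virqs,
                                            size_t n, bool in_irq_context)
{
    struct shutdown_stats st = {0, 0};

    for (size_t i = 0; i < n; i++) {
        if (virqs[i] == 0)
            continue;           /* assumed: 0 means no IRQ mapped */
        if (in_irq_context)
            st.disabled++;      /* would call disable_irq_nosync() */
        else
            st.freed++;         /* would call free_irq() */
    }
    return st;
}
```

The emergency path (for example sysrq-b) takes the first branch and never waits on a running handler, which is the property the commit relies on.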
* | powerpc/jprobes: Validate break handler invocation as being due to a jprobe_return()Naveen N. Rao2017-10-051-11/+9
Fix a circa 2005 FIXME by implementing a check to ensure that we actually got into the jprobe break handler() due to the trap in jprobe_return(). Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/jprobes: Disable preemption when triggered through ftraceNaveen N. Rao2017-10-051-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is enabled: Kprobe smoke test: started DEBUG_LOCKS_WARN_ON(val > preempt_count()) ------------[ cut here ]------------ WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 preempt_count_sub+0xcc/0x140 Modules linked in: CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97 task: c0000000fea80000 task.stack: c0000000feb00000 NIP: c00000000011d3dc LR: c00000000011d3d8 CTR: c000000000a090d0 REGS: c0000000feb03400 TRAP: 0700 Not tainted (4.13.0-rc7-nnr+) MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28000282 XER: 00000000 CFAR: c00000000015aa18 SOFTE: 0 <snip> NIP preempt_count_sub+0xcc/0x140 LR preempt_count_sub+0xc8/0x140 Call Trace: preempt_count_sub+0xc8/0x140 (unreliable) kprobe_handler+0x228/0x4b0 program_check_exception+0x58/0x3b0 program_check_common+0x16c/0x170 --- interrupt: 0 at kprobe_target+0x8/0x20 LR = init_test_probes+0x248/0x7d0 kp+0x0/0x80 (unreliable) livepatch_handler+0x38/0x74 init_kprobes+0x1d8/0x208 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x298/0x374 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0x70 Instruction dump: 419effdc 3d22001b 39299240 81290000 2f890000 409effc8 3c82ffcb 3c62ffcb 3884bc68 3863bc18 4803d5fd 60000000 <0fe00000> 4bffffa8 60000000 60000000 ---[ end trace 432dd46b4ce3d29f ]--- Kprobe smoke test: passed successfully The issue is that we aren't disabling preemption in kprobe_ftrace_handler(). Disable it. Fixes: ead514d5fb30a0 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Trim oops a little for formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/kprobes: Fix warnings from __this_cpu_read() on preempt kernelsNaveen N. Rao2017-10-041-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kamalesh pointed out that we are getting the below call traces with livepatched functions when we enable CONFIG_PREEMPT: [ 495.470721] BUG: using __this_cpu_read() in preemptible [00000000] code: cat/8394 [ 495.471167] caller is is_current_kprobe_addr+0x30/0x90 [ 495.471171] CPU: 4 PID: 8394 Comm: cat Tainted: G K 4.13.0-rc7-nnr+ #95 [ 495.471173] Call Trace: [ 495.471178] [c00000008fd9b960] [c0000000009f039c] dump_stack+0xec/0x160 (unreliable) [ 495.471184] [c00000008fd9b9a0] [c00000000059169c] check_preemption_disabled+0x15c/0x170 [ 495.471187] [c00000008fd9ba30] [c000000000046460] is_current_kprobe_addr+0x30/0x90 [ 495.471191] [c00000008fd9ba60] [c00000000004e9a0] ftrace_call+0x1c/0xb8 [ 495.471195] [c00000008fd9bc30] [c000000000376fd8] seq_read+0x238/0x5c0 [ 495.471199] [c00000008fd9bcd0] [c0000000003cfd78] proc_reg_read+0x88/0xd0 [ 495.471203] [c00000008fd9bd00] [c00000000033e5d4] __vfs_read+0x44/0x1b0 [ 495.471206] [c00000008fd9bd90] [c0000000003402ec] vfs_read+0xbc/0x1b0 [ 495.471210] [c00000008fd9bde0] [c000000000342138] SyS_read+0x68/0x110 [ 495.471214] [c00000008fd9be30] [c00000000000bc6c] system_call+0x58/0x6c Commit c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes") introduced a helper is_current_kprobe_addr() to help determine if the current function has been livepatched or if it has a jprobe installed, both of which modify the NIP. This was subsequently renamed to __is_active_jprobe(). In the case of a jprobe, kprobe_ftrace_handler() disables pre-emption before calling into setjmp_pre_handler() which returns without disabling pre-emption. 
This is done to ensure that the jprobe handler won't disappear beneath us if the jprobe is unregistered between the setjmp_pre_handler() and the subsequent longjmp_break_handler() called from the jprobe handler. Due to this, we can use __this_cpu_read() in __is_active_jprobe() with the pre-emption check as we know that pre-emption will be disabled. However, if this function has been livepatched, we are still doing this check and when we do so, pre-emption won't necessarily be disabled. This results in the call trace shown above. Fix this by only invoking __is_active_jprobe() when pre-emption is disabled. And since we now guard this within a pre-emption check, we can instead use raw_cpu_read() to get the current_kprobe value skipping the check done by __this_cpu_read(). Fixes: c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes") Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/kprobes: Clean up jprobe detection in livepatch handlerNaveen N. Rao2017-10-044-9/+14
In commit c05b8c4474c03 ("powerpc/kprobes: Skip livepatch_handler() for jprobes"), we added a helper is_current_kprobe_addr() to help detect if the modified regs->nip was due to a jprobe or livepatch. Masami felt that the function name was not quite clear. To that end, this patch renames is_current_kprobe_addr() to __is_active_jprobe() and adds a comment to (hopefully) better clarify the purpose of this helper. The helper has also now been moved to kprobes-ftrace.c so that it is only available for KPROBES_ON_FTRACE. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/kprobes: Do not suppress instruction emulation if a single run failedNaveen N. Rao2017-10-041-3/+14
Currently, we disable instruction emulation if emulate_step() fails for any reason. However, such failures could be transient and specific to a particular run. Instead, only disable instruction emulation if we have never been able to emulate this instruction. If we had emulated this instruction successfully at least once, then we single step only this probe hit and continue to try emulating the instruction in subsequent probe hits. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
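The policy described above amounts to a small three-state machine per probe. The sketch below is a simplified model with hypothetical names (`enum emul_state`, `update_emul_state()`, `should_try_emulation()`); it does not reproduce the kernel's data structures, and it treats a step result greater than zero as "emulated" and zero as "could not emulate this time".

```c
#include <stdbool.h>

/* Hypothetical per-probe state: have we ever emulated this instruction? */
enum emul_state {
    EMUL_UNKNOWN = 0,   /* not tried yet */
    EMUL_OK      = 1,   /* emulated successfully at least once */
    EMUL_NEVER   = -1   /* first attempt failed: stop trying */
};

/* Update the state after one emulation attempt. */
static enum emul_state update_emul_state(enum emul_state state, int step_result)
{
    if (step_result > 0)
        return EMUL_OK;
    if (state == EMUL_OK)
        return EMUL_OK;     /* transient failure: single step this hit only */
    return EMUL_NEVER;      /* never worked: disable emulation */
}

/* Should the next probe hit attempt emulation at all? */
static bool should_try_emulation(enum emul_state state)
{
    return state != EMUL_NEVER;
}
```

A failure after an earlier success therefore costs one single-stepped hit, while a failure on the very first attempt disables emulation for that probe, matching the behaviour described in the message.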
* | powerpc/kprobes: Some cosmetic updates to try_to_emulate()Naveen N. Rao2017-10-041-2/+2
1. This is only used in kprobes.c, so make it static.
2. Remove the unnecessary (ret == 0) comparison in the else clause.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/configs: Add Skiroot defconfigJoel Stanley2017-10-041-0/+232
This configuration is used by the OpenPower firmware for its Linux-as-bootloader implementation. Also known as the Petitboot kernel, this configuration broke in 4.12 (CPU_HOTPLUG=n), so add it to the upstream tree in order to get better coverage. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/lib/sstep: Fix fixed-point shift instructions that set CA32Sandipan Das2017-10-041-0/+4
This fixes the emulated behaviour of existing fixed-point shift right algebraic instructions that are supposed to set both the CA and CA32 bits of XER when running on a system that is compliant with POWER ISA v3.0 independent of whether the system is executing in 32-bit mode or 64-bit mode. The following instructions are affected:
* Shift Right Algebraic Word Immediate (srawi[.])
* Shift Right Algebraic Word (sraw[.])
* Shift Right Algebraic Doubleword Immediate (sradi[.])
* Shift Right Algebraic Doubleword (srad[.])
Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()") Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
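As a worked illustration of the shift case, here is a minimal model of a word-sized shift right algebraic immediate. It is a sketch, not the emulation code: the struct and function names are hypothetical, and it assumes the usual definition that the carry is set when the source is negative and any 1 bits are shifted out, with CA32 receiving the same value as CA.

```c
#include <stdbool.h>
#include <stdint.h>

struct sraw_result {
    int64_t value;   /* sign-extended shifted result */
    bool ca;         /* XER[CA] */
    bool ca32;       /* XER[CA32], same condition for these instructions */
};

/* Hypothetical model of srawi: shift the low word right by sh (0..31). */
static struct sraw_result model_srawi(int32_t rs, unsigned int sh)
{
    struct sraw_result r;
    uint32_t mask = (sh == 0) ? 0 : ((UINT32_C(1) << sh) - 1);
    uint32_t lost = (uint32_t)rs & mask;   /* bits shifted out */

    /* Portable arithmetic shift: avoid shifting a negative value directly. */
    int32_t shifted = (rs < 0) ? ~(~rs >> sh) : (rs >> sh);

    r.value = shifted;
    r.ca = (rs < 0) && (lost != 0);
    r.ca32 = r.ca;
    return r;
}
```

For example, shifting -1 right by one position keeps the value at -1 and loses a 1 bit, so both carry bits are set, while shifting -4 by one loses only a 0 bit and leaves them clear.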
* | powerpc/lib/sstep: Fix fixed-point arithmetic instructions that set CA32Sandipan Das2017-10-041-0/+13
There are existing fixed-point arithmetic instructions that always set the CA bit of XER to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode. In ISA v3.0, these instructions also always set the CA32 bit of XER to reflect the carry out of bit 32. This fixes the emulated behaviour of such instructions when running on a system that is compliant with POWER ISA v3.0. The following instructions are affected:
* Add Immediate Carrying (addic)
* Add Immediate Carrying and Record (addic.)
* Subtract From Immediate Carrying (subfic)
* Add Carrying (addc[.])
* Subtract From Carrying (subfc[.])
* Add Extended (adde[.])
* Subtract From Extended (subfe[.])
* Add to Minus One Extended (addme[.])
* Subtract From Minus One Extended (subfme[.])
* Add to Zero Extended (addze[.])
* Subtract From Zero Extended (subfze[.])
Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()") Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
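The relationship between CA and CA32 for the add-style instructions can be shown with a small model. This is a sketch with hypothetical names, and it only models 64-bit mode, where CA is the carry out of the full 64-bit sum and CA32 is the carry out of the low 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

struct add_result {
    uint64_t value;
    bool ca;     /* carry out of the 64-bit addition */
    bool ca32;   /* carry out of the low 32 bits */
};

/* Hypothetical model of an add with carry-in (64-bit mode only). */
static struct add_result model_add_with_carry(uint64_t a, uint64_t b,
                                              bool carry_in)
{
    struct add_result r;
    uint32_t lo, alo;

    r.value = a + b + (carry_in ? 1u : 0u);

    /* Unsigned wrap-around means a carry left the top bit. */
    r.ca = r.value < a || (carry_in && r.value == a);

    /* Same test restricted to the low words gives the 32-bit carry. */
    lo = (uint32_t)r.value;
    alo = (uint32_t)a;
    r.ca32 = lo < alo || (carry_in && lo == alo);
    return r;
}
```

Adding 1 to 0xFFFFFFFF shows the difference: the 64-bit sum does not wrap, so CA stays clear, but the low word wraps to zero, so CA32 is set.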
* | powerpc/lib/sstep: Add XER bits introduced in POWER ISA v3.0Sandipan Das2017-10-041-0/+2
This adds definitions for the OV32 and CA32 bits of XER that were introduced in POWER ISA v3.0. There are some existing instructions that currently set the OV and CA bits based on certain conditions. The emulation behaviour of all these instructions needs to be updated to set these new bits accordingly. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/powermac: Use setup_timer() helperAllen Pais2017-10-041-3/+1
Use the setup_timer() function instead of initializing the timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/6xx: Use setup_timer() helperAllen Pais2017-10-041-2/+1
Use the setup_timer() function instead of initializing the timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/oprofile: Use setup_timer() helperAllen Pais2017-10-041-6/+2
Use the setup_timer() function instead of initializing the timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/powernv: Use early_radix_enabled in POWER9 tlb flushNicholas Piggin2017-10-041-1/+1
This code is used at boot and machine checks, so it should be using early_radix_enabled() (which is usable any time). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESETNicholas Piggin2017-10-045-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal) system similarly to the hcall NMI IPI on pseries guests, when the platform/firmware supports it. This is an example of CPU10 spinning with interrupts hard disabled: Watchdog CPU:32 detected Hard LOCKUP other CPUS:10 Watchdog CPU:10 Hard LOCKUP CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f89f62-dirty #34 task: c0000003a82b4400 task.stack: c0000003af55c000 NIP: c0000000000a7b38 LR: c000000000659044 CTR: c0000000000a7b00 REGS: c00000000fd23d80 TRAP: 0100 Not tainted (4.13.0-rc7-00074-ge89ce1f89f62-dirty) MSR: 90000000000c1033 <SF,HV,ME,IR,DR,RI,LE> CR: 28422222 XER: 20000000 CFAR: c0000000000a7b38 SOFTE: 0 GPR00: c000000000659044 c0000003af55fbb0 c000000001072a00 0000000000000078 GPR04: c0000003c81b5c80 c0000003c81cc7e8 9000000000009033 0000000000000000 GPR08: 0000000000000000 c0000000000a7b00 0000000000000001 9000000000001003 GPR12: c0000000000a7b00 c00000000fd83200 0000000010180df8 0000000010189e60 GPR16: 0000000010189ed8 0000000010151270 000000001018bd88 000000001018de78 GPR20: 00000000370a0668 0000000000000001 00000000101645e0 0000000010163c10 GPR24: 00007fffd14d6294 00007fffd14d6290 c000000000fba6f0 0000000000000004 GPR28: c000000000f351d8 0000000000000078 c000000000f4095c 0000000000000000 NIP [c0000000000a7b38] sysrq_handle_xmon+0x38/0x40 LR [c000000000659044] __handle_sysrq+0xe4/0x270 Call Trace: [c0000003af55fbd0] [c000000000659044] __handle_sysrq+0xe4/0x270 [c0000003af55fc70] [c000000000659810] write_sysrq_trigger+0x70/0xa0 [c0000003af55fca0] [c0000000003da650] proc_reg_write+0xb0/0x110 [c0000003af55fcf0] [c0000000003423bc] __vfs_write+0x6c/0x1b0 [c0000003af55fd90] [c000000000344398] vfs_write+0xd8/0x240 [c0000003af55fde0] [c00000000034632c] SyS_write+0x6c/0x110 [c0000003af55fe30] [c00000000000b220] 
system_call+0x58/0x6c Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use kernel types for opal_signal_system_reset()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/64s: Implement system reset idle wakeup reasonNicholas Piggin2017-10-041-3/+38
It is possible to wake from idle due to a system reset exception, in which case the CPU takes a system reset interrupt to wake from idle, with system reset as the wakeup reason. The regular (not idle wakeup) system reset interrupt handler must be invoked in this case, otherwise the system reset interrupt is lost. Handle the system reset interrupt immediately after CPU state has been restored. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/xmon: Avoid tripping SMP hardlockup watchdogNicholas Piggin2017-10-041-4/+13
The SMP hardlockup watchdog cross-checks other CPUs for lockups, which causes xmon headaches because xmon assumes that having interrupts hard disabled means no watchdog troubles. Try to improve that by calling touch_nmi_watchdog() in obvious places where secondaries are spinning. Also annotate these spin loops with spin_begin/end calls. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/watchdog: Do not trigger SMP crash from touch_nmi_watchdogNicholas Piggin2017-10-041-2/+5
In xmon, touch_nmi_watchdog() is not expected to be checking that other CPUs have not touched the watchdog, so the code will just call touch_nmi_watchdog() once before re-enabling hard interrupts. Just update our CPU's state, and ignore apparently stuck SMP threads. Arguably touch_nmi_watchdog should check for SMP lockups, and callers should be fixed, but that's not trivial for the input code of xmon. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/watchdog: Do not backtrace locked CPUs twice if allcpus backtrace is enabledNicholas Piggin2017-10-041-8/+11
If sysctl_hardlockup_all_cpu_backtrace is enabled, there is no need to IPI stuck CPUs for backtrace before trigger_allbutself_cpu_backtrace(), which does the same thing again. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/watchdog: Do not panic from locked CPU's IPI handlerNicholas Piggin2017-10-041-2/+1
The SMP watchdog will detect locked CPUs and IPI them to print a backtrace and registers. If panic on hard lockup is enabled, do not panic from this handler, because that can cause recursion into the IPI layer during the panic. The caller already panics in this case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | cxl: Enable global TLBIs for cxl contextsFrederic Barrat2017-09-282-9/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PSL and nMMU need to see all TLB invalidations for the memory contexts used on the adapter. For the hash memory model, it is done by making all TLBIs global as soon as the cxl driver is in use. For radix, we need something similar, but we can refine and only convert to global the invalidations for contexts actually used by the device. The new mm_context_add_copro() API increments the 'active_cpus' count for the contexts attached to the cxl adapter. As soon as there's more than 1 active cpu, the TLBIs for the context become global. Active cpu count must be decremented when detaching to restore locality if possible and to avoid overflowing the counter. The hash memory model support is somewhat limited, as we can't decrement the active cpus count when mm_context_remove_copro() is called, because we can't flush the TLB for a mm on hash. So TLBIs remain global on hash. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Fixes: f24be42aab37 ("cxl: Add psl9 specific code") Tested-by: Alistair Popple <alistair@popple.id.au> [mpe: Fold in updated comment on the barrier from Fred] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
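The counting rule described in this commit can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct mm_ctx_model` and the `model_*` helpers are hypothetical names, not the kernel's mm_context_add_copro()/mm_context_remove_copro() implementation, and the TLB flush and barrier work the real code needs is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a memory context; only the counter is modelled. */
struct mm_ctx_model {
    int active_cpus; /* CPUs plus attached coprocessor contexts */
};

/* Attaching a coprocessor context counts as one more active user. */
static void model_add_copro(struct mm_ctx_model *ctx)
{
    ctx->active_cpus++;
}

/* Detaching drops the count again so invalidations can become local. */
static void model_remove_copro(struct mm_ctx_model *ctx)
{
    ctx->active_cpus--;
}

/* Per the commit text: more than 1 active cpu makes TLBIs global. */
static bool model_tlbi_is_global(const struct mm_ctx_model *ctx)
{
    return ctx->active_cpus > 1;
}
```

In this model a context used by one CPU issues local invalidations, switches to global ones once a coprocessor is attached, and returns to local ones after the matching remove. On hash, the commit notes that the decrement is not possible, so the count (and the global behaviour) would stay.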
* | powerpc/mm: Export flush_all_mm()Frederic Barrat2017-09-284-2/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | With the optimizations introduced by commit a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes"), flush_tlb_mm() no longer flushes the page walk cache (PWC) with radix. This patch introduces flush_all_mm(), which flushes everything, TLB and PWC, for a given mm. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> [mpe: Add a WARN_ON_ONCE() in the empty hash routines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/64s: Add workaround for P9 vector CI load issueMichael Neuling2017-09-277-5/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER9 DD2.1 and earlier has an issue where some cache inhibited vector loads will return bad data. The workaround has two parts: a firmware/microcode part that triggers HMI interrupts when hitting such loads, and this patch, which then emulates the instructions in Linux. The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and lxvh8x. When an instruction triggers the HMI, all threads in the core will be sent to the HMI handler, not just the one running the vector load. In general, these spurious HMIs are detected by the emulation code and we just return back to the running process. Unfortunately, if a spurious interrupt occurs on a vector load that's to normal memory we have no way to detect that it's spurious (unless we walk the page tables, which is very expensive). In this case we emulate the load, but we need to do so using a vector load itself to ensure 128bit atomicity is preserved. Some additional debugfs emulated instruction counters are also added. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/powernv: Rework EEH initialization on powernvBenjamin Herrenschmidt2017-09-265-60/+40
|/ | | | | | | | | | | | | | | | | | | | | | | | Remove the post_init callback which is only used by powernv, we can just call it explicitly from the powernv code. This partially kills the ability to "disable" eeh at runtime via debugfs as this was calling that same callback again, but this is both unused and broken in several ways. If we want to revive it, we need to create a dedicated enable/disable callback on the backend that does the right thing. Let the bulk of eeh initialize normally at core_initcall() like it does on pseries by removing the hack in eeh_init() that delays it. Instead we make sure our eeh->probe cleanly bails out if the PEs haven't been created yet and we force a re-probe where we used to call eeh_init() again. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Fix parent_dn reference leak in add_dt_node()Tyrel Datwyler2017-09-211-1/+3
| | | | | | | | | | | | | | A reference to the parent device node is held by add_dt_node() for the node to be added. If the call to dlpar_configure_connector() fails add_dt_node() returns ENOENT and that reference is not freed. Add a call to of_node_put(parent_dn) prior to bailing out after a failed dlpar_configure_connector() call. Fixes: 8d5ff320766f ("powerpc/pseries: Make dlpar_configure_connector parent node aware") Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
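The leak pattern this fix addresses can be illustrated with a toy reference count. This is a sketch under assumptions, not the kernel code: `struct node_model`, `model_node_get()`, `model_node_put()` and `model_add_node()` are hypothetical stand-ins for the device-node refcounting around add_dt_node(), and the `configure_ok` flag merely simulates whether dlpar_configure_connector() succeeds. The success path is assumed to drop the reference as well.

```c
#include <errno.h>

/* Hypothetical stand-in for a device node; only the refcount is modelled. */
struct node_model {
    int refcount;
};

static struct node_model *model_node_get(struct node_model *n)
{
    n->refcount++;
    return n;
}

static void model_node_put(struct node_model *n)
{
    n->refcount--;
}

/*
 * Takes a reference on the parent, then either fails early or finishes.
 * Every exit path must drop the reference it took.
 */
static int model_add_node(struct node_model *parent, int configure_ok)
{
    model_node_get(parent);

    if (!configure_ok) {
        /* The put the fix adds: without it the early return leaks. */
        model_node_put(parent);
        return -ENOENT;
    }

    /* ... attach the new node here (omitted in this model) ... */

    model_node_put(parent);
    return 0;
}
```

With the put in place, the parent's count is the same before and after the call whether or not the configure step fails.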
* powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPARTyrel Datwyler2017-09-212-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 215ee763f8cb ("powerpc: pseries: remove dlpar_attach_node dependency on full path") reworked dlpar_attach_node() to no longer look up the parent node "/cpus", but instead to have the parent node passed by the caller in the function parameter list. As a result dlpar_attach_node() is no longer responsible for freeing the reference to the parent node. However, commit 215ee763f8cb failed to remove the of_node_put(parent) call in dlpar_attach_node(), or to take into account that the reference to the parent in the caller dlpar_cpu_add() needs to be held until after dlpar_attach_node() returns. As a result doing repeated cpu add/remove dlpar operations will eventually result in the following error: OF: ERROR: Bad of_node_put() on /cpus CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1 Call Trace: dump_stack+0x15c/0x1f8 (unreliable) of_node_release+0x1a4/0x1c0 kobject_put+0x1a8/0x310 kobject_del+0xbc/0xf0 __of_detach_node_sysfs+0x144/0x210 of_detach_node+0xf0/0x180 dlpar_detach_node+0xc4/0x120 dlpar_cpu_remove+0x280/0x560 dlpar_cpu_release+0xbc/0x1b0 arch_cpu_release+0x6c/0xb0 cpu_release_store+0xa0/0x100 dev_attr_store+0x68/0xa0 sysfs_kf_write+0xa8/0xf0 kernfs_fop_write+0x2cc/0x400 __vfs_write+0x5c/0x340 vfs_write+0x1a8/0x3d0 SyS_write+0xa8/0x1a0 system_call+0x58/0x6c Fix the issue by removing the of_node_put(parent) call from dlpar_attach_node(), and ensuring that the reference to the parent node is properly held and released by the caller dlpar_cpu_add(). Fixes: 215ee763f8cb ("powerpc: pseries: remove dlpar_attach_node dependency on full path") Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> [mpe: Add a comment in the code and frob the change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh: Create PHB PEs after EEH is initializedBenjamin Herrenschmidt2017-09-212-18/+4
| | | | | | | | | | | | Otherwise we end up not yet having computed the right diag data size on powernv where EEH initialization is delayed, thus causing memory corruption later on when calling OPAL. Fixes: 5cb1f8fdddb7 ("powerpc/powernv/pci: Dynamically allocate PHB diag data") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kprobes: Update optprobes to use emulate_update_regs()Naveen N. Rao2017-09-201-1/+3
| | | | | | | | | | | | | | | | | | | Optprobes depended on an updated regs->nip from analyse_instr() to identify the location to branch back from the optprobes trampoline. However, since commit 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs"), analyse_instr() doesn't update the registers anymore. Due to this, we end up branching back from the optprobes trampoline to the same branch into the trampoline resulting in a loop. Fix this by calling out to emulate_update_regs() before using the nip. Additionally, explicitly compare the return value from analyse_instr() to 1, rather than just checking for !0 so as to guard against any future changes to analyse_instr() that may result in -1 being returned in more scenarios. Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
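A minimal model of the two changes described here, written as plain C rather than kernel code. All names (`struct regs_model`, `struct op_model`, `model_analyse()`, `model_update_regs()`, `model_branch_back_target()`) are hypothetical, and the `outcome` parameter simply injects the analysis result: 1 is treated as "can be emulated by updating registers", while 0 and -1 are treated as "do not optimise", which is an assumption made for illustration.

```c
/* Hypothetical register and decoded-instruction models. */
struct regs_model {
    unsigned long nip; /* next instruction pointer */
};

struct op_model {
    unsigned long next_nip; /* where execution continues after the insn */
};

/* Like the reworked analysis step: fills *op but leaves *regs untouched. */
static int model_analyse(struct op_model *op, const struct regs_model *regs,
                         int outcome)
{
    op->next_nip = regs->nip + 4;
    return outcome;
}

/* Applies the decoded result to the registers. */
static void model_update_regs(struct regs_model *regs,
                              const struct op_model *op)
{
    regs->nip = op->next_nip;
}

/* Returns the address to branch back to, or 0 if the probe is not optimised. */
static unsigned long model_branch_back_target(unsigned long probe_addr,
                                              int outcome)
{
    struct regs_model regs = { .nip = probe_addr };
    struct op_model op;

    /* Compare against 1 explicitly so that -1 is not taken as success. */
    if (model_analyse(&op, &regs, outcome) == 1) {
        /* Without this update nip would still point at the probe itself. */
        model_update_regs(&regs, &op);
        return regs.nip;
    }
    return 0;
}
```

A `!= 0` check would accept an outcome of -1 and produce a branch target anyway, which is the case the explicit comparison guards against.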