summaryrefslogtreecommitdiffstats
path: root/arch/powerpc
Commit message (Collapse)AuthorAgeFilesLines
* powerpc/pseries: Remove prrn_work workqueueNathan Fontenot2019-04-201-14/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cd24e457fd8b2d087d9236700c8d2957054598bf ] When a PRRN event is received we are already running in a worker thread. Instead of spawning off another worker thread on the prrn_work workqueue to handle the PRRN event we can just call the PRRN handler routine directly. With this update we can also pass the scope variable for the PRRN event directly to the handler instead of it being a global variable. This patch fixes the following oops mnessage we are seeing in PRRN testing: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External 54 CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G X 4.4.126-94.22-default #1 Workqueue: pseries hotplug workque pseries_hp_work_fn task: c000000775367790 ti: c00000001ebd4000 task.ti: c00000070d140000 NIP: 0000000000000000 LR: 000000001fb3d050 CTR: 0000000000000000 REGS: c00000001ebd7d40 TRAP: 0700 Tainted: G X (4.4.126-94.22-default) MSR: 8000000102081000 <41,VEC,ME5 CR: 28000002 XER: 20040018 4 CFAR: 000000001fb3d084 40 419 1 3 GPR00: 000000000000000040000000000010007 000000001ffff400 000000041fffe200 GPR04: 000000000000008050000000000000000 000000001fb15fa8 0000000500000500 GPR08: 000000000001f40040000000000000001 0000000000000000 000005:5200040002 GPR12: 00000000000000005c000000007a05400 c0000000000e89f8 000000001ed9f668 GPR16: 000000001fbeff944000000001fbeff94 000000001fb545e4 0000006000000060 GPR20: ffffffffffffffff4ffffffffffffffff 0000000000000000 0000000000000000 GPR24: 00000000000000005400000001fb3c000 0000000000000000 000000001fb1b040 GPR28: 000000001fb240004000000001fb440d8 0000000000000008 0000000000000000 NIP [0000000000000000] 5 (null) LR [000000001fb3d050] 031fb3d050 Call Trace: 4 Instruction dump: 4 5:47 12 2 XXXXXXXX XXXXXXXX XXXXX4XX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXX5XX XXXXXXXX 60000000 60000000 60000000 60000000 ---[ end trace aa5627b04a7d9d6b ]--- 3NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [kworker/27:0:13903] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External CPU: 27 PID: 13903 Comm: kworker/27:0 Tainted: G D X 4.4.126-94.22-default #1 Workqueue: events prrn_work_fn task: c000000747cfa390 ti: c00000074712c000 task.ti: c00000074712c000 NIP: c0000000008002a8 LR: c000000000090770 CTR: 000000000032e088 REGS: c00000074712f7b0 TRAP: 0901 Tainted: G D X (4.4.126-94.22-default) MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22482044 XER: 20040000 CFAR: c0000000008002c4 SOFTE: 1 GPR00: c000000000090770 c00000074712fa30 c000000000f09800 c000000000fa1928 6:02 GPR04: c000000775f5e000 fffffffffffffffe 0000000000000001 c000000000f42db8 GPR08: 0000000000000001 0000000080000007 0000000000000000 0000000000000000 GPR12: 8006210083180000 c000000007a14400 NIP [c0000000008002a8] _raw_spin_lock+0x68/0xd0 LR [c000000000090770] mobility_rtas_call+0x50/0x100 Call Trace: 59 5 [c00000074712fa60] [c000000000090770] mobility_rtas_call+0x50/0x100 [c00000074712faf0] [c000000000090b08] pseries_devicetree_update+0xf8/0x530 [c00000074712fc20] [c000000000031ba4] prrn_work_fn+0x34/0x50 [c00000074712fc40] [c0000000000e0390] process_one_work+0x1a0/0x4e0 [c00000074712fcd0] [c0000000000e0870] worker_thread+0x1a0/0x6105:57 2 [c00000074712fd80] [c0000000000e8b18] kthread+0x128/0x150 [c00000074712fe30] [c0000000000096f8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 2c090000 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 40de0018 5:540030 3 e8010010 ebe1fff8 7c0803a6 4e800020 <7c210b78> e92d0000 89290009 792affe3 Signed-off-by: John Allen <jallen@linux.ibm.com> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEMBreno Leitao2019-04-171-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream. Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around and this block uses a 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM, however the 'msr' variable is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when CONFIG_PPC_TRANSACTION_MEM is not defined. error: 'msr' undeclared (first use in this function) This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set: #define MSR_TM_ACTIVE(x) 0 This patch just fixes this issue avoiding the 'msr' variable usage outside the CONFIG_PPC_TRANSACTIONAL_MEM block, avoiding trusting in the MSR_TM_ACTIVE() definition. Cc: stable@vger.kernel.org Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/pseries: Perform full re-add of CPU for topology update post-migrationNathan Fontenot2019-04-053-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 81b61324922c67f73813d8a9c175f3c153f6a1c6 ] On pseries systems, performing a partition migration can result in altering the nodes a CPU is assigned to on the destination system. For exampl, pre-migration on the source system CPUs are in node 1 and 3, post-migration on the destination system CPUs are in nodes 2 and 3. Handling the node change for a CPU can cause corruption in the slab cache if we hit a timing where a CPUs node is changed while cache_reap() is invoked. The corruption occurs because the slab cache code appears to rely on the CPU and slab cache pages being on the same node. The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c does not prevent us from hitting this scenario. Changing the device tree property update notification handler that recognizes an affinity change for a CPU to do a full DLPAR remove and add of the CPU instead of dynamically changing its node resolves this issue. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64s: Clear on-stack exception marker upon exception returnNicolai Stange2019-04-051-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit eddd0b332304d554ad6243942f87c2fcea98c56b ] The ppc64 specific implementation of the reliable stacktracer, save_stack_trace_tsk_reliable(), bails out and reports an "unreliable trace" whenever it finds an exception frame on the stack. Stack frames are classified as exception frames if the STACK_FRAME_REGS_MARKER magic, as written by exception prologues, is found at a particular location. However, as observed by Joe Lawrence, it is possible in practice that non-exception stack frames can alias with prior exception frames and thus, that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on the stack. It in turn falsely reports an unreliable stacktrace and blocks any live patching transition to finish. Said condition lasts until the stack frame is overwritten/initialized by function call or other means. In principle, we could mitigate this by making the exception frame classification condition in save_stack_trace_tsk_reliable() stronger: in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into account that for all exceptions executing on the kernel stack - their stack frames's backlink pointers always match what is saved in their pt_regs instance's ->gpr[1] slot and that - their exception frame size equals STACK_INT_FRAME_SIZE, a value uncommonly large for non-exception frames. However, while these are currently true, relying on them would make the reliable stacktrace implementation more sensitive towards future changes in the exception entry code. Note that false negatives, i.e. not detecting exception frames, would silently break the live patching consistency model. Furthermore, certain other places (diagnostic stacktraces, perf, xmon) rely on STACK_FRAME_REGS_MARKER as well. Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER for those exceptions running on the "normal" kernel stack and returning to kernelspace: because the topmost frame is ignored by the reliable stack tracer anyway, returns to userspace don't need to take care of clearing the marker. Furthermore, as I don't have the ability to test this on Book 3E or 32 bits, limit the change to Book 3S and 64 bits. Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model") Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callbackAneesh Kumar K.V2019-04-051-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ] After we ALIGN up the address we need to make sure we didn't overflow and resulted in zero address. In that case, we need to make sure that the returned address is greater than mmap_min_addr. This fixes selftest va_128TBswitch --run-hugetlb reporting failures when run as non root user for mmap(-1, MAP_HUGETLB) The bug is that a non-root user requesting address -1 will be given address 0 which will then fail, whereas they should have been given something else that would have succeeded. We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address with this change. So we think this is not a security issue, because it only affects whether we choose an address below mmap_min_addr, not whether we actually allow that address to be mapped. ie. there are existing capability checks to prevent a user mapping below mmap_min_addr and those will still be honoured even without this fix. Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb") Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpcNathan Chancellor2019-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ] When building with -Wsometimes-uninitialized, Clang warns: arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (cpu_has_feature(CPU_FTRS_POWER9)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here if (opcode == NULL) ^~~~~~ arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its condition is always true if (cpu_has_feature(CPU_FTRS_POWER9)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable 'opcode' to silence this warning const struct powerpc_opcode *opcode; ^ = NULL 1 warning generated. This warning seems to make no sense on the surface because opcode is set to NULL right below this statement. However, there is a comma instead of semicolon to end the dialect assignment, meaning that the opcode assignment only happens in the if statement. Properly terminate that line so that Clang no longer warns. Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tablesAlexey Kardashevskiy2019-04-052-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ] We store 2 multilevel tables in iommu_table - one for the hardware and one with the corresponding userspace addresses. Before allocating the tables, the iommu_table_group_ops::get_table_size() hook returns the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts the locked_vm counter correctly. When the table is actually allocated, the amount of allocated memory is stored in iommu_table::it_allocated_size and used to decrement the locked_vm counter when we release the memory used by the table; .get_table_size() and .create_table() calculate it independently but the result is expected to be the same. However the allocator does not add the userspace table size to .it_allocated_size so when we destroy the table because of VFIO PCI unplug (i.e. VFIO container is gone but the userspace keeps running), we decrement locked_vm by just a half of size of memory we are releasing. To make things worse, since we enabled on-demand allocation of indirect levels, it_allocated_size contains only the amount of memory actually allocated at the table creation time which can just be a fraction. It is not a problem with incrementing locked_vm (as get_table_size() value is used) but it is with decrementing. As the result, we leak locked_vm and may not be able to allocate more IOMMU tables after few iterations of hotplug/unplug. This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table() hook to what pnv_pci_ioda2_get_table_size() returns so from now on we have a single place which calculates the maximum memory a table can occupy. The original meaning of it_allocated_size is somewhat lost now though. We do not ditch it_allocated_size whatsoever here and we do not call get_table_size() from vfio_iommu_spapr_tce.c when decrementing locked_vm as we may have multiple IOMMU groups per container and even though they all are supposed to have the same get_table_size() implementation, there is a small chance for failure or confusion. Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64: Fix memcmp reading past the end of src/destMichael Ellerman2019-04-031-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d9470757398a700d9450a43508000bcfd010c7a4 upstream. Chandan reported that fstests' generic/026 test hit a crash: BUG: Unable to handle kernel data access at 0xc00000062ac40000 Faulting instruction address: 0xc000000000092240 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1 NIP: c000000000092240 LR: c00000000066a55c CTR: 0000000000000000 REGS: c00000062c0c3430 TRAP: 0300 Not tainted (5.0.0-rc2-next-20190115-00001-g6de6dba64dda) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44000842 XER: 20000000 CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0 GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660 GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4 GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0 GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002 GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000 GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000 NIP memcmp+0x120/0x690 LR xfs_attr3_leaf_lookup_int+0x53c/0x5b0 Call Trace: xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable) xfs_da3_node_lookup_int+0x32c/0x5a0 xfs_attr_node_addname+0x170/0x6b0 xfs_attr_set+0x2ac/0x340 __xfs_set_acl+0xf0/0x230 xfs_set_acl+0xd0/0x160 set_posix_acl+0xc0/0x130 posix_acl_xattr_set+0x68/0x110 __vfs_setxattr+0xa4/0x110 __vfs_setxattr_noperm+0xac/0x240 vfs_setxattr+0x128/0x130 setxattr+0x248/0x600 path_setxattr+0x108/0x120 sys_setxattr+0x28/0x40 system_call+0x5c/0x70 Instruction dump: 7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000 4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040 The instruction dump decodes as: subfic r6,r5,8 rlwinm r6,r6,3,0,28 ldbrx r9,0,r3 ldbrx r10,0,r4 <- Which shows us doing an 8 byte load from c00000062ac3fff9, which crosses the page boundary at c00000062ac40000 and faults. It's not OK for memcmp to read past the end of the source or destination buffers if that would cross a page boundary, because we don't know that the next page is mapped. As pointed out by Segher, we can read past the end of the source or destination as long as we don't cross a 4K boundary, because that's our minimum page size on all platforms. The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get there we know that s1 is 8-byte aligned and we have at least 1 byte to read, so a single 8-byte load won't read past the end of s1 and cross a page boundary. But we have to be more careful with s2. So check if it's within 8 bytes of a 4K boundary and if so go to the byte-by-byte loop. Fixes: 2d9ee327adce ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()") Cc: stable@vger.kernel.org # v4.19+ Reported-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Tested-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexesGautham R. Shenoy2019-04-031-9/+18
| | | | | | | | | | | | | | | | | | | | | | | commit ce9afe08e71e3f7d64f337a6e932e50849230fc2 upstream. In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent, we currently use of_read_property() to obtain the pointer to the array corresponding to the property "ibm,drc-indexes". The elements of this array are of type __be32, but are accessed without any conversion to the OS-endianness, which is buggy on a Little Endian OS. Fix this by using of_property_read_u32_index() accessor function to safely read the elements of the array. Fixes: e83636ac3334 ("pseries/drc-info: Search DRC properties for CPU indexes") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> [mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: bpf: Fix generation of load/store DW instructionsNaveen N. Rao2019-04-035-18/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 86be36f6502c52ddb4b85938145324fd07332da1 upstream. Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test was failing on powerpc64 BE, and rightfully indicated that the PPC_LD() macro is not masking away the last two bits of the offset per the ISA, resulting in the generation of 'lwa' instruction instead of the intended 'ld' instruction. Segher also pointed out that we can't simply mask away the last two bits as that will result in loading/storing from/to a memory location that was not intended. This patch addresses this by using ldx/stdx if the offset is not word-aligned. We load the offset into a temporary register (TMP_REG_2) and use that as the index register in a subsequent ldx/stdx. We fix PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL() and PPC_BPF_STL() to factor in the offset value and generate the proper instruction sequence. We also convert all existing users of PPC_LD() and PPC_STD() to use these macros. All existing uses of these macros have been audited to ensure that TMP_REG_2 can be clobbered. Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/security: Fix spectre_v2 reportingMichael Ellerman2019-04-031-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d upstream. When I updated the spectre_v2 reporting to handle software count cache flush I got the logic wrong when there's no software count cache enabled at all. The result is that on systems with the software count cache flush disabled we print: Mitigation: Indirect branch cache disabled, Software count cache flush Which correctly indicates that the count cache is disabled, but incorrectly says the software count cache flush is enabled. The root of the problem is that we are trying to handle all combinations of options. But we know now that we only expect to see the software count cache flush enabled if the other options are false. So split the two cases, which simplifies the logic and fixes the bug. We were also missing a space before "(hardware accelerated)". The result is we see one of: Mitigation: Indirect branch serialisation (kernel only) Mitigation: Indirect branch cache disabled Mitigation: Software count cache flush Mitigation: Software count cache flush (hardware accelerated) Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Fix the flush of branch predictor.Christophe Leroy2019-04-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | commit 27da80719ef132cf8c80eb406d5aeb37dddf78cc upstream. The commit identified below adds MC_BTB_FLUSH macro only when CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error on some configs (seen several times with kisskb randconfig_defconfig) arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush' make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1 make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2 make[1]: *** [Makefile:1043: arch/powerpc] Error 2 make: *** [Makefile:152: sub-make] Error 2 This patch adds a blank definition of MC_BTB_FLUSH for other cases. Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)") Cc: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'Diana Craciun2019-04-031-6/+12
| | | | | | | | | | | | | commit 039daac5526932ec731e4499613018d263af8b3e upstream. Fixed the following build warning: powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from `arch/powerpc/kernel/head_44x.o' being placed in section `__btb_flush_fixup'. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Update Spectre v2 reportingDiana Craciun2019-04-031-1/+4
| | | | | | | | | | | commit dfa88658fb0583abb92e062c7a9cd5a5b94f2a46 upstream. Report branch predictor state flush as a mitigation for Spectre variant 2. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is usedDiana Craciun2019-04-031-0/+1
| | | | | | | | | | | commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream. If the user choses not to use the mitigations, replace the code sequence with nops. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Flush branch predictor when entering KVMDiana Craciun2019-04-031-0/+4
| | | | | | | | | | | | commit e7aa61f47b23afbec41031bc47ca8d6cb6516abc upstream. Switching from the guest to host is another place where the speculative accesses can be exploited. Flush the branch predictor when entering KVM. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)Diana Craciun2019-04-032-0/+21
| | | | | | | | | | | | | | | | commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream. In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect for the following situations: - userspace process attacking another userspace process - userspace process attacking the kernel Basically when the privillege level change (i.e.the kernel is entered), the branch predictor state is flushed. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)Diana Craciun2019-04-033-1/+37
| | | | | | | | | | | | | | | | commit 10c5e83afd4a3f01712d97d3bb1ae34d5b74a185 upstream. In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect for the following situations: - userspace process attacking another userspace process - userspace process attacking the kernel Basically when the privillege level change (i.e. the kernel is entered), the branch predictor state is flushed. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Add nospectre_v2 command line argumentDiana Craciun2019-04-032-0/+26
| | | | | | | | | | | commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06 upstream. When the command line argument is present, the Spectre variant 2 mitigations are disabled. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Emulate SPRN_BUCSR registerDiana Craciun2019-04-031-0/+7
| | | | | | | | | | | | | | | commit 98518c4d8728656db349f875fcbbc7c126d4c973 upstream. In order to flush the branch predictor the guest kernel performs writes to the BUCSR register which is hypervisor privilleged. However, the branch predictor is flushed at each KVM entry, so the branch predictor has been already flushed, so just return as soon as possible to guest. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> [mpe: Tweak comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Add macro to flush the branch predictorDiana Craciun2019-04-031-0/+10
| | | | | | | | | | | commit 1cbf8990d79ff69da8ad09e8a3df014e1494462b upstream. The BUCSR register can be used to invalidate the entries in the branch prediction mechanisms. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fsl: Add infrastructure to fixup branch predictor flushDiana Craciun2019-04-034-0/+45
| | | | | | | | | | | | | | | | commit 76a5eaa38b15dda92cd6964248c39b5a6f3a4e9d upstream. In order to protect against speculation attacks (Spectre variant 2) on NXP PowerPC platforms, the branch predictor should be flushed when the privillege level is changed. This patch is adding the infrastructure to fixup at runtime the code sections that are performing the branch predictor flush depending on a boot arg parameter which is added later in a separate patch. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038Michael Ellerman2019-03-272-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b5b4453e7912f056da1ca7572574cada32ecb60c upstream. Jakub Drnec reported: Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years. Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce. Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other than booting with rtc at >2038 and then running ntp. When this happens, anything with timers (e.g. openjdk) breaks rather badly. And included a test case (slightly edited for brevity): #define _POSIX_C_SOURCE 199309L #include <stdio.h> #include <time.h> #include <stdlib.h> #include <unistd.h> long get_time(void) { struct timespec tp; clock_gettime(CLOCK_MONOTONIC, &tp); return tp.tv_sec + tp.tv_nsec / 1000000000; } int main(void) { long last = get_time(); while(1) { long now = get_time(); if (now < last) { printf("clock went backwards by %ld seconds!\n", last - now); } last = now; sleep(1); } return 0; } Which when run concurrently with: # date -s 2040-1-1 # date -s 2037-1-1 Will detect the clock going backward. The root cause is that wtom_clock_sec in struct vdso_data is only a 32-bit signed value, even though we set its value to be equal to tk->wall_to_monotonic.tv_sec which is 64-bits. Because the monotonic clock starts at zero when the system boots the wall_to_montonic.tv_sec offset is negative for current and future dates. Currently on a freshly booted system the offset will be in the vicinity of negative 1.5 billion seconds. However if the wall clock is set past the Y2038 boundary, the offset from wall to monotonic becomes less than negative 2^31, and no longer fits in 32-bits. When that value is assigned to wtom_clock_sec it is truncated and becomes positive, causing the VDSO assembly code to calculate CLOCK_MONOTONIC incorrectly. That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which it is not meant to do. Worse, if the time is then set back before the Y2038 boundary CLOCK_MONOTONIC will jump backward. We can fix it simply by storing the full 64-bit offset in the vdso_data, and using that in the VDSO assembly code. We also shuffle some of the fields in vdso_data to avoid creating a hole. The original commit that added the CLOCK_MONOTONIC support to the VDSO did actually use a 64-bit value for wtom_clock_sec, see commit a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") (Nov 2005). However just 3 days later it was converted to 32-bits in commit 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)"), and the bug has existed since then AFAICS. Fixes: 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)") Cc: stable@vger.kernel.org # v2.6.15+ Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.cz Reported-by: Jakub Drnec <jaydee@email.cz> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: Call kvm_arch_memslots_updated() before updating memslotsSean Christopherson2019-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream. kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0. Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0. Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/traps: Fix the message printed when stack overflowsChristophe Leroy2019-03-231-2/+2
| | | | | | | | | | | | | | | | | | | | commit 9bf3d3c4e4fd82c7174f4856df372ab2a71005b9 upstream. Today's message is useless: [ 42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0 This patch fixes it: [ 66.905235] Kernel stack overflow in process sh[356], r1=c65560b0 Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Use task_pid_nr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/traps: fix recoverability of machine check handling on book3s/32Christophe Leroy2019-03-231-4/+4
| | | | | | | | | | | | | | | | commit 0bbea75c476b77fa7d7811d6be911cc7583e640f upstream. Looks like book3s/32 doesn't set RI on machine check, so checking RI before calling die() will always be fatal allthought this is not an issue in most cases. Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt") Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configurationAneesh Kumar K.V2019-03-231-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 35f2806b481f5b9207f25e1886cba5d1c4d12cc7 upstream. We added runtime allocation of 16G pages in commit 4ae279c2c96a ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.") That was done to enable 16G allocation on PowerNV and KVM config. In case of KVM config, we mostly would have the entire guest RAM backed by 16G hugetlb pages for this to work. PAPR do support partial backing of guest RAM with hugepages via ibm,expected#pages node of memory node in the device tree. This means rest of the guest RAM won't be backed by 16G contiguous pages in the host and hence a hash page table insertion can fail in such case. An example error message will look like hash-mmu: mm: Hashing failure ! EA=0x7efc00000000 access=0x8000000000000006 current=readback hash-mmu: trap=0x300 vsid=0x67af789 ssize=1 base psize=14 psize 14 pte=0xc000000400000386 readback[12260]: unhandled signal 7 at 00007efc00000000 nip 00000000100012d0 lr 000000001000127c code 2 This patch address that by preventing runtime allocation of 16G hugepages in LPAR config. To allocate 16G hugetlb one need to kernel command line hugepagesz=16G hugepages=<number of 16G pages> With radix translation mode we don't run into this issue. This change will prevent runtime allocation of 16G hugetlb pages on kvm with hash translation mode. However, with the current upstream it was observed that 16G hugetlbfs backed guest doesn't boot at all. We observe boot failure with the below message: [131354.647546] KVM: map_vrma at 0 failed, ret=-4 That means this patch is not resulting in an observable regression. Once we fix the boot issue with 16G hugetlb backed memory, we need to use ibm,expected#pages memory node attribute to indicate 16G page reservation to the guest. This will also enable partial backing of guest RAM with 16G pages. Fixes: 4ae279c2c96a ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/ptrace: Simplify vr_get/set() to avoid GCC warningMichael Ellerman2019-03-231-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ca6d5149d2ad0a8d2f9c28cbe379802260a0a5e0 upstream. GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks the build: In function ‘user_regset_copyin’, inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9: include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union <anonymous>’ [-Werror=array-bounds] arch/powerpc/kernel/ptrace.c: In function ‘vr_set’: arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here } vrsave; This has been identified as a regression in GCC, see GCC bug 88273. However we can avoid the warning and also simplify the logic and make it more robust. Currently we pass -1 as end_pos to user_regset_copyout(). This says "copy up to the end of the regset". The definition of the regset is: [REGSET_VMX] = { .core_note_type = NT_PPC_VMX, .n = 34, .size = sizeof(vector128), .align = sizeof(vector128), .active = vr_active, .get = vr_get, .set = vr_set }, The end is calculated as (n * size), ie. 34 * sizeof(vector128). In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning we can copy up to sizeof(vector128) into/out-of vrsave. The on-stack vrsave is defined as: union { elf_vrreg_t reg; u32 word; } vrsave; And elf_vrreg_t is: typedef __vector128 elf_vrreg_t; So there is no bug, but we rely on all those sizes lining up, otherwise we would have a kernel stack exposure/overwrite on our hands. Rather than relying on that we can pass an explict end_pos based on the sizeof(vrsave). The result should be exactly the same but it's more obviously not over-reading/writing the stack and it avoids the compiler warning. Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Mathieu Malaterre <malat@debian.org> Cc: stable@vger.kernel.org Tested-by: Mathieu Malaterre <malat@debian.org> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guestMark Cave-Ayland2019-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f upstream. Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the host kernel. Leaving FE0/1 enabled means unrelated processes might receive FPEs when they're not expecting them and crash. In particular if this happens to init the host will then panic. eg (transcribed): qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0 systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Reinstate these bits to the MSR bitmask to enable MacOS guests to run under 32-bit KVM-PR once again without issue. Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exitPaul Mackerras2019-03-233-25/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 19f8a5b5be2898573a5e1dc1db93e8d40117606a upstream. Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg() inside pnv_cpu_offline(), with the aim of changing the LPCR value in the SLW image to disable wakeups from the decrementer while a CPU is offline. However, pnv_cpu_offline() gets called each time a secondary CPU thread is woken up to participate in running a KVM guest, that is, not just when a CPU is offlined. Since opal_slw_set_reg() is a very slow operation (with observed execution times around 20 milliseconds), this means that an offline secondary CPU can often be busy doing the opal_slw_set_reg() call when the primary CPU wants to grab all the secondary threads so that it can run a KVM guest. This leads to messages like "KVM: couldn't grab CPU n" being printed and guest execution failing. There is no need to reprogram the SLW image on every KVM guest entry and exit. So that we do it only when a CPU is really transitioning between online and offline, this moves the calls to pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self(). Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/83xx: Also save/restore SPRG4-7 during suspendChristophe Leroy2019-03-231-7/+27
| | | | | | | | | | | | | | commit 36da5ff0bea2dc67298150ead8d8471575c54c7d upstream. The 83xx has 8 SPRG registers and uses at least SPRG4 for DTLB handling LRU. Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/powernv: Make opal log only readable by rootJordan Niethe2019-03-231-1/+1
| | | | | | | | | | | | | | | | | | commit 7b62f9bd2246b7d3d086e571397c14ba52645ef1 upstream. Currently the opal log is globally readable. It is kernel policy to limit the visibility of physical addresses / kernel pointers to root. Given this and the fact the opal log may contain this information it would be better to limit the readability to root. Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/wii: properly disable use of BATs when requested.Christophe Leroy2019-03-231-0/+4
| | | | | | | | | | | | | | | | | | commit 6d183ca8baec983dc4208ca45ece3c36763df912 upstream. 'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC deny the use of BATS for mapping memory. This patch makes sure that the specific wii RAM mapping function takes it into account as well. Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram") Cc: stable@vger.kernel.org Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/32: Clear on-stack exception marker upon exception returnChristophe Leroy2019-03-231-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9580b71b5a7863c24a9bd18bcd2ad759b86b1eff upstream. Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order to avoid confusing stacktrace like the one below. Call Trace: [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable) [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180 [c0e9dd10] [c0895130] memchr+0x24/0x74 [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574 [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8 [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4 --- interrupt: c0e9df00 at 0x400f330 LR = init_stack+0x1f00/0x2000 [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable) [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108 [c0e9df50] [c0c15434] start_kernel+0x310/0x488 [c0e9dff0] [00003484] 0x3484 With this patch the trace becomes: Call Trace: [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable) [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180 [c0e9dd10] [c0895150] memchr+0x24/0x74 [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574 [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8 [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4 [c0e9de80] [c00ae3e4] printk+0xa8/0xcc [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108 [c0e9df50] [c0c15434] start_kernel+0x310/0x488 [c0e9dff0] [00003484] 0x3484 Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.Christophe Leroy2019-02-271-1/+2
| | | | | | | | | | | | | | | [ Upstream commit fb0bdec51a4901b7dd088de0a1e365e1b9f5cd21 ] Commit 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer dereference") moved the loading of r6 earlier in the code. As some functions are called inbetween, r6 needs to be loaded again with the address of swapper_pg_dir in order to set PTE pointers for the Abatron BDI. Fixes: 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer dereference") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/radix: Fix kernel crash with mremap()Aneesh Kumar K.V2019-02-152-15/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 579b9239c1f38665b21e8d0e6ee83ecc96dbd6bb upstream. With support for split pmd lock, we use pmd page pmd_huge_pte pointer to store the deposited page table. In those config when we move page tables we need to make sure we move the deposited page table to the correct pmd page. Otherwise this can result in crash when we withdraw of deposited page table because we can find the pmd_huge_pte NULL. eg: __split_huge_pmd+0x1070/0x1940 __split_huge_pmd+0xe34/0x1940 (unreliable) vma_adjust_trans_huge+0x110/0x1c0 __vma_adjust+0x2b4/0x9b0 __split_vma+0x1b8/0x280 __do_munmap+0x13c/0x550 sys_mremap+0x220/0x7e0 system_call+0x5c/0x70 Fixes: 675d995297d4 ("powerpc/book3s64: Enable split pmd ptlock.") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.Mahesh Salgaonkar2019-02-123-5/+14
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 0db6896ff6332ba694f1e61b93ae3b2640317633 ] For fadump to work successfully there should not be any holes in reserved memory ranges where kernel has asked firmware to move the content of old kernel memory in event of crash. Now that fadump uses CMA for reserved area, this memory area is now not protected from hot-remove operations unless it is cma allocated. Hence, fadump service can fail to re-register after the hot-remove operation, if hot-removed memory belongs to fadump reserved region. To avoid this make sure that memory from fadump reserved area is not hot-removable if fadump is registered. However, if user still wants to remove that memory, he can do so by manually stopping fadump service before hot-remove operation. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/mm: Fix reporting of kernel execute faults on the 8xxChristophe Leroy2019-02-121-1/+3
| | | | | | | | | | | | | | | [ Upstream commit ffca395b11c4a5a6df6d6345f794b0e3d578e2d0 ] On the 8xx, no-execute is set via PPP bits in the PTE. Therefore a no-exec fault generates DSISR_PROTFAULT error bits, not DSISR_NOEXEC_OR_G. This patch adds DSISR_PROTFAULT in the test mask. Fixes: d3ca587404b3 ("powerpc/mm: Fix reporting of kernel execute faults") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace ↵Alexey Kardashevskiy2019-02-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | addresses on demand [ Upstream commit bdbf649efe21173cae63b4b71db84176420f9039 ] The powernv platform maintains 2 TCE tables for VFIO - a hardware TCE table and a table with userspace addresses; the latter is used for marking pages dirty when corresponging TCEs are unmapped from the hardware table. a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand") enabled on-demand allocation of the hardware table, however it missed the other table so it has still been fully allocated at the boot time. This fixes the issue by allocating a single level, just like we do for the hardware table. Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/perf: Fix thresholding counter data for unknown typeMadhavan Srinivasan2019-02-121-1/+6
| | | | | | | | | | | | | | | | [ Upstream commit 17cfccc91545682513541924245abb876d296063 ] MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value. Thresholding counter can be used to count latency cycles such as load miss to reload. But threshold counter value is not relevant when the sampled instruction type is unknown or reserved. Patch to fix the thresholding counter value to zero when sampled instruction type is unknown or reserved. Fixes: 170a315f41c6('powerpc/perf: Support to export MMCRA[TEC*] field to userspace') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/uaccess: fix warning/error with access_ok()Christophe Leroy2019-02-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 05a4ab823983d9136a460b7b5e0d49ee709a6f86 ] With the following piece of code, the following compilation warning is encountered: if (_IOC_DIR(ioc) != _IOC_NONE) { int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ; if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) { drivers/platform/test/dev.c: In function 'my_ioctl': drivers/platform/test/dev.c:219:7: warning: unused variable 'verify' [-Wunused-variable] int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ; This patch fixes it by referencing 'type' in the macro allthough doing nothing with it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machinesSuraj Jitindar Singh2019-02-121-1/+4
| | | | | | | | | | | | | | | | | | [ Upstream commit 693ac10a88a2219bde553b2e8460dbec97e594e6 ] The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in kernel tce acceleration for vfio. However it is currently the case that this is only available on a powernv machine, not for a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. [paulus@ozlabs.org - fixed compilation for Book E.] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* sections explicitlyMathieu Malaterre2019-02-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit beba24ac59133cb36ecd03f9af9ccb11971ee20e ] When both `CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y` and `CONFIG_UBSAN=y` are set, link step typically produce numberous warnings about orphan section: + powerpc-linux-gnu-ld -EB -m elf32ppc -Bstatic --orphan-handling=warn --build-id --gc-sections -X -o .tmp_vmlinux1 -T ./arch/powerpc/kernel/vmlinux.lds --who le-archive built-in.a --no-whole-archive --start-group lib/lib.a --end-group powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data393' from `init/main.o' being placed in section `.data..Lubsan_data393'. powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data394' from `init/main.o' being placed in section `.data..Lubsan_data394'. ... powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type11' from `init/main.o' being placed in section `.data..Lubsan_type11'. powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type12' from `init/main.o' being placed in section `.data..Lubsan_type12'. ... This commit remove those warnings produced at W=1. Link: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg135407.html Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/pseries: add of_node_put() in dlpar_detach_node()Frank Rowand2019-02-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5b3f5c408d8cc59b87e47f1ab9803dbd006e4a91 ] The previous commit, "of: overlay: add missing of_node_get() in __of_attach_node_sysfs" added a missing of_node_get() to __of_attach_node_sysfs(). This results in a refcount imbalance for nodes attached with dlpar_attach_node(). The calling sequence from dlpar_attach_node() to __of_attach_node_sysfs() is: dlpar_attach_node() of_attach_node() __of_attach_node_sysfs() For more detailed description of the node refcount, see commit 68baf692c435 ("powerpc/pseries: Fix of_node_put() underflow during DLPAR remove"). Tested-by: Alan Tull <atull@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/xmon: Fix invocation inside lock regionBreno Leitao2019-01-261-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never get into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token(), thus trying to get the same lock, at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This would allow xmon to be possible invoked independent of devtree_lock being held or not. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/tm: Set MSR[TS] just prior to recheckpointBreno Leitao2019-01-132-15/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream. On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), thus, this block must be atomic from a preemption perspective, thus, calling preempt_disable/enable() on this code. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set. The 32-bits signal handler seems to be safe this current issue, but, it might be exposed to the preemption issue, thus, disabling preemption in this chunk of code. Changes from v2: * Run the critical section with preempt_disable. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: stable@vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"Greg Kroah-Hartman2019-01-132-29/+9
| | | | | | | | | | | | | | | This reverts commit a9935a12768851762089fda8e5a9daaf0231808e which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream. It breaks the powerpc build, so drop it from the tree until a fix goes upstream. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Breno Leitao <leitao@debian.org> Cc: Michal Suchánek <msuchanek@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/boot: Set target when cross-compiling for clangJoel Stanley2019-01-131-0/+5
| | | | | | | | | | | | | | | | commit 813af51f5d30a2da6a2523c08465f9726e51772e upstream. Clang needs to be told which target it is building for when cross compiling. Link: https://github.com/ClangBuiltLinux/linux/issues/259 Signed-off-by: Joel Stanley <joel@jms.id.au> Tested-by: Daniel Axtens <dja@axtens.net> # powerpc 64-bit BE Acked-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Disable -Wbuiltin-requires-header when setjmp is usedJoel Stanley2019-01-132-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit aea447141c7e7824b81b49acd1bc785506fba46e upstream. The powerpc kernel uses setjmp which causes a warning when building with clang: In file included from arch/powerpc/xmon/xmon.c:51: ./arch/powerpc/include/asm/setjmp.h:15:13: error: declaration of built-in function 'setjmp' requires inclusion of the header <setjmp.h> [-Werror,-Wbuiltin-requires-header] extern long setjmp(long *); ^ ./arch/powerpc/include/asm/setjmp.h:16:13: error: declaration of built-in function 'longjmp' requires inclusion of the header <setjmp.h> [-Werror,-Wbuiltin-requires-header] extern void longjmp(long *, long); ^ This *is* the header and we're not using the built-in setjump but rather the one in arch/powerpc/kernel/misc.S. As the compiler warning does not make sense, it for the files where setjmp is used. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> [mpe: Move subdir-ccflags in xmon/Makefile to not clobber -Werror] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: avoid -mno-sched-epilog on GCC 4.9 and newerNicholas Piggin2019-01-131-2/+6
| | | | | | | | | | commit 6977f95e63b9b3fb4a5973481a800dd9f48a1338 upstream. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>