summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* powerpc: powernv: Fix KCSAN datarace warnings on idle_state contentionRohan McLure2023-06-212-7/+10
| | | | | | | | | | | | | | | | | | | The idle_state entry in the PACA on PowerNV features a bit which is atomically tested and set through ldarx/stdcx. to be used as a spinlock. This lock then guards access to other bit fields of idle_state. KCSAN cannot differentiate between any of these bitfield accesses as they all are implemented by 8-byte store/load instructions, thus cores contending on the bit-lock appear to data race with modifications to idle_state. Separate the bit-lock entry from the data guarded by the lock to avoid the possibility of data races being detected by KCSAN. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-7-rmclure@linux.ibm.com
* powerpc: Mark [h]ssr_valid accesses in check_return_regs_validRohan McLure2023-06-212-10/+8
| | | | | | | | | | | | | | | | | Checks to see if the [H]SRR registers have been clobbered by (soft) NMI interrupts imply the possibility for a data race on the [h]srr_valid entries in the PACA. Annotate accesses to these fields with READ_ONCE, removing the need for the barrier. The diagnostic can use plain-access reads and writes, but annotate with data_race. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-5-rmclure@linux.ibm.com
* powerpc: qspinlock: Enforce qnode writes prior to publishing to queueRohan McLure2023-06-211-0/+7
| | | | | | | | | | | | | | | | | | Annotate the release barrier and memory clobber (in effect, producing a compiler barrier) in the publish_tail_cpu call. These barriers have the effect of ensuring that qnode attributes are all written to prior to publish the node to the waitqueue. Even while the initial write to the 'locked' attribute is guaranteed to terminate prior to the node being visible, KCSAN still complains that the write is reorderable by the compiler. Issue a kcsan_release() to inform KCSAN of the release barrier contained in publish_tail_cpu(). Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-3-rmclure@linux.ibm.com
* powerpc: qspinlock: Mark accesses to qnode lock checksRohan McLure2023-06-211-2/+2
| | | | | | | | | | | | The powerpc implementation of qspinlocks will both poll and spin on the bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey to KCSAN that polling is intentional here. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-2-rmclure@linux.ibm.com
* powerpc/powernv/pci: Remove last IODA1 definesJoel Stanley2023-06-212-3/+3
| | | | | | | Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230613045202.294451-4-joel@jms.id.au
* powerpc/powernv/pci: Remove MVE codeJoel Stanley2023-06-213-27/+1
| | | | | | | | | | With IODA1 support gone the OPAL calls to set MVE are dead code. Remove them. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230613045202.294451-3-joel@jms.id.au
* powerpc/powernv/pci: Remove ioda1 supportJoel Stanley2023-06-213-455/+2
| | | | | | | | | | | | The final "VPL" Power7 boxes that were used for powernv bringup have been scrapped, meaning there are no machines with ioda1 left. This patch removes the obvious unused code. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230613045202.294451-2-joel@jms.id.au
* powerpc: 52xx: Make immr_id DT match tables staticRob Herring2023-06-212-2/+2
| | | | | | | | | | | | | | | | | In some builds, the mpc52xx_pm_prepare()/lite5200_pm_prepare() functions generate stack size warnings. The addition of 'struct resource' in commit 2500763dd3db ("powerpc: Use of_address_to_resource()") grew the stack size and is blamed for the warnings. However, the real issue is there's no reason the 'struct of_device_id immr_ids' DT match tables need to be on the stack as they are constant. Declare them as static to move them off the stack. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306130405.uTv5yOZD-lkp@intel.com/ Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230614171724.2403982-1-robh@kernel.org
* powerpc: mpc512x: Remove open coded "ranges" parsingRob Herring2023-06-211-32/+14
| | | | | | | | | | "ranges" is a standard property, and we have common helper functions for parsing it, so let's use the for_each_of_range() iterator. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609183232.1767050-1-robh@kernel.org
* powerpc: fsl_soc: Use of_range_to_resource() for "ranges" parsingRob Herring2023-06-211-12/+4
| | | | | | | | | | | "ranges" is a standard property with common parsing functions. Users shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO "ranges" parsing to use of_range_to_resource() instead. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609183238.1767186-1-robh@kernel.org
* powerpc: fsl: Use of_property_read_reg() to parse "reg"Rob Herring2023-06-212-19/+5
| | | | | | | | | | Use the recently added of_property_read_reg() helper to get the untranslated "reg" address value. Signed-off-by: Rob Herring <robh@kernel.org> [mpe: Add required include of of_address.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609183151.1766261-1-robh@kernel.org
* powerpc: fsl_rio: Use of_range_to_resource() for "ranges" parsingRob Herring2023-06-211-27/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "ranges" is a standard property with common parsing functions. Users shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO "ranges" parsing to use of_range_to_resource() instead. One change is the original code would look for "#size-cells" and "#address-cells" in the parent node if not found in the port child nodes. That is non-standard behavior and not necessary AFAICT. In 2011 in commit 54986964c13c ("powerpc/85xx: Update SRIO device tree nodes") there was an ABI break. The upstream .dts files have been correct since at least that point. Signed-off-by: Rob Herring <robh@kernel.org> [mpe: Remove now unused "cell" variable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609183244.1767325-1-robh@kernel.org "ranges" is a standard property with common parsing functions. Users shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO "ranges" parsing to use of_range_to_resource() instead. One change is the original code would look for "#size-cells" and "#address-cells" in the parent node if not found in the port child nodes. That is non-standard behavior and not necessary AFAICT. In 2011 in commit 54986964c13c ("powerpc/85xx: Update SRIO device tree nodes") there was an ABI break. The upstream .dts files have been correct since at least that point. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609183244.1767325-1-robh@kernel.org
* macintosh: Use of_property_read_reg() to parse "reg"Rob Herring2023-06-211-8/+7
| | | | | | | | | | Use the recently added of_property_read_reg() helper to get the untranslated "reg" address value. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609182926.1763589-1-robh@kernel.org
* macintosh: Use of_address_to_resource()Rob Herring2023-06-212-26/+13
| | | | | | | | | | Replace open coded reading of "reg" and of_translate_address() calls with single call to of_address_to_resource(). Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230319163226.226583-1-robh@kernel.org
* powerpc: powermac: Use of_get_cpu_hwid() to read CPU node 'reg'Rob Herring2023-06-211-6/+6
| | | | | | | | | | Replace open coded reading of CPU nodes' "reg" properties with of_get_cpu_hwid() dedicated for this purpose. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230319145931.65499-1-robh@kernel.org
* powerpc: drop MPC85xx_CDS platform supportPaul Gortmaker2023-06-2110-1660/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MPC8541/8548/8555 Configurable Development System (CDS) were the vehicle used to provide evaluation of the 1st e500-v2 CPUs around 2007. Similar to the earlier MPC83xx-MDS systems we removed, the "brains" exist on a PCI-X card, but additional connectors exist to the right of the PCI-X slot, two structural metal pins are used to provide stability in a vertical ATX mounting, and the CPU is now on a daughter-card vs. a clamped down BGA. Given the extra complexity and risk of connector damage, the 8548CDS I had access to came pre-assembled in a basic white Antec case common for that era, and I'm inclined to assume that was the default. Power was typical "Pentium4" 2005 ATX - the main 20 pin connector went to the PCI ATX form factor backplane, and the 4 pin black/yellow went to the CPU card. Like previous evaluation boards, they attempted to provide break-out connectors for as many features as possible, and that made for a fairly complex looking system. In any case, these are over 15 years old, and fairly complex systems, originally made for a small group of industry related people, and made for use where quiet fan operation wasn't important. Given that, it makes sense to remove support from them in 2023. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230620043300.197546-3-paul.gortmaker@windriver.com
* powerpc: drop MPC8540_ADS and MPC8560_ADS platform supportPaul Gortmaker2023-06-218-1008/+0
| | | | | | | | | | | | | | | | | | | | | | | Based on the revision history in the manual(s), these e500-v1 platforms were first available around 2002. Like a lot of evaluation boards, they attempted to provide break-out connectors for all possible features, and that combined with four PCI-X slots (and the age/era) meant for a considerably large board. As I recall it, from a Linux point of view, the biggest difference between 8540 and 8560 was in the UART implementation, and that is reflected in a diff of the defconfigs. In any case, these are over 20 years old, and by today's standards only have a small amount of DDR1 memory, and were not widely available. Given that, it makes sense to remove support from them in 2023. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230620043300.197546-2-paul.gortmaker@windriver.com
* security/integrity: fix pointer to ESL data and its size on pseriesNayna Jain2023-06-211-14/+26
| | | | | | | | | | | | | | On PowerVM guest, variable data is prefixed with 8 bytes of timestamp. Extract ESL by stripping off the timestamp before passing to ESL parser. Fixes: 4b3e71e9a34c ("integrity/powerpc: Support loading keys from PLPKS") Cc: stable@vger.kenrnel.org # v6.3 Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230608120444.382527-1-nayna@linux.ibm.com
* powerpc/mm/dax: Fix the condition when checking if altmap vmemap can ↵Aneesh Kumar K.V2023-06-211-1/+1
| | | | | | | | | | | | | | | cross-boundary Without this fix, the last subsection vmemmap can end up in memory even if the namespace is created with -M mem and has sufficient space in the altmap area. Fixes: cf387d9644d8 ("libnvdimm/altmap: Track namespace boundaries in altmap") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-6-aneesh.kumar@linux.ibm.com
* powerpc/book3s64/mm: Use PAGE_KERNEL instead of opencodingAneesh Kumar K.V2023-06-211-2/+1
| | | | | | | | | | No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-5-aneesh.kumar@linux.ibm.com
* powerpc/book3s64/mm: Fix DirectMap stats in /proc/meminfoAneesh Kumar K.V2023-06-211-12/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On memory unplug reduce DirectMap page count correctly. root@ubuntu-guest:# grep Direct /proc/meminfo DirectMap4k: 0 kB DirectMap64k: 0 kB DirectMap2M: 115343360 kB DirectMap1G: 0 kB Before fix: root@ubuntu-guest:# ndctl disable-namespace all disabled 1 namespace root@ubuntu-guest:# grep Direct /proc/meminfo DirectMap4k: 0 kB DirectMap64k: 0 kB DirectMap2M: 115343360 kB DirectMap1G: 0 kB After fix: root@ubuntu-guest:# ndctl disable-namespace all disabled 1 namespace root@ubuntu-guest:# grep Direct /proc/meminfo DirectMap4k: 0 kB DirectMap64k: 0 kB DirectMap2M: 104857600 kB DirectMap1G: 0 kB Fixes: a2dc009afa9a ("powerpc/mm/book3s/radix: Add mapping statistics") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-4-aneesh.kumar@linux.ibm.com
* powerpc/mm/book3s64: Use pmdp_ptep helper instead of typecasting.Aneesh Kumar K.V2023-06-201-1/+1
| | | | | | | | | | No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616110826.344417-2-aneesh.kumar@linux.ibm.com
* powerpc: update ppc_save_regs to save current r1 in pt_regsAditya Gupta2023-06-191-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ppc_save_regs() skips one stack frame while saving the CPU register states. Instead of saving current R1, it pulls the previous stack frame pointer. When vmcores caused by direct panic call (such as `echo c > /proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the backtrace correctly. On further analysis, it was found that it was because of mismatch between r1 and NIP. GDB uses NIP to get current function symbol and uses corresponding debug info of that function to unwind previous frames, but due to the mismatching r1 and NIP, the unwinding does not work, and it fails to unwind to the 2nd frame and hence does not show the backtrace. GDB backtrace with vmcore of kernel without this patch: --------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974 #2 0x0000000000000063 in ?? () #3 0xc000000003579320 in ?? () --------- Further analysis revealed that the mismatch occurred because "ppc_save_regs" was saving the previous stack's SP instead of the current r1. This patch fixes this by storing current r1 in the saved pt_regs. GDB backtrace with vmcore of patched kernel: -------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974 #2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n") at kernel/panic.c:358 #3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155 #4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602 #5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163 #6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340 #7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352 #8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00, buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582 #9 0xc0000000005b4264 in ksys_write (fd=<optimized out>, buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2) at fs/read_write.c:637 #10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:171 #11 0xc00000000000c270 in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:192 -------- Nick adds: So this now saves regs as though it was an interrupt taken in the caller, at the instruction after the call to ppc_save_regs, whereas previously the NIP was there, but R1 came from the caller's caller and that mismatch is what causes gdb's dwarf unwinder to go haywire. Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Fixes: d16a58f8854b1 ("powerpc: Improve ppc_save_regs()") Reivewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230615091047.90433-1-adityag@linux.ibm.com
* powerpc/ftrace: Disable ftrace on ppc32 if using clangNaveen N Rao2023-06-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | Ftrace on ppc32 expects a three instruction sequence at the beginning of each function when specifying -pg: mflr r0 stw r0,4(r1) bl _mcount This is the case with all supported versions of gcc. Clang however emits a branch to _mcount after the function prologue, similar to the pre -mprofile-kernel ABI on ppc64. This is not supported. Disable ftrace on ppc32 if using clang for now. This can be re-enabled later if clang picks up support for -fpatchable-function-entry on ppc32. Signed-off-by: Naveen N Rao <naveen@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/llvm/llvm-project/issues/63220 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609034501.407971-1-naveen@kernel.org
* powerpc/powernv/sriov: perform null check on iov before dereferencing iovColin Ian King2023-06-191-3/+3
| | | | | | | | | | | | | | | | | | | Currently pointer iov is being dereferenced before the null check of iov which can lead to null pointer dereference errors. Fix this by moving the iov null check before the dereferencing. Detected using cppcheck static analysis: linux/arch/powerpc/platforms/powernv/pci-sriov.c:597:12: warning: Either the condition '!iov' is redundant or there is possible null pointer dereference: iov. [nullPointerRedundantCheck] num_vfs = iov->num_vfs; ^ Fixes: 052da31d45fc ("powerpc/powernv/sriov: De-indent setup and teardown") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230608095849.1147969-1-colin.i.king@gmail.com
* selftests/powerpc/dexcr: Add DEXCR status utility lsdexcrBenjamin Gray2023-06-193-0/+144
| | | | | | | | | | | | | | | | | | | | | | | | Add a utility 'lsdexcr' to print the current DEXCR status. Useful for quickly checking the status such as when debugging test failures or verifying the new default DEXCR does what you want (for userspace at least). Example output: # ./lsdexcr uDEXCR: 04000000 (NPHIE) HDEXCR: 00000000 Effective: 04000000 (NPHIE) SBHE (0): clear (Speculative branch hint enable) IBRTPD (3): clear (Indirect branch recurrent target ...) SRAPD (4): clear (Subroutine return address ...) NPHIE * (5): set (Non-privileged hash instruction enable) PHIE (6): clear (Privileged hash instruction enable) DEXCR[NPHIE] enabled: hashst/hashchk working Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-12-bgray@linux.ibm.com
* selftests/powerpc/dexcr: Add hashst/hashchk testBenjamin Gray2023-06-199-0/+449
| | | | | | | | | | | | | Test the kernel DEXCR[NPHIE] interface and hashchk exception handling. Introduces with it a DEXCR utils library for common DEXCR operations. Volatile is used to prevent the compiler optimising away the signal tests. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-11-bgray@linux.ibm.com
* selftests/powerpc: Add more utility macrosBenjamin Gray2023-06-192-3/+26
| | | | | | | | | | | | | | Adds _MSG assertion variants to provide more context behind why a failure occurred. Also include unistd.h for _exit() and stdio.h for fprintf(), and move ARRAY_SIZE macro to utils.h. The _MSG variants and ARRAY_SIZE will be used by the following DEXCR selftests. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-10-bgray@linux.ibm.com
* Documentation: Document PowerPC kernel DEXCR interfaceBenjamin Gray2023-06-192-0/+59
| | | | | | | | Describe the DEXCR and document how to configure it. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-9-bgray@linux.ibm.com
* powerpc/ptrace: Expose HASHKEYR register to ptraceBenjamin Gray2023-06-194-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | The HASHKEYR register contains a secret per-process key to enable unique hashes per process. In general it should not be exposed to userspace at all and a regular process has no need to know its key. However, checkpoint restore in userspace (CRIU) functionality requires that a process be able to set the HASHKEYR of another process, otherwise existing hashes on the stack would be invalidated by a new random key. Exposing HASHKEYR in this way also makes it appear in core dumps, which is a security concern. Multiple threads may share a key, for example just after a fork() call, where the kernel cannot know if the child is going to return back along the parent's stack. If such a thread is coerced into making a core dump, then the HASHKEYR value will be readable and able to be used against all other threads sharing that key, effectively undoing any protection offered by hashst/hashchk. Therefore we expose HASHKEYR to ptrace when CONFIG_CHECKPOINT_RESTORE is enabled, providing a choice of increased security or migratable ROP protected processes. This is similar to how ARM exposes its PAC keys. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-8-bgray@linux.ibm.com
* powerpc/ptrace: Expose DEXCR and HDEXCR registers to ptraceBenjamin Gray2023-06-194-1/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | The DEXCR register is of interest when ptracing processes. Currently it is static, but eventually will be dynamically controllable by a process. If a process can control its own, then it is useful for it to be ptrace-able to (e.g., for checkpoint-restore functionality). It is also relevant to core dumps (the NPHIE aspect in particular), which use the ptrace mechanism (or is it the other way around?) to decide what to dump. The HDEXCR is useful here too, as the NPHIE aspect may be set in the HDEXCR without being set in the DEXCR. Although the HDEXCR is per-cpu and we don't track it in the task struct (it's useless in normal operation), it would be difficult to imagine why a hypervisor would set it to different values within a guest. A hypervisor cannot safely set NPHIE differently at least, as that would break programs. Expose a read-only view of the userspace DEXCR and HDEXCR to ptrace. The HDEXCR is always readonly, and is useful for diagnosing the core dumps (as the HDEXCR may set NPHIE without the DEXCR setting it). Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> [mpe: Use lower_32_bits() rather than open coding] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-7-bgray@linux.ibm.com
* powerpc/dexcr: Support userspace ROP protectionBenjamin Gray2023-06-192-0/+18
| | | | | | | | | | | | | | | The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR to hold a key used in the hash calculation. This key should be different for each process to make it harder for a malicious process to recreate valid hash values for a victim process. Add support for storing a per-thread hash key, and setting/clearing HASHKEYR appropriately. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-6-bgray@linux.ibm.com
* powerpc/dexcr: Handle hashchk exceptionBenjamin Gray2023-06-192-0/+17
| | | | | | | | | | | | | | | | Recognise and pass the appropriate signal to the user program when a hashchk instruction triggers. This is independent of allowing configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect regardless of the kernel. The signal mirrors how ARM reports their similar check failure. For example, their FPAC handler in arch/arm64/kernel/traps.c do_el0_fpac() does this. When we fail to read the instruction that caused the fault we send a segfault, similar to how emulate_math() does it. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-5-bgray@linux.ibm.com
* powerpc/dexcr: Add initial Dynamic Execution Control Register (DEXCR) supportBenjamin Gray2023-06-195-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR). It is a per-cpu register that allows control over various CPU behaviours including branch hint usage, indirect branch speculation, and hashst/hashchk support. Add some definitions and basic support for the DEXCR in the kernel. Right now it just * Initialises the DEXCR and HASHKEYR to a fixed value when a CPU onlines. * Clears them in reset_sprs(). * Detects when the NPHIE aspect is supported (the others don't get looked at in this series, so there's no need to waste a CPU_FTR on them). We initialise the HASHKEYR to ensure that all cores have the same key, so an HV enforced NPHIE + swapping cores doesn't randomly crash a process using hash instructions. The stores to HASHKEYR are unconditional because the ISA makes no mention of the SPR being missing if support for doing the hashes isn't present. So all that would happen is the HASHKEYR value gets ignored. This helps slightly if NPHIE detection fails; e.g., we currently only detect it on pseries. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Use simple values for DEXCR constants] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-4-bgray@linux.ibm.com
* powerpc/ptrace: Add missing <linux/regset.h> includeBenjamin Gray2023-06-191-0/+2
| | | | | | | | | | | | ptrace-decl.h uses user_regset_get2_fn (among other things) from regset.h. While all current users of ptrace-decl.h include regset.h before it anyway, it adds an implicit ordering dependency and breaks source tooling that tries to inspect ptrace-decl.h by itself. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-3-bgray@linux.ibm.com
* powerpc/book3s: Add missing <linux/sched.h> includeBenjamin Gray2023-06-191-0/+1
| | | | | | | | | | | | | | | | The functions here use struct task_struct fields, so need to import the full definition from <linux/sched.h>. The <asm/current.h> header that defines current only forward declares struct task_struct. Failing to include this <linux/sched.h> header leads to a compilation error when a translation unit does not also include <linux/sched.h> indirectly. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-2-bgray@linux.ibm.com
* powerpc/build: vdso linker warning for orphan sectionsNicholas Piggin2023-06-153-2/+8
| | | | | | | | | | Add --orphan-handlin for vdsos, and adjust vdso linker scripts to deal with orphan sections. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609051002.3342-1-npiggin@gmail.com
* powerpc/64s: Fix VAS mm use after freeNicholas Piggin2023-06-152-2/+2
| | | | | | | | | | | | | The refcount on mm is dropped before the coprocessor is detached. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 7bc6f71bdff5f ("powerpc/vas: Define and use common vas_window struct") Fixes: b22f2d88e435c ("powerpc/pseries/vas: Integrate API with open/close windows") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230607101024.14559-1-npiggin@gmail.com
* powerpc/64: Rename entry_64.S to prom_entry_64.SNicholas Piggin2023-06-153-34/+10
| | | | | | | | | | This file contains only the enter_prom implementation now. Trim includes and update header comment while we're here. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-7-npiggin@gmail.com
* powerpc: merge 32-bit and 64-bit _switch implementationNicholas Piggin2023-06-155-282/+273
| | | | | | | | | | | | | | The _switch stack frame setup are substantially the same, so are the comments. The difference in how the stack and current are switched, and other hardware and software housekeeping is done is moved into macros. Generated code should be unchanged. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tweak include orer to fix compile errors on some configs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-6-npiggin@gmail.com
* powerpc/32: Rearrange _switch to prepare for 32/64 mergeNicholas Piggin2023-06-141-5/+5
| | | | | | | | | | Change the order of some operations and change some register numbers in preparation to merge 32-bit and 64-bit switch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-5-npiggin@gmail.com
* powerpc/32: Remove sync from _switchNicholas Piggin2023-06-141-7/+1
| | | | | | | | | | | | 64-bit has removed the sync from _switch since commit 9145effd626d1 ("powerpc/64: Drop explicit hwsync in context switch"). The same logic there should apply to 32-bit. Remove the sync and replace with a placeholder comment (32 and 64 will be merged with a later change). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-4-npiggin@gmail.com
* powerpc/64: Rearrange 64-bit _switch to prepare for 32/64 mergeNicholas Piggin2023-06-141-20/+18
| | | | | | | | | | | | | More some 64-bit specifics out from the function epilogue and rearrange this to be a bit neater, use 32-bit mem ops for CR save/restore, and change some register numbers. This is preparation to consolidate 32-bit and 64-bit switch code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-3-npiggin@gmail.com
* powerpc/64s: move stack SLB pinning out of line from _switchNicholas Piggin2023-06-141-51/+62
| | | | | | | | | | | | | The large hunk of SLB pinning in _switch asm code makes it more difficult to see everything else that's going on. It is a less important path now, so icache and fetch footprint overhead can be avoided. Move context switch stack SLB pinning out of line. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-2-npiggin@gmail.com
* powerpc/32s: Fix LLVM SMP buildNicholas Piggin2023-06-141-4/+4
| | | | | | | | | LLVM assembler does not recognise 3-operand cmpi, use cmpwi. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606131828.315427-1-npiggin@gmail.com
* powerpc/64s: Remove support for ELFv1 little endian userspaceNicholas Piggin2023-06-142-1/+11
| | | | | | | | | | | | ELFv2 was introduced together with little-endian. ELFv1 with LE has never been a thing. The GNU toolchain can create such a beast, but anyone doing that is a maniac who needs to be stopped so I consider this patch a feature. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606093832.199712-5-npiggin@gmail.com
* powerpc/64: Use -mprofile-kernel for big endian ELFv2 kernelsNicholas Piggin2023-06-142-7/+9
| | | | | | | | | | | | | | -mprofile-kernel is an optimised calling convention for mcount that Linux has only implemented with the ELFv2 ABI, so it was disabled for big endian kernels. However it does work with ELFv2 big endian, so let's allow that if the compiler supports it. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606093832.199712-4-npiggin@gmail.com
* powerpc/64: Make ELFv2 the default for big-endian buildsNicholas Piggin2023-06-141-2/+4
| | | | | | | | | | | All supported toolchains now support ELFv2 on big-endian, so flip the default on this and hide the option behind EXPERT for the purpose of bug hunting. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606093832.199712-3-npiggin@gmail.com
* powerpc/64: Force ELFv2 when building with LLVM linkerNicholas Piggin2023-06-141-4/+2
| | | | | | | | | | | | | | | | | | The LLVM linker does not support ELFv1 at all, so BE kernels must be built with ELFv2. The LLD version check was added to be conservative, LLD simply fails to link ELFv1 entirely, effectively requiring LLD >= 15 and ELFv2 for BE builds. Instead remove that restriction until proven otherwise (LLD 14.0 links a booting ELFv2 BE vmlinux for me). The minimum GNU binutils has increased such that ELFv2 is always supported, so remove that check while we're here. Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606093832.199712-2-npiggin@gmail.com
* powerpc/build: Remove -pipe from compilation flagsNicholas Piggin2023-06-142-3/+3
| | | | | | | | | | | | | | | x86 removed -pipe in commit 437e88ab8f9e2 ("x86/build: Remove -pipe from KBUILD_CFLAGS") and the newer arm64 and riscv seem to have never used it, so that seems to be the way the world's going. Compile performance building defconfig on a POWER10 PowerNV system was in the noise after 10 builds each. No point in adding options unless they help something, so remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606064830.184083-1-npiggin@gmail.com