| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idle_state entry in the PACA on PowerNV features a bit which is
atomically tested and set through ldarx/stdcx. to be used as a spinlock.
This lock then guards access to other bit fields of idle_state. KCSAN
cannot differentiate between any of these bitfield accesses as they all
are implemented by 8-byte store/load instructions, thus cores contending
on the bit-lock appear to data race with modifications to idle_state.
Separate the bit-lock entry from the data guarded by the lock to avoid
the possibility of data races being detected by KCSAN.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-7-rmclure@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checks to see if the [H]SRR registers have been clobbered by (soft)
NMI interrupts imply the possibility for a data race on the
[h]srr_valid entries in the PACA. Annotate accesses to these fields with
READ_ONCE, removing the need for the barrier.
The diagnostic can use plain-access reads and writes, but annotate with
data_race.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-5-rmclure@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Annotate the release barrier and memory clobber (in effect, producing a
compiler barrier) in the publish_tail_cpu call. These barriers have the
effect of ensuring that qnode attributes are all written to prior to
publish the node to the waitqueue.
Even while the initial write to the 'locked' attribute is guaranteed to
terminate prior to the node being visible, KCSAN still complains that
the write is reorderable by the compiler. Issue a kcsan_release() to
inform KCSAN of the release barrier contained in publish_tail_cpu().
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-3-rmclure@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc implementation of qspinlocks will both poll and spin on the
bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
to KCSAN that polling is intentional here.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-2-rmclure@linux.ibm.com
|
|
|
|
|
|
|
| |
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230613045202.294451-4-joel@jms.id.au
|
|
|
|
|
|
|
|
|
|
| |
With IODA1 support gone the OPAL calls to set MVE are dead code. Remove
them.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230613045202.294451-3-joel@jms.id.au
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final "VPL" Power7 boxes that were used for powernv bringup have
been scrapped, meaning there are no machines with ioda1 left.
This patch removes the obvious unused code.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230613045202.294451-2-joel@jms.id.au
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some builds, the mpc52xx_pm_prepare()/lite5200_pm_prepare() functions
generate stack size warnings. The addition of 'struct resource' in commit
2500763dd3db ("powerpc: Use of_address_to_resource()") grew the stack size
and is blamed for the warnings. However, the real issue is there's no
reason the 'struct of_device_id immr_ids' DT match tables need to be on
the stack as they are constant. Declare them as static to move them off
the stack.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306130405.uTv5yOZD-lkp@intel.com/
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230614171724.2403982-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
| |
"ranges" is a standard property, and we have common helper functions
for parsing it, so let's use the for_each_of_range() iterator.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609183232.1767050-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
|
| |
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO
"ranges" parsing to use of_range_to_resource() instead.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609183238.1767186-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
| |
Use the recently added of_property_read_reg() helper to get the
untranslated "reg" address value.
Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Add required include of of_address.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609183151.1766261-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO
"ranges" parsing to use of_range_to_resource() instead.
One change is the original code would look for "#size-cells" and
"#address-cells" in the parent node if not found in the port child
nodes. That is non-standard behavior and not necessary AFAICT. In 2011
in commit 54986964c13c ("powerpc/85xx: Update SRIO device tree nodes")
there was an ABI break. The upstream .dts files have been correct since
at least that point.
Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Remove now unused "cell" variable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609183244.1767325-1-robh@kernel.org
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO
"ranges" parsing to use of_range_to_resource() instead.
One change is the original code would look for "#size-cells" and
"#address-cells" in the parent node if not found in the port child
nodes. That is non-standard behavior and not necessary AFAICT. In 2011
in commit 54986964c13c ("powerpc/85xx: Update SRIO device tree nodes")
there was an ABI break. The upstream .dts files have been correct since
at least that point.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609183244.1767325-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
| |
Use the recently added of_property_read_reg() helper to get the
untranslated "reg" address value.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609182926.1763589-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
| |
Replace open coded reading of "reg" and of_translate_address() calls with
single call to of_address_to_resource().
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230319163226.226583-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
| |
Replace open coded reading of CPU nodes' "reg" properties with
of_get_cpu_hwid() dedicated for this purpose.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230319145931.65499-1-robh@kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MPC8541/8548/8555 Configurable Development System (CDS) were the
vehicle used to provide evaluation of the 1st e500-v2 CPUs around 2007.
Similar to the earlier MPC83xx-MDS systems we removed, the "brains"
exist on a PCI-X card, but additional connectors exist to the right of
the PCI-X slot, two structural metal pins are used to provide stability
in a vertical ATX mounting, and the CPU is now on a daughter-card vs. a
clamped down BGA.
Given the extra complexity and risk of connector damage, the 8548CDS
I had access to came pre-assembled in a basic white Antec case common
for that era, and I'm inclined to assume that was the default.
Power was typical "Pentium4" 2005 ATX - the main 20 pin connector went
to the PCI ATX form factor backplane, and the 4 pin black/yellow went
to the CPU card.
Like previous evaluation boards, they attempted to provide break-out
connectors for as many features as possible, and that made for a fairly
complex looking system.
In any case, these are over 15 years old, and fairly complex systems,
originally made for a small group of industry related people, and made
for use where quiet fan operation wasn't important. Given that, it
makes sense to remove support from them in 2023.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230620043300.197546-3-paul.gortmaker@windriver.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the revision history in the manual(s), these e500-v1
platforms were first available around 2002.
Like a lot of evaluation boards, they attempted to provide break-out
connectors for all possible features, and that combined with four
PCI-X slots (and the age/era) meant for a considerably large board.
As I recall it, from a Linux point of view, the biggest difference
between 8540 and 8560 was in the UART implementation, and that is
reflected in a diff of the defconfigs.
In any case, these are over 20 years old, and by today's standards
only have a small amount of DDR1 memory, and were not widely available.
Given that, it makes sense to remove support from them in 2023.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230620043300.197546-2-paul.gortmaker@windriver.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On PowerVM guest, variable data is prefixed with 8 bytes of timestamp.
Extract ESL by stripping off the timestamp before passing to ESL parser.
Fixes: 4b3e71e9a34c ("integrity/powerpc: Support loading keys from PLPKS")
Cc: stable@vger.kenrnel.org # v6.3
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230608120444.382527-1-nayna@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cross-boundary
Without this fix, the last subsection vmemmap can end up in memory even if
the namespace is created with -M mem and has sufficient space in the altmap
area.
Fixes: cf387d9644d8 ("libnvdimm/altmap: Track namespace boundaries in altmap")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616110826.344417-6-aneesh.kumar@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
| |
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616110826.344417-5-aneesh.kumar@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On memory unplug reduce DirectMap page count correctly.
root@ubuntu-guest:# grep Direct /proc/meminfo
DirectMap4k: 0 kB
DirectMap64k: 0 kB
DirectMap2M: 115343360 kB
DirectMap1G: 0 kB
Before fix:
root@ubuntu-guest:# ndctl disable-namespace all
disabled 1 namespace
root@ubuntu-guest:# grep Direct /proc/meminfo
DirectMap4k: 0 kB
DirectMap64k: 0 kB
DirectMap2M: 115343360 kB
DirectMap1G: 0 kB
After fix:
root@ubuntu-guest:# ndctl disable-namespace all
disabled 1 namespace
root@ubuntu-guest:# grep Direct /proc/meminfo
DirectMap4k: 0 kB
DirectMap64k: 0 kB
DirectMap2M: 104857600 kB
DirectMap1G: 0 kB
Fixes: a2dc009afa9a ("powerpc/mm/book3s/radix: Add mapping statistics")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616110826.344417-4-aneesh.kumar@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
| |
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com <mailto:sachinp@linux.ibm.com>>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616110826.344417-2-aneesh.kumar@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.
When vmcores caused by direct panic call (such as `echo c >
/proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, it was found that it was because
of mismatch between r1 and NIP.
GDB uses NIP to get current function symbol and uses corresponding debug
info of that function to unwind previous frames, but due to the
mismatching r1 and NIP, the unwinding does not work, and it fails to
unwind to the 2nd frame and hence does not show the backtrace.
GDB backtrace with vmcore of kernel without this patch:
---------
(gdb) bt
#0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>,
newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
#1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
#2 0x0000000000000063 in ?? ()
#3 0xc000000003579320 in ?? ()
---------
Further analysis revealed that the mismatch occurred because
"ppc_save_regs" was saving the previous stack's SP instead of the current
r1. This patch fixes this by storing current r1 in the saved pt_regs.
GDB backtrace with vmcore of patched kernel:
--------
(gdb) bt
#0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8)
at ./arch/powerpc/include/asm/kexec.h:69
#1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
#2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n")
at kernel/panic.c:358
#3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
#4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false)
at drivers/tty/sysrq.c:602
#5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>,
count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
#6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>,
buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340
#7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>,
ppos=<optimized out>) at fs/proc/inode.c:352
#8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00,
buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>,
count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582
#9 0xc0000000005b4264 in ksys_write (fd=<optimized out>,
buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2)
at fs/read_write.c:637
#10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>)
at arch/powerpc/kernel/syscall.c:171
#11 0xc00000000000c270 in system_call_vectored_common ()
at arch/powerpc/kernel/interrupt_64.S:192
--------
Nick adds:
So this now saves regs as though it was an interrupt taken in the
caller, at the instruction after the call to ppc_save_regs, whereas
previously the NIP was there, but R1 came from the caller's caller and
that mismatch is what causes gdb's dwarf unwinder to go haywire.
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Fixes: d16a58f8854b1 ("powerpc: Improve ppc_save_regs()")
Reivewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230615091047.90433-1-adityag@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ftrace on ppc32 expects a three instruction sequence at the beginning of
each function when specifying -pg:
mflr r0
stw r0,4(r1)
bl _mcount
This is the case with all supported versions of gcc. Clang however emits
a branch to _mcount after the function prologue, similar to the pre
-mprofile-kernel ABI on ppc64. This is not supported.
Disable ftrace on ppc32 if using clang for now. This can be re-enabled
later if clang picks up support for -fpatchable-function-entry on ppc32.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/llvm/llvm-project/issues/63220
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609034501.407971-1-naveen@kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently pointer iov is being dereferenced before the null check of iov
which can lead to null pointer dereference errors. Fix this by moving the
iov null check before the dereferencing.
Detected using cppcheck static analysis:
linux/arch/powerpc/platforms/powernv/pci-sriov.c:597:12: warning: Either
the condition '!iov' is redundant or there is possible null pointer
dereference: iov. [nullPointerRedundantCheck]
num_vfs = iov->num_vfs;
^
Fixes: 052da31d45fc ("powerpc/powernv/sriov: De-indent setup and teardown")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230608095849.1147969-1-colin.i.king@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a utility 'lsdexcr' to print the current DEXCR status. Useful for
quickly checking the status such as when debugging test failures or
verifying the new default DEXCR does what you want (for userspace at
least). Example output:
# ./lsdexcr
uDEXCR: 04000000 (NPHIE)
HDEXCR: 00000000
Effective: 04000000 (NPHIE)
SBHE (0): clear (Speculative branch hint enable)
IBRTPD (3): clear (Indirect branch recurrent target ...)
SRAPD (4): clear (Subroutine return address ...)
NPHIE * (5): set (Non-privileged hash instruction enable)
PHIE (6): clear (Privileged hash instruction enable)
DEXCR[NPHIE] enabled: hashst/hashchk working
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-12-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test the kernel DEXCR[NPHIE] interface and hashchk exception handling.
Introduces with it a DEXCR utils library for common DEXCR operations.
Volatile is used to prevent the compiler optimising away the signal
tests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-11-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds _MSG assertion variants to provide more context behind why a
failure occurred. Also include unistd.h for _exit() and stdio.h for
fprintf(), and move ARRAY_SIZE macro to utils.h.
The _MSG variants and ARRAY_SIZE will be used by the following
DEXCR selftests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-10-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
| |
Describe the DEXCR and document how to configure it.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-9-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The HASHKEYR register contains a secret per-process key to enable unique
hashes per process. In general it should not be exposed to userspace
at all and a regular process has no need to know its key.
However, checkpoint restore in userspace (CRIU) functionality requires
that a process be able to set the HASHKEYR of another process, otherwise
existing hashes on the stack would be invalidated by a new random key.
Exposing HASHKEYR in this way also makes it appear in core dumps, which
is a security concern. Multiple threads may share a key, for example
just after a fork() call, where the kernel cannot know if the child is
going to return back along the parent's stack. If such a thread is
coerced into making a core dump, then the HASHKEYR value will be
readable and able to be used against all other threads sharing that key,
effectively undoing any protection offered by hashst/hashchk.
Therefore we expose HASHKEYR to ptrace when CONFIG_CHECKPOINT_RESTORE is
enabled, providing a choice of increased security or migratable ROP
protected processes. This is similar to how ARM exposes its PAC keys.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-8-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The DEXCR register is of interest when ptracing processes. Currently it
is static, but eventually will be dynamically controllable by a process.
If a process can control its own, then it is useful for it to be
ptrace-able to (e.g., for checkpoint-restore functionality).
It is also relevant to core dumps (the NPHIE aspect in particular),
which use the ptrace mechanism (or is it the other way around?) to
decide what to dump. The HDEXCR is useful here too, as the NPHIE aspect
may be set in the HDEXCR without being set in the DEXCR. Although the
HDEXCR is per-cpu and we don't track it in the task struct (it's useless
in normal operation), it would be difficult to imagine why a hypervisor
would set it to different values within a guest. A hypervisor cannot
safely set NPHIE differently at least, as that would break programs.
Expose a read-only view of the userspace DEXCR and HDEXCR to ptrace.
The HDEXCR is always readonly, and is useful for diagnosing the core
dumps (as the HDEXCR may set NPHIE without the DEXCR setting it).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Use lower_32_bits() rather than open coding]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-7-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR
to hold a key used in the hash calculation. This key should be different
for each process to make it harder for a malicious process to recreate
valid hash values for a victim process.
Add support for storing a per-thread hash key, and setting/clearing
HASHKEYR appropriately.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-6-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recognise and pass the appropriate signal to the user program when a
hashchk instruction triggers. This is independent of allowing
configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect
regardless of the kernel.
The signal mirrors how ARM reports their similar check failure. For
example, their FPAC handler in arch/arm64/kernel/traps.c do_el0_fpac()
does this. When we fail to read the instruction that caused the fault
we send a segfault, similar to how emulate_math() does it.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-5-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR). It
is a per-cpu register that allows control over various CPU behaviours
including branch hint usage, indirect branch speculation, and
hashst/hashchk support.
Add some definitions and basic support for the DEXCR in the kernel.
Right now it just
* Initialises the DEXCR and HASHKEYR to a fixed value when a CPU
onlines.
* Clears them in reset_sprs().
* Detects when the NPHIE aspect is supported (the others don't get
looked at in this series, so there's no need to waste a CPU_FTR
on them).
We initialise the HASHKEYR to ensure that all cores have the same key,
so an HV enforced NPHIE + swapping cores doesn't randomly crash a
process using hash instructions. The stores to HASHKEYR are
unconditional because the ISA makes no mention of the SPR being missing
if support for doing the hashes isn't present. So all that would happen
is the HASHKEYR value gets ignored. This helps slightly if NPHIE
detection fails; e.g., we currently only detect it on pseries.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Use simple values for DEXCR constants]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-4-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
ptrace-decl.h uses user_regset_get2_fn (among other things) from
regset.h. While all current users of ptrace-decl.h include regset.h
before it anyway, it adds an implicit ordering dependency and breaks
source tooling that tries to inspect ptrace-decl.h by itself.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-3-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The functions here use struct task_struct fields, so need to import
the full definition from <linux/sched.h>. The <asm/current.h> header
that defines current only forward declares struct task_struct.
Failing to include this <linux/sched.h> header leads to a compilation
error when a translation unit does not also include <linux/sched.h>
indirectly.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-2-bgray@linux.ibm.com
|
|
|
|
|
|
|
|
|
|
| |
Add --orphan-handlin for vdsos, and adjust vdso linker scripts to deal
with orphan sections.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609051002.3342-1-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The refcount on mm is dropped before the coprocessor is detached.
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: 7bc6f71bdff5f ("powerpc/vas: Define and use common vas_window struct")
Fixes: b22f2d88e435c ("powerpc/pseries/vas: Integrate API with open/close windows")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230607101024.14559-1-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
| |
This file contains only the enter_prom implementation now.
Trim includes and update header comment while we're here.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-7-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The _switch stack frame setup are substantially the same, so are the
comments. The difference in how the stack and current are switched,
and other hardware and software housekeeping is done is moved into
macros.
Generated code should be unchanged.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Tweak include orer to fix compile errors on some configs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-6-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Change the order of some operations and change some register numbers in
preparation to merge 32-bit and 64-bit switch.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-5-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
64-bit has removed the sync from _switch since commit 9145effd626d1
("powerpc/64: Drop explicit hwsync in context switch"). The same
logic there should apply to 32-bit. Remove the sync and replace with
a placeholder comment (32 and 64 will be merged with a later change).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-4-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
More some 64-bit specifics out from the function epilogue and rearrange
this to be a bit neater, use 32-bit mem ops for CR save/restore, and
change some register numbers.
This is preparation to consolidate 32-bit and 64-bit switch code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-3-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The large hunk of SLB pinning in _switch asm code makes it more
difficult to see everything else that's going on. It is a less important
path now, so icache and fetch footprint overhead can be avoided.
Move context switch stack SLB pinning out of line.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-2-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
| |
LLVM assembler does not recognise 3-operand cmpi, use cmpwi.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606131828.315427-1-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
ELFv2 was introduced together with little-endian. ELFv1 with LE has
never been a thing. The GNU toolchain can create such a beast, but
anyone doing that is a maniac who needs to be stopped so I consider
this patch a feature.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606093832.199712-5-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-mprofile-kernel is an optimised calling convention for mcount that
Linux has only implemented with the ELFv2 ABI, so it was disabled for
big endian kernels. However it does work with ELFv2 big endian, so let's
allow that if the compiler supports it.
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606093832.199712-4-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
All supported toolchains now support ELFv2 on big-endian, so flip the
default on this and hide the option behind EXPERT for the purpose of
bug hunting.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606093832.199712-3-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LLVM linker does not support ELFv1 at all, so BE kernels must be
built with ELFv2. The LLD version check was added to be conservative,
LLD simply fails to link ELFv1 entirely, effectively requiring LLD >= 15
and ELFv2 for BE builds. Instead remove that restriction until proven
otherwise (LLD 14.0 links a booting ELFv2 BE vmlinux for me).
The minimum GNU binutils has increased such that ELFv2 is always
supported, so remove that check while we're here.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606093832.199712-2-npiggin@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 removed -pipe in commit 437e88ab8f9e2 ("x86/build: Remove -pipe from
KBUILD_CFLAGS") and the newer arm64 and riscv seem to have never used it,
so that seems to be the way the world's going.
Compile performance building defconfig on a POWER10 PowerNV system
was in the noise after 10 builds each. No point in adding options unless
they help something, so remove it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606064830.184083-1-npiggin@gmail.com
|