summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/process.c
Commit message (Collapse)AuthorAgeFilesLines
* powerpc: Hide empty pt_regs at base of the stackMichael Ellerman2023-10-191-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A thread started via eg. user_mode_thread() runs in the kernel to begin with and then may later return to userspace. While it's running in the kernel it has a pt_regs at the base of its kernel stack, but that pt_regs is all zeroes. If the thread oopses in that state, it leads to an ugly stack trace with a big block of zero GPRs, as reported by Joel: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc7-00004-gf7757129e3de-dirty #3 Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 PowerNV Call Trace: [c0000000036afb00] [c0000000010dd058] dump_stack_lvl+0x6c/0x9c (unreliable) [c0000000036afb30] [c00000000013c524] panic+0x178/0x424 [c0000000036afbd0] [c000000002005100] mount_root_generic+0x250/0x324 [c0000000036afca0] [c0000000020057d0] prepare_namespace+0x2d4/0x344 [c0000000036afd20] [c0000000020049c0] kernel_init_freeable+0x358/0x3ac [c0000000036afdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 [c0000000036afe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c --- interrupt: 0 at 0x0 NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000 REGS: c0000000036afe80 TRAP: 0000 Not tainted (6.5.0-rc7-00004-gf7757129e3de-dirty) MSR: 0000000000000000 <> CR: 00000000 XER: 00000000 CFAR: 0000000000000000 IRQMASK: 0 GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 NIP [0000000000000000] 0x0 LR [0000000000000000] 0x0 --- interrupt: 0 The all-zero pt_regs looks ugly and conveys no useful information, other than its presence. So detect that case and just show the presence of the frame by printing the interrupt marker, eg: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00126-g18e9506562a0-dirty #301 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries Call Trace: [c000000003aabb00] [c000000001143db8] dump_stack_lvl+0x6c/0x9c (unreliable) [c000000003aabb30] [c00000000014c624] panic+0x178/0x424 [c000000003aabbd0] [c0000000020050fc] mount_root_generic+0x250/0x324 [c000000003aabca0] [c0000000020057cc] prepare_namespace+0x2d4/0x344 [c000000003aabd20] [c0000000020049bc] kernel_init_freeable+0x358/0x3ac [c000000003aabdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 [c000000003aabe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c --- interrupt: 0 at 0x0 To avoid ever suppressing a valid pt_regs make sure the pt_regs has a zero MSR and TRAP value, and is located at the very base of the stack. Fixes: 6895dfc04741 ("powerpc: copy_thread fill in interrupt frame marker and back chain") Reported-by: Joel Stanley <joel@jms.id.au> Reported-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230824064210.907266-1-mpe@ellerman.id.au
* powerpc/dexcr: Support userspace ROP protectionBenjamin Gray2023-06-191-0/+17
| | | | | | | | | | | | | | | The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR to hold a key used in the hash calculation. This key should be different for each process to make it harder for a malicious process to recreate valid hash values for a victim process. Add support for storing a per-thread hash key, and setting/clearing HASHKEYR appropriately. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-6-bgray@linux.ibm.com
* powerpc: copy_thread don't set PPR in user interrupt frame regsNicholas Piggin2023-04-111-5/+0
| | | | | | | | | | | | syscalls do not set the PPR field in their interrupt frame and return from syscall always sets the default PPR for userspace, so setting the value in the ret_from_fork frame is not necessary and mildly inconsistent. Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-9-npiggin@gmail.com
* powerpc: copy_thread don't set _TIF_RESTOREALLNicholas Piggin2023-04-111-2/+0
| | | | | | | | | | | In the kernel user thread path, don't set _TIF_RESTOREALL because the thread is required to call kernel_execve() before it returns, which will set _TIF_RESTOREALL if necessary via start_thread(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-8-npiggin@gmail.com
* powerpc: differentiate kthread from user kernel thread startNicholas Piggin2023-04-111-3/+4
| | | | | | | | | | | | | | | | | Kernel created user threads start similarly to kernel threads in that they call a kernel function after first returning from _switch, so they share ret_from_kernel_thread for this. Kernel threads never return from that function though, whereas user threads often do (although some don't, e.g., IO threads). Split these startup functions in two, and catch kernel threads that improperly return from their function. This is intended to make the complicated code a little bit easier to understand. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-7-npiggin@gmail.com
* powerpc: copy_thread differentiate kthreads and user mode threadsNicholas Piggin2023-04-111-36/+62
| | | | | | | | | | | | | | | | | | | | | When copy_thread is given a kernel function to run in arg->fn, this does not necessarily mean it is a kernel thread. User threads can be created this way (e.g., kernel_init, see also x86's copy_thread()). These threads run a kernel function which may call kernel_execve() and return, which returns like a userspace exec(2) syscall. Kernel threads are to be differentiated with PF_KTHREAD, will always have arg->fn set, and should never return from that function, instead calling kthread_exit() to exit. Create separate paths for the kthread and user kernel thread creation logic. The kthread path will never exit and does not require a user interrupt frame, so it gets a minimal stack frame. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-6-npiggin@gmail.com
* powerpc: use switch frame for ret_from_kernel_thread parametersNicholas Piggin2023-04-111-4/+9
| | | | | | | | | | | | | | The kernel thread path in copy_thread creates a user interrupt frame on stack and stores the function and arg parameters there, and ret_from_kernel_thread loads them. This is a slightly confusing way to overload that frame. Non-volatile registers are loaded from the switch frame, so the parameters can be stored there. The user interrupt frame is now only used by user threads when they return to user. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-4-npiggin@gmail.com
* powerpc: copy_thread make ret_from_fork register setup consistentNicholas Piggin2023-04-111-3/+0
| | | | | | | | | | | The ret_from_fork code for 64e and 32-bit set r3 for syscall_exit_prepare the same way that 64s does, so there should be no need to special-case them in copy_thread. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-3-npiggin@gmail.com
* powerpc: copy_thread remove unused pkey codeNicholas Piggin2023-04-111-11/+1
| | | | | | | | | | The pkey registers (AMR, IAMR) do not get loaded from the switch frame so it is pointless to save anything there. Remove the dead code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-2-npiggin@gmail.com
* powerpc: Fix a kernel-doc warningBo Liu2023-03-151-1/+1
| | | | | | | | | | | | | The current code provokes a kernel-doc warnings: arch/powerpc/kernel/process.c:1606: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20221101015452.3216-1-liubo03@inspur.com
* powerpc: Skip stack validation checking alternate stacks if they are not ↵Nicholas Piggin2023-02-101-0/+11
| | | | | | | | | | | | | allocated Stack validation in early boot can just bail out of checking alternate stacks if they are not validated yet. Checking against a NULL stack could cause NULLish pointer values to be considered valid. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-5-npiggin@gmail.com
* powerpc: Remove __kernel_text_address() in show_instructions()Christophe Leroy2023-02-101-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | That test was introducted in 2006 by commit 00ae36de49cc ("[POWERPC] Better check in show_instructions"). At that time, there was no BPF progs. As seen in message of commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops on tail call tests"), when a page fault occurs in test_bpf.ko for instance, the code is dumped as XXXXXXXXs. Allthough __kernel_text_address() checks is_bpf_text_address(), it seems it is not enough. Today, show_instructions() uses get_kernel_nofault() to read the code, so there is no real need for additional verifications. ARM64 and x86 don't do any additional check before dumping instructions. Do the same and remove __kernel_text_address() in show_instructions(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4fd69ef7945518c3e27f96b95046a5c1468d35bf.1675245773.git.christophe.leroy@csgroup.eu
* Merge tag 'powerpc-6.2-1' of ↵Linus Torvalds2022-12-191-27/+70
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add powerpc qspinlock implementation optimised for large system scalability and paravirt. See the merge message for more details - Enable objtool to be built on powerpc to generate mcount locations - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is restricted to the patching CPU - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI - Sanitise user registers on interrupt entry on 64-bit Book3S - Many other small features and fixes Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang. * tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits) powerpc/code-patching: Fix oops with DEBUG_VM enabled powerpc/qspinlock: Fix 32-bit build powerpc/prom: Fix 32-bit build powerpc/rtas: mandate RTAS syscall filtering powerpc/rtas: define pr_fmt and convert printk call sites powerpc/rtas: clean up includes powerpc/rtas: clean up rtas_error_log_max initialization powerpc/pseries/eeh: use correct API for error log size powerpc/rtas: avoid scheduling in rtas_os_term() powerpc/rtas: avoid device tree lookups in rtas_os_term() powerpc/rtasd: use correct OF API for event scan rate powerpc/rtas: document rtas_call() powerpc/pseries: unregister VPA when hot unplugging a CPU powerpc/pseries: reset the RCU watchdogs after a LPM powerpc: Take in account addition CPU node when building kexec FDT powerpc: export the CPU node count powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state powerpc/dts/fsl: Fix pca954x i2c-mux node names cxl: Remove unnecessary cxl_pci_window_alignment() selftests/powerpc: Fix resource leaks ...
| * powerpc: allow minimum sized kernel stack framesNicholas Piggin2022-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | This affects only 64-bit ELFv2 kernels, and reduces the minimum asm-created stack frame size from 112 to 32 byte on those kernels. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-16-npiggin@gmail.com
| * powerpc: split validate_sp into two functionsNicholas Piggin2022-12-021-9/+14
| | | | | | | | | | | | | | | | | | | | | | Most callers just want to validate an arbitrary kernel stack pointer, some need a particular size. Make the size case the exceptional one with an extra function. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
| * powerpc: copy_thread add a back chain to the switch stack frameNicholas Piggin2022-12-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stack unwinders need LR and the back chain as a minimum. The switch stack uses regs->nip for its return pointer rather than lrsave, so that was not set in the fork frame, and neither was the back chain. This change sets those fields in the stack. With this and the previous change, a stack trace in the switch or interrupt stack goes from looking like this: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 3 PID: 90 Comm: systemd Not tainted NIP: c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff [ ... regs ... ] NIP [c000000000011060] _switch+0x160/0x17c LR [c000000000010f68] _switch+0x68/0x17c Call Trace: To this: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 0 PID: 93 Comm: systemd Not tainted NIP: c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff [ ... regs ... ] NIP [c000000000011060] _switch+0x160/0x17c LR [c000000000010f68] _switch+0x68/0x17c Call Trace: [c000000005a93e10] [c00000000000cdbc] ret_from_fork_scv+0x0/0x54 --- interrupt: 3000 at 0x7fffa72f56d8 NIP: 00007fffa72f56d8 LR: 0000000000000000 CTR: 0000000000000000 [ ... regs ... ] NIP [00007fffa72f56d8] 0x7fffa72f56d8 LR [0000000000000000] 0x0 --- interrupt: 3000 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-14-npiggin@gmail.com
| * powerpc: copy_thread fill in interrupt frame marker and back chainNicholas Piggin2022-12-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Backtraces will not recognise the fork system call interrupt without the regs marker. And regular interrupt entry from userspace creates the back chain to the user stack, so do this for the initial fork frame too, to be consistent. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-13-npiggin@gmail.com
| * powerpc: add a define for the switch frame size and regs offsetNicholas Piggin2022-12-021-4/+8
| | | | | | | | | | | | | | | | | | | | | | This is open-coded in process.c, ppc32 uses a different define with the same value, and the C definition is name differently which makes it an extra indirection to grep for. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
| * powerpc: add a define for the user interrupt frame sizeNicholas Piggin2022-12-021-3/+3
| | | | | | | | | | | | | | | | | | | | The user interrupt frame is a different size from the kernel frame, so give it its own name. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
| * powerpc: Rename STACK_FRAME_MARKER and derive it from frame offsetNicholas Piggin2022-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | This is a count of longs from the stack pointer to the regs marker. Rename it to make it more distinct from the other byte offsets. It can be derived from the byte offset definitions just added. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
| * powerpc: add definition for pt_regs offset within an interrupt frameNicholas Piggin2022-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | This is a common offset that currently uses the overloaded STACK_FRAME_OVERHEAD constant. It's easier to read and more flexible to use a specific regs offset for this. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
| * powerpc: Rearrange copy_thread child stack creationNicholas Piggin2022-12-021-5/+6
| | | | | | | | | | | | | | | | | | | | | | This makes it a bit clearer where the stack frame is created, and will allow easier use of some of the stack offset constants in a later change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-5-npiggin@gmail.com
| * powerpc: Allow clearing and restoring registers independent of saved ↵Jordan Niethe2022-11-301-3/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | breakpoint state For the coming temporary mm used for instruction patching, the breakpoint registers need to be cleared to prevent them from accidentally being triggered. As soon as the patching is done, the breakpoints will be restored. The breakpoint state is stored in the per-cpu variable current_brk[]. Add a suspend_breakpoints() function which will clear the breakpoint registers without touching the state in current_brk[]. Add a pair function restore_breakpoints() which will move the state in current_brk[] back to the registers. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
| * powerpc: Print instruction dump on a single lineMichael Ellerman2022-11-241-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although the previous commit made the powerpc instruction dump usable with scripts/decodecode, there are still some problems. Because the dump is split across multiple lines, the script doesn't cope with printk timestamps or caller info. That can be fixed by printing the entire dump on one line, eg: [ 12.016307][ T112] --- interrupt: c00 [ 12.016605][ T112] Code: 4b7aae15 60000000 3d22016e 3c62ffec 39291160 38639bc0 e8890000 4b7aadf9 60000000 4bfffee8 7c0802a6 60000000 <0fe00000> 60420000 3c4c008f 384268a0 [ 12.017655][ T112] ---[ end trace 0000000000000000 ]--- That output can then be piped directly into scripts/decodecode and interpreted correctly. Printing the dump on a single line does produce a very long line, about 173 characters. That is still shorter than x86, which prints nearly 200 characters even without timestamps etc. All consoles I'm aware of will wrap the line if it's too long, so the length should not be a functional problem. If anything it should help on consoles like VGA by using less vertical space. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006032019.1128624-2-mpe@ellerman.id.au
| * powerpc: Make instruction dump work with scripts/decodecodeMichael Ellerman2022-11-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Matt reported that scripts/decodecode doesn't work for the instruction dump in the powerpc oops output. Although there are scripts around that can decode it, it would be preferable if the standard in-tree script worked. All other arches prefix the instruction dump with "Code:", and that's what the script looks for, so use that. The script then works as expected: $ CROSS_COMPILE=powerpc64le-linux-gnu- ./scripts/decodecode Code: fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378 ^D All code ======== 0: f0 ff c1 fb std r30,-16(r1) 4: c1 ff 21 f8 stdu r1,-64(r1) 8: 78 1b 7d 7c mr r29,r3 ... Note that the script doesn't cope well with printk timestamps or printk caller info. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006032019.1128624-1-mpe@ellerman.id.au
* | treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld2022-11-181-1/+1
|/ | | | | | | | | | | | | | | | | | | | This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds2022-10-161-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
| * treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld2022-10-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* | Merge tag 'mm-nonmm-stable-2022-10-11' of ↵Linus Torvalds2022-10-121-5/+0
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - hfs and hfsplus kmap API modernization (Fabio Francesco) - make crash-kexec work properly when invoked from an NMI-time panic (Valentin Schneider) - ntfs bugfixes (Hawkins Jiawei) - improve IPC msg scalability by replacing atomic_t's with percpu counters (Jiebin Sun) - nilfs2 cleanups (Minghao Chi) - lots of other single patches all over the tree! * tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype proc: test how it holds up with mapping'less process mailmap: update Frank Rowand email address ia64: mca: use strscpy() is more robust and safer init/Kconfig: fix unmet direct dependencies ia64: update config files nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure fork: remove duplicate included header files init/main.c: remove unnecessary (void*) conversions proc: mark more files as permanent nilfs2: remove the unneeded result variable nilfs2: delete unnecessary checks before brelse() checkpatch: warn for non-standard fixes tag style usr/gen_init_cpio.c: remove unnecessary -1 values from int file ipc/msg: mitigate the lock contention with percpu counter percpu: add percpu_counter_add_local and percpu_counter_sub_local fs/ocfs2: fix repeated words in comments relay: use kvcalloc to alloc page array in relay_alloc_page_array proc: make config PROC_CHILDREN depend on PROC_FS fs: uninline inode_maybe_inc_iversion() ...
| * kernel: exit: cleanup release_thread()Kefeng Wang2022-09-111-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only x86 has own release_thread(), introduce a new weak release_thread() function to clean empty definitions in other ARCHs. Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brian Cain <bcain@quicinc.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Huacai Chen <chenhuacai@kernel.org> [LoongArch] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> [csky] Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | powerpc/64: Fix msr_check_and_set/clear MSR[EE] raceNicholas Piggin2022-10-041-2/+2
|/ | | | | | | | | | | | | | | | | irq soft-masking means that when Linux irqs are disabled, the MSR[EE] value can change from 1 to 0 asynchronously: if a masked interrupt of the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled, the masked handler will return with MSR[EE]=0. This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can be called with local irqs disabled, unless a hard_irq_disable has been done. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
* powerpc: Enable execve syscall exit tracepointNaveen N. Rao2022-06-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On execve[at], we are zero'ing out most of the thread register state including gpr[0], which contains the syscall number. Due to this, we fail to trigger the syscall exit tracepoint properly. Fix this by retaining gpr[0] in the thread register state. Before this patch: # tail /sys/kernel/debug/tracing/trace cat-123 [000] ..... 61.449351: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) cat-124 [000] ..... 62.428481: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) echo-125 [000] ..... 65.813702: sys_execve(filename: 7fffa6b23378, argv: 7fffa6b233a0, envp: 7fffa6b233b0) echo-125 [000] ..... 65.822214: sys_execveat(fd: 0, filename: 1009ac48, argv: 7ffff65d0c98, envp: 7ffff65d0ca8, flags: 0) After this patch: # tail /sys/kernel/debug/tracing/trace cat-127 [000] ..... 100.416262: sys_execve(filename: 7fffa41b3448, argv: 7fffa41b33e0, envp: 7fffa41b33f8) cat-127 [000] ..... 100.418203: sys_execve -> 0x0 echo-128 [000] ..... 103.873968: sys_execve(filename: 7fffa41b3378, argv: 7fffa41b33a0, envp: 7fffa41b33b0) echo-128 [000] ..... 103.875102: sys_execve -> 0x0 echo-128 [000] ..... 103.882097: sys_execveat(fd: 0, filename: 1009ac48, argv: 7fffd10d2148, envp: 7fffd10d2158, flags: 0) echo-128 [000] ..... 103.883225: sys_execveat -> 0x0 Cc: stable@vger.kernel.org Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Sumit Dubey2 <Sumit.Dubey2@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609103328.41306-1-naveen.n.rao@linux.vnet.ibm.com
* Merge tag 'powerpc-5.19-2' of ↵Linus Torvalds2022-06-091-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - On 32-bit fix overread/overwrite of thread_struct via ptrace PEEK/POKE. - Fix softirqs not switching to the softirq stack since we moved irq_exit(). - Force thread size increase when KASAN is enabled to avoid stack overflows. - On Book3s 64 mark more code as not to be instrumented by KASAN to avoid crashes. - Exempt __get_wchan() from KASAN checking, as it's inherently racy. - Fix a recently introduced crash in the papr_scm driver in some configurations. - Remove include of <generated/compile.h> which is forbidden. Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner, He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras, Sachin Sant, Vaibhav Jain, and Wanming Hu. * tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Fix overread/overwrite of thread_struct via ptrace powerpc/book3e: get rid of #include <generated/compile.h> powerpc/kasan: Force thread size increase with KASAN powerpc/papr_scm: don't requests stats with '0' sized stats buffer powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK powerpc/kasan: Silence KASAN warnings in __get_wchan() powerpc/kasan: Mark more real-mode code as not to be instrumented
| * powerpc/kasan: Silence KASAN warnings in __get_wchan()He Ying2022-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following KASAN warning was reported in our kernel. BUG: KASAN: stack-out-of-bounds in get_wchan+0x188/0x250 Read of size 4 at addr d216f958 by task ps/14437 CPU: 3 PID: 14437 Comm: ps Tainted: G O 5.10.0 #1 Call Trace: [daa63858] [c0654348] dump_stack+0x9c/0xe4 (unreliable) [daa63888] [c035cf0c] print_address_description.constprop.3+0x8c/0x570 [daa63908] [c035d6bc] kasan_report+0x1ac/0x218 [daa63948] [c00496e8] get_wchan+0x188/0x250 [daa63978] [c0461ec8] do_task_stat+0xce8/0xe60 [daa63b98] [c0455ac8] proc_single_show+0x98/0x170 [daa63bc8] [c03cab8c] seq_read_iter+0x1ec/0x900 [daa63c38] [c03cb47c] seq_read+0x1dc/0x290 [daa63d68] [c037fc94] vfs_read+0x164/0x510 [daa63ea8] [c03808e4] ksys_read+0x144/0x1d0 [daa63f38] [c005b1dc] ret_from_syscall+0x0/0x38 --- interrupt: c00 at 0x8fa8f4 LR = 0x8fa8cc The buggy address belongs to the page: page:98ebcdd2 refcount:0 mapcount:0 mapping:00000000 index:0x2 pfn:0x1216f flags: 0x0() raw: 00000000 00000000 01010122 00000000 00000002 00000000 ffffffff 00000000 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: d216f800: 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 d216f880: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >d216f900: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 ^ d216f980: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 d216fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 After looking into this issue, I find the buggy address belongs to the task stack region. It seems KASAN has something wrong. I look into the code of __get_wchan in x86 architecture and find the same issue has been resolved by the commit f7d27c35ddff ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to powerpc architecture too. As Andrey Ryabinin said, get_wchan() is racy by design, it may access volatile stack of running task, thus it may access redzone in a stack frame and cause KASAN to warn about this. Use READ_ONCE_NOCHECK() to silence these warnings. Reported-by: Wanming Hu <huwanming@huaweil.com> Signed-off-by: He Ying <heying24@huawei.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121014418.155675-1-heying24@huawei.com
* | Merge tag 'kthread-cleanups-for-v5.19' of ↵Linus Torvalds2022-06-031-7/+8
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kthread updates from Eric Biederman: "This updates init and user mode helper tasks to be ordinary user mode tasks. Commit 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") caused init and the user mode helper threads that call kernel_execve to have struct kthread allocated for them. This struct kthread going away during execve in turned made a use after free of struct kthread possible. Here, commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for init and umh") is enough to fix the use after free and is simple enough to be backportable. The rest of the changes pass struct kernel_clone_args to clean things up and cause the code to make sense. In making init and the user mode helpers tasks purely user mode tasks I ran into two complications. The function task_tick_numa was detecting tasks without an mm by testing for the presence of PF_KTHREAD. The initramfs code in populate_initrd_image was using flush_delayed_fput to ensuere the closing of all it's file descriptors was complete, and flush_delayed_fput does not work in a userspace thread. I have looked and looked and more complications and in my code review I have not found any, and neither has anyone else with the code sitting in linux-next" * tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched: Update task_tick_numa to ignore tasks without an mm fork: Stop allowing kthreads to call execve fork: Explicitly set PF_KTHREAD init: Deal with the init process being a user mode process fork: Generalize PF_IO_WORKER handling fork: Explicity test for idle tasks in copy_thread fork: Pass struct kernel_clone_args into copy_thread kthread: Don't allocate kthread_struct for init and umh
| * fork: Generalize PF_IO_WORKER handlingEric W. Biederman2022-05-071-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add fn and fn_arg members into struct kernel_clone_args and test for them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER). This allows any task that wants to be a user space task that only runs in kernel mode to use this functionality. The code on x86 is an exception and still retains a PF_KTHREAD test because x86 unlikely everything else handles kthreads slightly differently than user space tasks that start with a function. The functions that created tasks that start with a function have been updated to set ".fn" and ".fn_arg" instead of ".stack" and ".stack_size". These functions are fork_idle(), create_io_thread(), kernel_thread(), and user_mode_thread(). Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * fork: Pass struct kernel_clone_args into copy_threadEric W. Biederman2022-05-071-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With io_uring we have started supporting tasks that are for most purposes user space tasks that exclusively run code in kernel mode. The kernel task that exec's init and tasks that exec user mode helpers are also user mode tasks that just run kernel code until they call kernel execve. Pass kernel_clone_args into copy_thread so these oddball tasks can be supported more cleanly and easily. v2: Fix spelling of kenrel_clone_args on h8300 Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | powerpc: Remove asm/prom.h from all files that don't need itChristophe Leroy2022-05-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | Several files include asm/prom.h for no reason. Clean it up. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop change to prom_parse.c as reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
* | powerpc: fix typos in commentsJulia Lawall2022-05-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
* | powerpc: Simplify and move arch_randomize_brk()Christophe Leroy2022-05-051-41/+0
|/ | | | | | | | | | | | | | | | | arch_randomize_brk() is only needed for hash on book3s/64, for other platforms the one provided by the default mmap layout is good enough. Move it to hash_utils.c and use randomize_page() like the generic one. And properly opt out the radix case instead of making an assumption on mmu_highuser_ssize. Also change to a 32M range like most other architectures instead of 8M. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eafa4d18ec8ac7b98dd02b40181e61643707cc7c.1649523076.git.christophe.leroy@csgroup.eu
* powerpc/inst: Define ppc_inst_tChristophe Leroy2021-12-091-1/+1
| | | | | | | | | | In order to stop using 'struct ppc_inst' on PPC32, define a ppc_inst_t typedef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
* powerpc: Add KUAP support for BOOKE and 40xChristophe Leroy2021-12-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On booke/40x we don't have segments like book3s/32. On booke/40x we don't have access protection groups like 8xx. Use the PID register to provide user access protection. Kernel address space can be accessed with any PID. User address space has to be accessed with the PID of the user. User PID is always not null. Everytime the kernel is entered, set PID register to 0 and restore PID register when returning to user. Everytime kernel needs to access user data, PID is restored for the access. In TLB miss handlers, check the PID and bail out to data storage exception when PID is 0 and accessed address is in user space. Note that also forbids execution of user text by kernel except when user access is unlocked. But this shouldn't be a problem as the kernel is not supposed to ever run user text. This patch prepares the infrastructure but the real activation of KUAP is done by following patches for each processor type one by one. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
* powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64Christophe Leroy2021-12-091-3/+3
| | | | | | | | | | | | | | | Also call kuap_lock() and kuap_save_and_lock() from interrupt functions with CONFIG_PPC64. For book3s/64 we keep them empty as it is done in assembly. Also do the locked assert when switching task unless it is book3s/64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
* powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMUNicholas Piggin2021-12-091-6/+7
| | | | | | | | | | | | | Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
* KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRsNicholas Piggin2021-11-241-0/+6
| | | | | | | | | | | | | | | Linux implements SPR save/restore including storage space for registers in the task struct for process context switching. Make use of this similarly to the way we make use of the context switching fp/vec save restore. This improves code reuse, allows some stack space to be saved, and helps with avoiding VRSAVE updates if they are not required. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-39-npiggin@gmail.com
* KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRsNicholas Piggin2021-11-241-0/+28
| | | | | | | | | | | | | This reduces the number of mtmsrd required to enable facility bits when saving/restoring registers, by having the KVM code set all bits up front rather than using individual facility functions that set their particular MSR bits. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-20-npiggin@gmail.com
* sched: Add wrapper for get_wchan() to keep task blockedKees Cook2021-10-151-6/+3
| | | | | | | | | | | | | Having a stable wchan means the process must be blocked and for it to stay that way while performing stack unwinding. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm] Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
* powerpc: Add dear as a synonym for pt_regs.dar registerXiongwei Song2021-08-261-1/+1
| | | | | | | | | | | | Create an anonymous union for dar and dear regsiters, we can reference dear to get the effective address when CONFIG_4xx=y or CONFIG_BOOKE=y. Otherwise, reference dar. This makes code more clear. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Reword commit title] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210807010239.416055-4-sxwjean@me.com
* powerpc: Add esr as a synonym for pt_regs.dsisrXiongwei Song2021-08-261-1/+1
| | | | | | | | | | | | Create an anonymous union for dsisr and esr regsiters, we can reference esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y. Otherwise, reference dsisr. This makes code more clear. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Reword commit title] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210807010239.416055-2-sxwjean@me.com
* Merge tag 'powerpc-5.14-1' of ↵Linus Torvalds2021-07-021-37/+70
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ...