summaryrefslogtreecommitdiffstats
path: root/kernel/kexec_file.c
Commit message (Collapse)AuthorAgeFilesLines
* arm64, crash: wrap crash dumping code into crash related ifdefsBaoquan He2024-02-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Now crash codes under kernel/ folder has been split out from kexec code, crash dumping can be separated from kexec reboot in config items on arm64 with some adjustments. Here wrap up crash dumping codes with CONFIG_CRASH_DUMP ifdeffery. [bhe@redhat.com: fix building error in generic codes] Link: https://lkml.kernel.org/r/20240129135033.157195-2-bhe@redhat.com Link: https://lkml.kernel.org/r/20240124051254.67105-8-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Michael Kelley <mhklinux@outlook.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* crash: split crash dumping code out from kexec_core.cBaoquan He2024-02-231-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, KEXEC_CORE select CRASH_CORE automatically because crash codes need be built in to avoid compiling error when building kexec code even though the crash dumping functionality is not enabled. E.g -------------------- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_KEXEC=y CONFIG_KEXEC_FILE=y --------------------- After splitting out crashkernel reservation code and vmcoreinfo exporting code, there's only crash related code left in kernel/crash_core.c. Now move crash related codes from kexec_core.c to crash_core.c and only build it in when CONFIG_CRASH_DUMP=y. And also wrap up crash codes inside CONFIG_CRASH_DUMP ifdeffery scope, or replace inappropriate CONFIG_KEXEC_CORE ifdef with CONFIG_CRASH_DUMP ifdef in generic kernel files. With these changes, crash_core codes are abstracted from kexec codes and can be disabled at all if only kexec reboot feature is wanted. Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Michael Kelley <mhklinux@outlook.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down()Yuntao Wang2023-12-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | temp_end represents the address of the last available byte. Therefore, the starting address of the memory segment with temp_end as its last available byte and a size of `kbuf->memsz`, that is, the value of temp_start, should be `temp_end - kbuf->memsz + 1` instead of `temp_end - kbuf->memsz`. Additionally, use the ALIGN_DOWN macro instead of open-coding it directly in locate_mem_hole_top_down() to improve code readability. Link: https://lkml.kernel.org/r/20231217033528.303333-3-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec_file: print out debugging message if requiredBaoquan He2023-12-201-3/+8
| | | | | | | | | | | | | | | | | Then when specifying '-d' for kexec_file_load interface, loaded locations of kernel/initrd/cmdline etc can be printed out to help debug. Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. And also print out type/start/head of kimage and flags to help debug. Link: https://lkml.kernel.org/r/20231213055747.61826-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Conor Dooley <conor@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec_file: add kexec_file flag to control debug printingBaoquan He2023-12-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "kexec_file: print out debugging message if required", v4. Currently, specifying '-d' on kexec command will print a lot of debugging informationabout kexec/kdump loading with kexec_load interface. However, kexec_file_load prints nothing even though '-d' is specified. It's very inconvenient to debug or analyze the kexec/kdump loading when something wrong happened with kexec/kdump itself or develper want to check the kexec/kdump loading. In this patchset, a kexec_file flag is KEXEC_FILE_DEBUG added and checked in code. If it's passed in, debugging message of kexec_file code will be printed out and can be seen from console and dmesg. Otherwise, the debugging message is printed like beofre when pr_debug() is taken. Note: **** ===== 1) The code in kexec-tools utility also need be changed to support passing KEXEC_FILE_DEBUG to kernel when 'kexec -s -d' is specified. The patch link is here: ========= [PATCH] kexec_file: add kexec_file flag to support debug printing http://lists.infradead.org/pipermail/kexec/2023-November/028505.html 2) s390 also has kexec_file code, while I am not sure what debugging information is necessary. So leave it to s390 developer. Test: **** ==== Testing was done in v1 on x86_64 and arm64. For v4, tested on x86_64 again. And on x86_64, the printed messages look like below: -------------------------------------------------------------- kexec measurement buffer for the loaded kernel at 0x207fffe000. Loaded purgatory at 0x207fff9000 Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180 Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000 Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280 Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M E820 memmap: 0000000000000000-000000000009a3ff (1) 000000000009a400-000000000009ffff (2) 00000000000e0000-00000000000fffff (2) 0000000000100000-000000006ff83fff (1) 000000006ff84000-000000007ac50fff (2) ...... 000000207fff6150-000000207fff615f (128) 000000207fff6160-000000207fff714f (1) 000000207fff7150-000000207fff715f (128) 000000207fff7160-000000207fff814f (1) 000000207fff8150-000000207fff815f (128) 000000207fff8160-000000207fffffff (1) nr_segments = 5 segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000 segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000 segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000 segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000 segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000 kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8 --------------------------------------------------------------------------- This patch (of 7): When specifying 'kexec -c -d', kexec_load interface will print loading information, e.g the regions where kernel/initrd/purgatory/cmdline are put, the memmap passed to 2nd kernel taken as system RAM ranges, and printing all contents of struct kexec_segment, etc. These are very helpful for analyzing or positioning what's happening when kexec/kdump itself failed. The debugging printing for kexec_load interface is made in user space utility kexec-tools. Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing. Because kexec_file code is mostly implemented in kernel space, and the debugging printing functionality is missed. It's not convenient when debugging kexec/kdump loading and jumping with kexec_file_load interface. Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging message printing. And add global variable kexec_file_dbg_print and macro kexec_dprintk() to facilitate the printing. This is a preparation, later kexec_dprintk() will be used to replace the existing pr_debug(). Once 'kexec -s -d' is specified, it will print out kexec/kdump loading information. If '-d' is not specified, it regresses to pr_debug(). Link: https://lkml.kernel.org/r/20231213055747.61826-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20231213055747.61826-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Conor Dooley <conor@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec_file: load kernel at top of system RAM if requiredBaoquan He2023-12-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "kexec_file: Load kernel at top of system RAM if required". Justification: ============== Kexec_load interface has been doing top down searching and loading kernel/initrd/purgtory etc to prepare for kexec reboot. In that way, the benefits are that it avoids to consume and fragment limited low memory which satisfy DMA buffer allocation and big chunk of continuous memory during system init; and avoids to stir with BIOS/FW reserved or occupied areas, or corner case handling/work around/quirk occupied areas when doing system init. By the way, the top-down searching and loading of kexec-ed kernel is done in user space utility code. For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply ignored. It calls walk_system_ram_res() directly to go through all resources of System RAM bottom up, to find an available memory region, then call locate_mem_hole_callback() to allocate memory in that found memory region from top to down. This is not expected and inconsistent with kexec_load. Implementation =============== In patch 1, introduce a new function walk_system_ram_res_rev() which is a variant of walk_system_ram_res(), it walks through a list of all the resources of System RAM in reversed order, i.e., from higher to lower. In patch 2, check if kexec_buf.top_down is 'true' in kexec_walk_resources(), if yes, call walk_system_ram_res_rev() to find memory region of system RAM from top to down to load kernel/initrd etc. Background information: ======================= And I ever tried this in the past in a different way, please see below link. In the post, I tried to adjust struct sibling linking code, replace the the singly linked list with list_head so that walk_system_ram_res_rev() can be implemented in a much easier way. Finally I failed. https://lore.kernel.org/all/20180718024944.577-4-bhe@redhat.com/ This time, I picked up the patch from AKASHI Takahiro's old post and made some change to take as the current patch 1: https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html This patch (of 2): Kexec_load interface has been doing top down searching and loading kernel/initrd/purgtory etc to prepare for kexec reboot. In that way, the benefits are that it avoids to consume and fragment limited low memory which satisfy DMA buffer allocation and big chunk of continuous memory during system init; and avoids to stir with BIOS/FW reserved or occupied areas, or corner case handling/work around/quirk occupied areas when doing system init. By the way, the top-down searching and loading of kexec-ed kernel is done in user space utility code. For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply ignored. It calls walk_system_ram_res() directly to go through all resources of System RAM bottom up, to find an available memory region, then call locate_mem_hole_callback() to allocate memory in that found memory region from top to down. This is not expected and inconsistent with kexec_load. Here check if kexec_buf.top_down is 'true' in kexec_walk_resources(), if yes, call the newly added walk_system_ram_res_rev() to find memory region of system RAM from top to down to load kernel/initrd etc. Link: https://lkml.kernel.org/r/20231114091658.228030-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20231114091658.228030-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'integrity-v6.6' of ↵Linus Torvalds2023-08-301-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity subsystem updates from Mimi Zohar: - With commit 099f26f22f58 ("integrity: machine keyring CA configuration") certificates may be loaded onto the IMA keyring, directly or indirectly signed by keys on either the "builtin" or the "machine" keyrings. With the ability for the system/machine owner to sign the IMA policy itself without needing to recompile the kernel, update the IMA architecture specific policy rules to require the IMA policy itself be signed. [ As commit 099f26f22f58 was upstreamed in linux-6.4, updating the IMA architecture specific policy now to require signed IMA policies may break userspace expectations. ] - IMA only checked the file data hash was not on the system blacklist keyring for files with an appended signature (e.g. kernel modules, Power kernel image). Check all file data hashes regardless of how it was signed - Code cleanup, and a kernel-doc update * tag 'integrity-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: kexec_lock: Replace kexec_mutex() by kexec_lock() in two comments ima: require signed IMA policy when UEFI secure boot is enabled integrity: Always reference the blacklist keyring with appraisal ima: Remove deprecated IMA_TRUSTED_KEYRING Kconfig
| * kexec_lock: Replace kexec_mutex() by kexec_lock() in two commentsWenyu Liu2023-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | kexec_mutex is replaced by an atomic variable in 05c6257433b (panic, kexec: make __crash_kexec() NMI safe). But there are still two comments that referenced kexec_mutex, replace them by kexec_lock. Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | kexec: exclude elfcorehdr from the segment digestEric DeVolder2023-08-241-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a crash kernel is loaded via the kexec_file_load() syscall, the kernel places the various segments (ie crash kernel, crash initrd, boot_params, elfcorehdr, purgatory, etc) in memory. For those architectures that utilize purgatory, a hash digest of the segments is calculated for integrity checking. The digest is embedded into the purgatory image prior to placing in memory. Updates to the elfcorehdr in response to CPU and memory changes would cause the purgatory integrity checking to fail (at crash time, and no vmcore created). Therefore, the elfcorehdr segment is explicitly excluded from the purgatory digest, enabling updates to the elfcorehdr while also avoiding the need to recompute the hash digest and reload purgatory. Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Baoquan He <bhe@redhat.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | crash: move a few code bits to setup support of crash hotplugEric DeVolder2023-08-241-181/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28. Once the kdump service is loaded, if changes to CPUs or memory occur, either by hot un/plug or off/onlining, the crash elfcorehdr must also be updated. The elfcorehdr describes to kdump the CPUs and memory in the system, and any inaccuracies can result in a vmcore with missing CPU context or memory regions. The current solution utilizes udev to initiate an unload-then-reload of the kdump image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. In the original post I outlined the significant performance problems related to offloading this activity to userspace. This patchset introduces a generic crash handler that registers with the CPU and memory notifiers. Upon CPU or memory changes, from either hot un/plug or off/onlining, this generic handler is invoked and performs important housekeeping, for example obtaining the appropriate lock, and then invokes an architecture specific handler to do the appropriate elfcorehdr update. Note the description in patch 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that enables further optimizations related to CPU plug/unplug/online/offline performance of elfcorehdr updates. In the case of x86_64, the arch specific handler generates a new elfcorehdr, and overwrites the old one in memory; thus no involvement with userspace needed. To realize the benefits/test this patchset, one must make a couple of minor changes to userspace: - Prevent udev from updating kdump crash kernel on hot un/plug changes. Add the following as the first lines to the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" With this changeset applied, the two rules evaluate to false for CPU and memory change events and thus skip the userspace unload-then-reload of kdump. - Change to the kexec_file_load for loading the kdump kernel: Eg. on RHEL: in /usr/bin/kdumpctl, change to: standard_kexec_args="-p -d -s" which adds the -s to select kexec_file_load() syscall. This kernel patchset also supports kexec_load() with a modified kexec userspace utility. A working changeset to the kexec userspace utility is posted to the kexec-tools mailing list here: http://lists.infradead.org/pipermail/kexec/2023-May/027049.html To use the kexec-tools patch, apply, build and install kexec-tools, then change the kdumpctl's standard_kexec_args to replace the -s with --hotplug. The removal of -s reverts to the kexec_load syscall and the addition of --hotplug invokes the changes put forth in the kexec-tools patch. This patch (of 8): The crash hotplug support leans on the work for the kexec_file_load() syscall. To also support the kexec_load() syscall, a few bits of code need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are moved out of kexec_file.c and into a common location crash_core.c. In addition, struct crash_mem and crash_notes were moved to new locales so that PROC_KCORE, which sets CRASH_CORE alone, builds correctly. No functionality change intended. Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | kexec: rename ARCH_HAS_KEXEC_PURGATORYEric DeVolder2023-08-181-3/+3
|/ | | | | | | | | | | The Kconfig refactor to consolidate KEXEC and CRASH options utilized option names of the form ARCH_SUPPORTS_<option>. Thus rename the ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow the same. Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of ↵Linus Torvalds2023-06-281-3/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level directories - Douglas Anderson has added a new "buddy" mode to the hardlockup detector. It permits the detector to work on architectures which cannot provide the required interrupts, by having CPUs periodically perform checks on other CPUs - Zhen Lei has enhanced kexec's ability to support two crash regions - Petr Mladek has done a lot of cleanup on the hard lockup detector's Kconfig entries - And the usual bunch of singleton patches in various places * tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) kernel/time/posix-stubs.c: remove duplicated include ocfs2: remove redundant assignment to variable bit_off watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h devres: show which resource was invalid in __devm_ioremap_resource() watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64 watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h watchdog/hardlockup: make the config checks more straightforward watchdog/hardlockup: sort hardlockup detector related config values a logical way watchdog/hardlockup: move SMP barriers from common code to buddy code watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY watchdog/buddy: don't copy the cpumask in watchdog_next_cpu() watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog() watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy() watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick() watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails ...
| * kexec: avoid calculating array size twiceSimon Horman2023-06-091-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid calculating array size twice in kexec_purgatory_setup_sechdrs(). Once using array_size(), and once open-coded. Flagged by Coccinelle: .../kexec_file.c:881:8-25: WARNING: array_size is already used (line 877) to compute the same size No functional change intended. Compile tested only. Link: https://lkml.kernel.org/r/20230525-kexec-array_size-v1-1-8b4bf4f7500a@kernel.org Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | kexec: support purgatories with .text.hot sectionsRicardo Ribalda2023-06-121-1/+13
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7. When upreving llvm I realised that kexec stopped working on my test platform. The reason seems to be that due to PGO there are multiple .text sections on the purgatory, and kexec does not supports that. This patch (of 4): Clang16 links the purgatory text in two sections when PGO is in use: [ 1] .text PROGBITS 0000000000000000 00000040 00000000000011a1 0000000000000000 AX 0 0 16 [ 2] .rela.text RELA 0000000000000000 00003498 0000000000000648 0000000000000018 I 24 1 8 ... [17] .text.hot. PROGBITS 0000000000000000 00003220 000000000000020b 0000000000000000 AX 0 0 1 [18] .rela.text.hot. RELA 0000000000000000 00004428 0000000000000078 0000000000000018 I 24 17 8 And both of them have their range [sh_addr ... sh_addr+sh_size] on the area pointed by `e_entry`. This causes that image->start is calculated twice, once for .text and another time for .text.hot. The second calculation leaves image->start in a random location. Because of this, the system crashes immediately after: kexec_core: Starting new kernel Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Philipp Rudo <prudo@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec: remove unnecessary arch_kexec_kernel_image_load()Bjorn Helgaas2023-04-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | arch_kexec_kernel_image_load() only calls kexec_image_load_default(), and there are no arch-specific implementations. Remove the unnecessary arch_kexec_kernel_image_load() and make kexec_image_load_default() static. No functional change intended. Link: https://lkml.kernel.org/r/20230307224416.907040-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec: introduce sysctl parameters kexec_load_limit_*Ricardo Ribalda2023-02-021-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kexec allows replacing the current kernel with a different one. This is usually a source of concerns for sysadmins that want to harden a system. Linux already provides a way to disable loading new kexec kernel via kexec_load_disabled, but that control is very coard, it is all or nothing and does not make distinction between a panic kexec and a normal kexec. This patch introduces new sysctl parameters, with finer tuning to specify how many times a kexec kernel can be loaded. The sysadmin can set different limits for kexec panic and kexec reboot kernels. The value can be modified at runtime via sysctl, but only with a stricter value. With these new parameters on place, a system with loadpin and verity enabled, using the following kernel parameters: sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a good warranty that if initrd tries to load a panic kernel, a malitious user will have small chances to replace that kernel with a different one, even if they can trigger timeouts on the disk where the panic kernel lives. Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec: factor out kexec_load_permittedRicardo Ribalda2023-02-021-1/+1
| | | | | | | | | | | | | | | | | | | Both syscalls (kexec and kexec_file) do the same check, let's factor it out. Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kexec: replace crash_mem_range with rangeLi Chen2022-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | We already have struct range, so just use it. Link: https://lkml.kernel.org/r/20220929042936.22012-4-bhe@redhat.com Signed-off-by: Li Chen <lchen@ambarella.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Chen Lifu <chenlifu@huawei.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jianglei Nie <niejianglei2021@163.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* panic, kexec: make __crash_kexec() NMI safeValentin Schneider2022-09-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI panic() doesn't work. The cause of that lies in the PREEMPT_RT definition of mutex_trylock(): if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task())) return 0; This prevents an nmi_panic() from executing the main body of __crash_kexec() which does the actual kexec into the kdump kernel. The warning and return are explained by: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") [...] The reasons for this are: 1) There is a potential deadlock in the slowpath 2) Another cpu which blocks on the rtmutex will boost the task which allegedly locked the rtmutex, but that cannot work because the hard/softirq context borrows the task context. Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex and replace it with an atomic variable. This is somewhat overzealous as *some* callsites could keep using a mutex (e.g. the sysfs-facing ones like crash_shrink_memory()), but this has the benefit of involving a single unified lock and preventing any future NMI-related surprises. Tested by triggering NMI panics via: $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi $ echo 1 > /proc/sys/kernel/unknown_nmi_panic $ echo 1 > /proc/sys/kernel/panic $ ipmitool power diag Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") Signed-off-by: Valentin Schneider <vschneid@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'mm-nonmm-stable-2022-08-06-2' of ↵Linus Torvalds2022-08-071-3/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. A relatively small amount of material this time" * tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) scripts/gdb: ensure the absolute path is generated on initial source MAINTAINERS: kunit: add David Gow as a maintainer of KUnit mailmap: add linux.dev alias for Brendan Higgins mailmap: update Kirill's email profile: setup_profiling_timer() is moslty not implemented ocfs2: fix a typo in a comment ocfs2: use the bitmap API to simplify code ocfs2: remove some useless functions lib/mpi: fix typo 'the the' in comment proc: add some (hopefully) insightful comments bdi: remove enum wb_congested_state kernel/hung_task: fix address space of proc_dohung_task_timeout_secs lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t() squashfs: support reading fragments in readahead call squashfs: implement readahead squashfs: always build "file direct" version of page actor Revert "squashfs: provide backing_dev_info in order to disable read-ahead" fs/ocfs2: Fix spelling typo in comment ia64: old_rr4 added under CONFIG_HUGETLB_PAGE proc: fix test for "vsyscall=xonly" boot option ...
| * kexec_file: increase maximum file size to 4GPasha Tatashin2022-06-161-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some case initrd can be large. For example, it could be a netboot image loaded by u-root, that is kexec'ing into it. The maximum size of initrd is arbitrary set to 2G. Also, the limit is not very obvious because it is hidden behind a generic INT_MAX macro. Theoretically, we could make it LONG_MAX, but it is safer to keep it sane, and just increase it to 4G. Increase the size to 4G, and make it obvious by having a new macro that specifies the maximum file size supported by kexec_file_load() syscall: KEXEC_FILE_SIZE_MAX. Link: https://lkml.kernel.org/r/20220527025535.3953665-3-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sasha Levin <sashal@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Thelen <gthelen@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | kexec, KEYS: make the code in bzImage64_verify_sig genericCoiby Xu2022-07-151-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 278311e417be ("kexec, KEYS: Make use of platform keyring for signature verify") adds platform keyring support on x86 kexec but not arm64. The code in bzImage64_verify_sig uses the keys on the .builtin_trusted_keys, .machine, if configured and enabled, .secondary_trusted_keys, also if configured, and .platform keyrings to verify the signed kernel image as PE file. Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Reviewed-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | kexec: clean up arch_kexec_kernel_verify_sigCoiby Xu2022-07-151-20/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before commit 105e10e2cf1c ("kexec_file: drop weak attribute from functions"), there was already no arch-specific implementation of arch_kexec_kernel_verify_sig. With weak attribute dropped by that commit, arch_kexec_kernel_verify_sig is completely useless. So clean it up. Note later patches are dependent on this patch so it should be backported to the stable tree as well. Cc: stable@vger.kernel.org Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Michal Suchanek <msuchanek@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> [zohar@linux.ibm.com: reworded patch description "Note"] Link: https://lore.kernel.org/linux-integrity/20220714134027.394370-1-coxu@redhat.com/ Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | kexec_file: drop weak attribute from functionsNaveen N. Rao2022-07-151-33/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As requested (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org), this series converts weak functions in kexec to use the #ifdef approach. Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]") changelog: : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") : [1], binutils (v2.36+) started dropping section symbols that it thought : were unused. This isn't an issue in general, but with kexec_file.c, gcc : is placing kexec_arch_apply_relocations[_add] into a separate : .text.unlikely section and the section symbol ".text.unlikely" is being : dropped. Due to this, recordmcount is unable to find a non-weak symbol in : .text.unlikely to generate a relocation record against. This patch (of 2); Drop __weak attribute from functions in kexec_file.c: - arch_kexec_kernel_image_probe() - arch_kimage_file_post_load_cleanup() - arch_kexec_kernel_image_load() - arch_kexec_locate_mem_hole() - arch_kexec_kernel_verify_sig() arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so drop the static attribute for the latter. arch_kexec_kernel_verify_sig() is not overridden by any architecture, so drop the __weak attribute. Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | ima: force signature verification when CONFIG_KEXEC_SIG is configuredCoiby Xu2022-07-131-1/+10
|/ | | | | | | | | | | | Currently, an unsigned kernel could be kexec'ed when IMA arch specific policy is configured unless lockdown is enabled. Enforce kernel signature verification check in the kexec_file_load syscall when IMA arch specific policy is configured. Fixes: 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE") Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* Merge tag 'riscv-for-linus-5.19-mw0' of ↵Linus Torvalds2022-05-311-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the Svpbmt extension, which allows memory attributes to be encoded in pages - Support for the Allwinner D1's implementation of page-based memory attributes - Support for running rv32 binaries on rv64 systems, via the compat subsystem - Support for kexec_file() - Support for the new generic ticket-based spinlocks, which allows us to also move to qrwlock. These should have already gone in through the asm-geneic tree as well - A handful of cleanups and fixes, include some larger ones around atomics and XIP * tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add] riscv: compat: Using seperated vdso_maps for compat_vdso_info RISC-V: Fix the XIP build RISC-V: Split out the XIP fixups into their own file RISC-V: ignore xipImage RISC-V: Avoid empty create_*_mapping definitions riscv: Don't output a bogus mmu-type on a no MMU kernel riscv: atomic: Add custom conditional atomic operation implementation riscv: atomic: Optimize dec_if_positive functions riscv: atomic: Cleanup unnecessary definition RISC-V: Load purgatory in kexec_file RISC-V: Add purgatory RISC-V: Support for kexec_file on panic RISC-V: Add kexec_file support RISC-V: use memcpy for kexec_file mode kexec_file: Fix kexec_file.c build error for riscv platform riscv: compat: Add COMPAT Kbuild skeletal support riscv: compat: ptrace: Add compat_arch_ptrace implement riscv: compat: signal: Add rt_frame implementation riscv: add memory-type errata for T-Head ...
| * kexec_file: Fix kexec_file.c build error for riscv platformLiao Chang2022-05-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_KEXEC_FILE is set for riscv platform, the compilation of kernel/kexec_file.c generate build error: kernel/kexec_file.c: In function 'crash_prepare_elf64_headers': ./arch/riscv/include/asm/page.h:110:71: error: request for member 'virt_addr' in something not a structure or union 110 | ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr)) | ^ ./arch/riscv/include/asm/page.h:131:2: note: in expansion of macro 'is_linear_mapping' 131 | is_linear_mapping(_x) ? \ | ^~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:140:31: note: in expansion of macro '__va_to_pa_nodebug' 140 | #define __phys_addr_symbol(x) __va_to_pa_nodebug(x) | ^~~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:143:24: note: in expansion of macro '__phys_addr_symbol' 143 | #define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0)) | ^~~~~~~~~~~~~~~~~~ kernel/kexec_file.c:1327:36: note: in expansion of macro '__pa_symbol' 1327 | phdr->p_offset = phdr->p_paddr = __pa_symbol(_text); This occurs is because the "kernel_map" referenced in macro is_linear_mapping() is suppose to be the one of struct kernel_mapping defined in arch/riscv/mm/init.c, but the 2nd argument of crash_prepare_elf64_header() has same symbol name, in expansion of macro is_linear_mapping in function crash_prepare_elf64_header(), "kernel_map" actually is the local variable. Signed-off-by: Liao Chang <liaochang1@huawei.com> Link: https://lore.kernel.org/r/20220408100914.150110-2-lizhengyu3@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* | kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao2022-05-271-34/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGEDDavid Hildenbrand2021-11-061-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED, indicating that we're dealing with a memory region that is never indicated in the firmware-provided memory map, but always detected and added by a driver. Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such memory regions like ordinary MEMBLOCK_NONE memory regions -- for example, when selecting memory regions to add to the vmcore for dumping in the crashkernel via for_each_mem_range(). However, especially kexec_file is not supposed to select such memblocks via for_each_free_mem_range() / for_each_free_mem_range_reverse() to place kexec images, similar to how we handle IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK. We'll make sure that memory hotplug code sets the flag where applicable (IORESOURCE_SYSRAM_DRIVER_MANAGED) next. This prepares architectures that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem support. Note that kexec *must not* indicate this memory to the second kernel and *must not* place kexec-images on this memory. Let's add a comment to kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in locate_mem_hole_callback() for kexec_walk_resources(). Also note that MEMBLOCK_HOTPLUG cannot be reused due to different semantics: MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the firmware-provided memory map and added to the system early during boot; kexec *has to* indicate this memory to the second kernel and can place kexec-images on this memory. After memory hotunplug, kexec has to be re-armed. We mostly ignore this flag when "movable_node" is not set on the kernel command line, because then we're told to not care about hotunpluggability of such memory regions. MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in the firmware-provided memory map; this memory is always detected and added to the system by a driver; memory might not actually be physically hotunpluggable. kexec *must not* indicate this memory to the second kernel and *must not* place kexec-images on this memory. Link: https://lkml.kernel.org/r/20211004093605.5830-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jianyong Wu <Jianyong.Wu@arm.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shahab Vahedi <shahab@synopsys.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel: kexec_file: fix error return code of kexec_calculate_store_digests()Jia-Ju Bai2021-05-071-1/+3
| | | | | | | | | | | | | | When vzalloc() returns NULL to sha_regions, no error return code of kexec_calculate_store_digests() is assigned. To fix this bug, ret is assigned with -ENOMEM in this case. Link: https://lkml.kernel.org/r/20210309083904.24321-1-baijiaju1990@gmail.com Fixes: a43cac0d9dc2 ("kexec: split kexec_file syscall code to kexec_file.c") Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ima: Free IMA measurement buffer after kexec syscallLakshmi Ramasubramanian2021-02-101-0/+5
| | | | | | | | | | | | | | | | | | | IMA allocates kernel virtual memory to carry forward the measurement list, from the current kernel to the next kernel on kexec system call, in ima_add_kexec_buffer() function. This buffer is not freed before completing the kexec system call resulting in memory leak. Add ima_buffer field in "struct kimage" to store the virtual address of the buffer allocated for the IMA measurement list. Free the memory allocated for the IMA measurement list in kimage_file_post_load_cleanup() function. Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list") Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* crypto: sha - split sha.h into sha1.h and sha2.hEric Biggers2020-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out. Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both. This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGEDDavid Hildenbrand2020-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is always set to 0 by hardware. This is far from beautiful (and confusing), and the bit only applies to SYSRAM. So let's move it out of the bus-specific (PnP) defined bits. We'll add another SYSRAM specific bit soon. If we ever need more bits for other purposes, we can steal some from "desc", or reshuffle/regroup what we have. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Grall <julien@xen.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <lenb@kernel.org> Cc: Leonardo Bras <leobras.c@gmail.com> Cc: Libor Pechacek <lpechacek@suse.cz> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Liu <wei.liu@kernel.org> Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/kernel_file_read: Add "offset" arg for partial readsKees Cook2020-10-051-2/+2
| | | | | | | | | | | | | | To perform partial reads, callers of kernel_read_file*() must have a non-NULL file_size argument and a preallocated buffer. The new "offset" argument can then be used to seek to specific locations in the file to fill the buffer to, at most, "buf_size" per call. Where possible, the LSM hooks can report whether a full file has been read or not so that the contents can be reasoned about. Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201002173828.2099543-14-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs/kernel_read_file: Add file_size output argumentKees Cook2020-10-051-2/+2
| | | | | | | | | | | | | | In preparation for adding partial read support, add an optional output argument to kernel_read_file*() that reports the file size so callers can reason more easily about their reading progress. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Scott Branden <scott.branden@broadcom.com> Link: https://lore.kernel.org/r/20201002173828.2099543-8-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs/kernel_read_file: Remove redundant size argumentKees Cook2020-10-051-7/+7
| | | | | | | | | | | | | | | In preparation for refactoring kernel_read_file*(), remove the redundant "size" argument which is not needed: it can be included in the return code, with callers adjusted. (VFS reads already cannot be larger than INT_MAX.) Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Scott Branden <scott.branden@broadcom.com> Link: https://lore.kernel.org/r/20201002173828.2099543-6-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs/kernel_read_file: Split into separate include fileScott Branden2020-10-051-0/+1
| | | | | | | | | | | | | | | | | | Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h include file. That header gets pulled in just about everywhere and doesn't really need functions not related to the general fs interface. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'x86-urgent-2020-08-15' of ↵Linus Torvalds2020-08-151-15/+26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes and small updates all around the place: - Fix mitigation state sysfs output - Fix an FPU xstate/sxave code assumption bug triggered by Architectural LBR support - Fix Lightning Mountain SoC TSC frequency enumeration bug - Fix kexec debug output - Fix kexec memory range assumption bug - Fix a boundary condition in the crash kernel code - Optimize porgatory.ro generation a bit - Enable ACRN guests to use X2APIC mode - Reduce a __text_poke() IRQs-off critical section for the benefit of PREEMPT_RT" * tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Acquire pte lock with interrupts enabled x86/bugs/multihit: Fix mitigation reporting when VMX is not in use x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs x86/purgatory: Don't generate debug info for purgatory.ro x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC kexec_file: Correctly output debugging information for the PT_LOAD ELF header kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges x86/crash: Correct the address boundary of function parameters x86/acrn: Remove redundant chars from ACRN signature x86/acrn: Allow ACRN guest to use X2APIC mode
| * kexec_file: Correctly output debugging information for the PT_LOAD ELF headerLianbo Jiang2020-08-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when we enable the debugging switch to debug kexec_file, we always get the following incorrect results: kexec_file: Crash PT_LOAD elf header. phdr=00000000c988639b vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=51 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=000000003cca69a0 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=52 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000c584cb9f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=53 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000cf85d57f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=54 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000a4a8f847 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=55 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000272ec49f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=56 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000ea0b65de vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=57 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=000000001f5e490c vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=58 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000dfe4109e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=59 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000480ed2b6 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=60 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=0000000080b65151 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=61 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=0000000024e31c5e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=62 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000332e0385 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=63 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=000000002754d5da vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=64 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=00000000783320dd vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=65 p_offset=0x0 kexec_file: Crash PT_LOAD elf header. phdr=0000000076fe5b64 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=66 p_offset=0x0 The reason is that kernel always prints the values of the next PT_LOAD instead of the current PT_LOAD. Change it to ensure that we can get the correct debugging information. [ mingo: Amended changelog, capitalized "ELF". ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Young <dyoung@redhat.com> Link: https://lore.kernel.org/r/20200804044933.1973-4-lijiang@redhat.com
| * kexec: Improve & fix crash_exclude_mem_range() to handle overlapping rangesLianbo Jiang2020-08-071-12/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The crash_exclude_mem_range() function can only handle one memory region a time. It will fail in the case in which the passed in area covers several memory regions. In this case, it will only exclude the first region, then return, but leave the later regions unsolved. E.g in a NEC system with two usable RAM regions inside the low 1M: ... BIOS-e820: [mem 0x0000000000000000-0x000000000003efff] usable BIOS-e820: [mem 0x000000000003f000-0x000000000003ffff] reserved BIOS-e820: [mem 0x0000000000040000-0x000000000009ffff] usable It will only exclude the memory region [0, 0x3efff], the memory region [0x40000, 0x9ffff] will still be added into /proc/vmcore, which may cause the following failure when dumping vmcore: ioremap on RAM at 0x0000000000040000 - 0x0000000000040fff WARNING: CPU: 0 PID: 665 at arch/x86/mm/ioremap.c:186 __ioremap_caller+0x2c7/0x2e0 ... RIP: 0010:__ioremap_caller+0x2c7/0x2e0 ... cp: error reading '/proc/vmcore': Cannot allocate memory kdump: saving vmcore failed In order to fix this bug, let's extend the crash_exclude_mem_range() to handle the overlapping ranges. [ mingo: Amended the changelog. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Young <dyoung@redhat.com> Link: https://lore.kernel.org/r/20200804044933.1973-3-lijiang@redhat.com
* | Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds2020-08-071-2/+14
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
| * kexec_file: Allow archs to handle special regions while locating memory holeHari Bathini2020-07-291-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some architectures may have special memory regions, within the given memory range, which can't be used for the buffer in a kexec segment. Implement weak arch_kexec_locate_mem_hole() definition which arch code may override, to take care of special regions, while trying to locate a memory hole. Also, add the missing declarations for arch overridable functions and and drop the __weak descriptors in the declarations to avoid non-weak definitions from becoming weak. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602273603.575379.17665852963340380839.stgit@hbathini
* | Merge tag 'integrity-v5.9' of ↵Linus Torvalds2020-08-061-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "The nicest change is the IMA policy rule checking. The other changes include allowing the kexec boot cmdline line measure policy rules to be defined in terms of the inode associated with the kexec kernel image, making the IMA_APPRAISE_BOOTPARAM, which governs the IMA appraise mode (log, fix, enforce), a runtime decision based on the secure boot mode of the system, and including errno in the audit log" * tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: remove redundant initialization of variable ret ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime ima: AppArmor satisfies the audit rule requirements ima: Rename internal filter rule functions ima: Support additional conditionals in the KEXEC_CMDLINE hook function ima: Use the common function to detect LSM conditionals in a rule ima: Move comprehensive rule validation checks out of the token parser ima: Use correct type for the args_p member of ima_rule_entry.lsm elements ima: Shallow copy the args_p member of ima_rule_entry.lsm elements ima: Fail rule parsing when appraise_flag=blacklist is unsupportable ima: Fail rule parsing when the KEY_CHECK hook is combined with an invalid cond ima: Fail rule parsing when the KEXEC_CMDLINE hook is combined with an invalid cond ima: Fail rule parsing when buffer hook functions have an invalid action ima: Free the entire rule if it fails to parse ima: Free the entire rule when deleting a list of rules ima: Have the LSM free its audit rule IMA: Add audit log for failure conditions integrity: Add errno field in audit message
| * | ima: Support additional conditionals in the KEXEC_CMDLINE hook functionTyler Hicks2020-07-201-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take the properties of the kexec kernel's inode and the current task ownership into consideration when matching a KEXEC_CMDLINE operation to the rules in the IMA policy. This allows for some uniformity when writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK, and KEXEC_CMDLINE operations. Prior to this patch, it was not possible to write a set of rules like this: dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t dont_measure func=KEXEC_CMDLINE obj_type=foo_t measure func=KEXEC_KERNEL_CHECK measure func=KEXEC_INITRAMFS_CHECK measure func=KEXEC_CMDLINE The inode information associated with the kernel being loaded by a kexec_kernel_load(2) syscall can now be included in the decision to measure or not Additonally, the uid, euid, and subj_* conditionals can also now be used in KEXEC_CMDLINE rules. There was no technical reason as to why those conditionals weren't being considered previously other than ima_match_rules() didn't have a valid inode to use so it immediately bailed out for KEXEC_CMDLINE operations rather than going through the full list of conditional comparisons. Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: kexec@lists.infradead.org Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* / kexec: do not verify the signature without the lockdown or mandatory signatureLianbo Jiang2020-06-261-28/+6
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signature verification is an important security feature, to protect system from being attacked with a kernel of unknown origin. Kexec rebooting is a way to replace the running kernel, hence need be secured carefully. In the current code of handling signature verification of kexec kernel, the logic is very twisted. It mixes signature verification, IMA signature appraising and kexec lockdown. If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of signature, the supported crypto, and key, we don't think this is wrong, Unless kexec lockdown is executed. IMA is considered as another kind of signature appraising method. If kexec kernel image has signature/crypto/key, it has to go through the signature verification and pass. Otherwise it's seen as verification failure, and won't be loaded. Seems kexec kernel image with an unqualified signature is even worse than those w/o signature at all, this sounds very unreasonable. E.g. If people get a unsigned kernel to load, or a kernel signed with expired key, which one is more dangerous? So, here, let's simplify the logic to improve code readability. If the KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification is mandated. Otherwise, we lift the bar for any kernel image. Link: http://lkml.kernel.org/r/20200602045952.27487-1-lijiang@redhat.com Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Matthew Garrett <mjg59@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kexec_file: don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGEDDavid Hildenbrand2020-06-041-0/+5
| | | | | | | | | | | | | | | | | | | Memory flagged with IORESOURCE_MEM_DRIVER_MANAGED is special - it won't be part of the initial memmap of the kexec kernel and not all memory might be accessible. Don't place any kexec images onto it. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/20200508084217.9160-4-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kexec: add machine_kexec_post_load()Pavel Tatashin2020-01-081-0/+4
| | | | | | | | | | | | It is the same as machine_kexec_prepare(), but is called after segments are loaded. This way, can do processing work with already loaded relocation segments. One such example is arm64: it has to have segments loaded in order to create a page table, but it cannot do it during kexec time, because at that time allocations won't be possible anymore. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Dave Young <dyoung@redhat.com> Signed-off-by: Will Deacon <will@kernel.org>
* kexec: Fix pointer-to-int-cast warningsHelge Deller2019-11-011-2/+2
| | | | | | | | | | | | | | | Fix two pointer-to-int-cast warnings when compiling for the 32-bit parisc platform: kernel/kexec_file.c: In function ‘crash_prepare_elf64_headers’: kernel/kexec_file.c:1307:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] phdr->p_vaddr = (Elf64_Addr)_text; ^ kernel/kexec_file.c:1324:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] phdr->p_vaddr = (unsigned long long) __va(mstart); ^ Signed-off-by: Helge Deller <deller@gmx.de>
* Merge branch 'next-lockdown' of ↵Linus Torvalds2019-09-281-9/+59
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull kernel lockdown mode from James Morris: "This is the latest iteration of the kernel lockdown patchset, from Matthew Garrett, David Howells and others. From the original description: This patchset introduces an optional kernel lockdown feature, intended to strengthen the boundary between UID 0 and the kernel. When enabled, various pieces of kernel functionality are restricted. Applications that rely on low-level access to either hardware or the kernel may cease working as a result - therefore this should not be enabled without appropriate evaluation beforehand. The majority of mainstream distributions have been carrying variants of this patchset for many years now, so there's value in providing a doesn't meet every distribution requirement, but gets us much closer to not requiring external patches. There are two major changes since this was last proposed for mainline: - Separating lockdown from EFI secure boot. Background discussion is covered here: https://lwn.net/Articles/751061/ - Implementation as an LSM, with a default stackable lockdown LSM module. This allows the lockdown feature to be policy-driven, rather than encoding an implicit policy within the mechanism. The new locked_down LSM hook is provided to allow LSMs to make a policy decision around whether kernel functionality that would allow tampering with or examining the runtime state of the kernel should be permitted. The included lockdown LSM provides an implementation with a simple policy intended for general purpose use. This policy provides a coarse level of granularity, controllable via the kernel command line: lockdown={integrity|confidentiality} Enable the kernel lockdown feature. If set to integrity, kernel features that allow userland to modify the running kernel are disabled. If set to confidentiality, kernel features that allow userland to extract confidential information from the kernel are also disabled. This may also be controlled via /sys/kernel/security/lockdown and overriden by kernel configuration. New or existing LSMs may implement finer-grained controls of the lockdown features. Refer to the lockdown_reason documentation in include/linux/security.h for details. The lockdown feature has had signficant design feedback and review across many subsystems. This code has been in linux-next for some weeks, with a few fixes applied along the way. Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode") is missing a Signed-off-by from its author. Matthew responded that he is providing this under category (c) of the DCO" * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits) kexec: Fix file verification on S390 security: constify some arrays in lockdown LSM lockdown: Print current->comm in restriction messages efi: Restrict efivar_ssdt_load when the kernel is locked down tracefs: Restrict tracefs when the kernel is locked down debugfs: Restrict debugfs when the kernel is locked down kexec: Allow kexec_file() with appropriate IMA policy when locked down lockdown: Lock down perf when in confidentiality mode bpf: Restrict bpf when kernel lockdown is in confidentiality mode lockdown: Lock down tracing and perf kprobes when in confidentiality mode lockdown: Lock down /proc/kcore x86/mmiotrace: Lock down the testmmiotrace module lockdown: Lock down module params that specify hardware parameters (eg. ioport) lockdown: Lock down TIOCSSERIAL lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down acpi: Disable ACPI table override if the kernel is locked down acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down ACPI: Limit access to custom_method when the kernel is locked down x86/msr: Restrict MSR access when the kernel is locked down x86: Lock down IO port access when the kernel is locked down ...
| * kexec: Allow kexec_file() with appropriate IMA policy when locked downMatthew Garrett2019-08-191-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Systems in lockdown mode should block the kexec of untrusted kernels. For x86 and ARM we can ensure that a kernel is trustworthy by validating a PE signature, but this isn't possible on other architectures. On those platforms we can use IMA digital signatures instead. Add a function to determine whether IMA has or will verify signatures for a given event type, and if so permit kexec_file() even if the kernel is otherwise locked down. This is restricted to cases where CONFIG_INTEGRITY_TRUSTED_KEYRING is set in order to prevent an attacker from loading additional keys at runtime. Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: linux-integrity@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>