summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kexec/file_load_64.c
Commit message (Collapse)AuthorAgeFilesLines
* powerpc/crash: add crash memory hotplug supportSourabh Jain2024-04-231-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the arch crash hotplug handler, as introduced by the patch title ("powerpc: add crash CPU hotplug support"), to also support memory add/remove events. Elfcorehdr describes the memory of the crash kernel to capture the kernel; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace it with the previous one on memory add/remove events. The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory. Memory remove | v Offline pages | v Initiate memory notify call <----> crash hotplug handler chain for MEM_OFFLINE event | v Update memblock list Figure 1 There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls. For the kexec_file_load syscall, kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in future. For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-7-sourabhjain@linux.ibm.com
* powerpc/crash: add crash CPU hotplug supportSourabh Jain2024-04-231-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and FDT (Flattened Device Tree) of kdump image becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection. Going forward, CPU hotplug or online/offline events are referred as CPU/Memory add/remove events. The current solution to address the above issue involves monitoring the CPU/Memory add/remove events in userspace using udev rules and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes kernel, initrd, elfcorehdr, FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes. To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself. In the event of a CPU or memory add/remove events, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components. This patch adds crash hotplug handler for PowerPC and enable support to update the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch with the title "powerpc: add crash memory hotplug support" As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events. However, on PowerPC architecture crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why. The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During the early boot, it is expected that the boot CPU must be a part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal. There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. Few changes have been made to ensure kernel can safely update the FDT of kdump image loaded using both system calls. For kexec_file_load syscall the kdump image is prepared in kernel. So to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate a possible number of CPU nodes. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled. For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs, and the FDT segment is excluded from SHA calculation, making it safe to update. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-6-sourabhjain@linux.ibm.com
* powerpc/kexec: make the update_cpus_node() function publicSourabh Jain2024-04-231-87/+0
| | | | | | | | | | | | | | | | Move the update_cpus_node() from kexec/{file_load_64.c => core_64.c} to allow other kexec components to use it. Later in the series, this function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. No functional changes are intended. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-5-sourabhjain@linux.ibm.com
* powerpc/kexec: move *_memory_ranges functions to ranges.cSourabh Jain2024-04-231-190/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the following functions form kexec/{file_load_64.c => ranges.c} and make them public so that components other than KEXEC_FILE can also use these functions. 1. get_exclude_memory_ranges 2. get_reserved_memory_ranges 3. get_crash_memory_ranges 4. get_usable_memory_ranges Later in the series get_crash_memory_ranges function is utilized for in-kernel updates to kdump image during CPU/Memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. Since the above functions are moved to ranges.c, some of the helper functions in ranges.c are no longer required to be public. Mark them as static and removed them from kexec_ranges.h header file. Finally, remove the CONFIG_KEXEC_FILE build dependency for range.c because it is required for other config, such as CONFIG_CRASH_DUMP. No functional changes are intended. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-4-sourabhjain@linux.ibm.com
* powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMPHari Bathini2024-03-171-129/+140
| | | | | | | | | | | CONFIG_KEXEC_FILE does not have to select CONFIG_CRASH_DUMP. Move some code under CONFIG_CRASH_DUMP to support CONFIG_KEXEC_FILE and !CONFIG_CRASH_DUMP case. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240226103010.589537-3-hbathini@linux.ibm.com
* kexec_file, power: print out debugging message if requiredBaoquan He2023-12-201-9/+9
| | | | | | | | | | | | | | | Then when specifying '-d' for kexec_file_load interface, loaded locations of kernel/initrd/cmdline etc can be printed out to help debug. Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. Link: https://lkml.kernel.org/r/20231213055747.61826-7-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Conor Dooley <conor@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* powerpc: Annotate endianness of various variables and functionsBenjamin Gray2023-10-191-3/+3
| | | | | | | | | | | | | | | Sparse reports several endianness warnings on variables and functions that are consistently treated as big endian. There are no multi-endianness shenanigans going on here so fix these low hanging fruit up in one patch. All changes are just type annotations; no endianness switching operations are introduced by this patch. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231011053711.93427-7-bgray@linux.ibm.com
* powerpc/kexec_file: add missing of_node_putJulia Lawall2023-09-181-2/+6
| | | | | | | | | | | | | | for_each_node_with_property performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. This was done using the Coccinelle semantic patch iterators/for_each_child.cocci Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230907095521.14053-7-Julia.Lawall@inria.fr
* powerpc: Move DMA64_PROPNAME define to a headerMichal Suchanek2023-08-181-4/+1
| | | | | | | | | Avoid redefining the same value in multiple source. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230817162411.429-1-msuchanek@suse.de
* powerpc/kexec: fix minor typoLaurent Dufour2023-08-021-3/+3
| | | | | | | | | | | Function name in the descriptor was not correct. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307251721.bUGcsCeQ-lkp@intel.com/ Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230725132759.53975-1-ldufour@linux.ibm.com
* powerpc: Explicitly include correct DT includesRob Herring2023-08-021-1/+1
| | | | | | | | | | | | | | | | | The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <robh@kernel.org> [mpe: Fixup maple/setup.c which needs platform_device] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230724210247.778034-1-robh@kernel.org
* powerpc/kexec_file: print error string on usable memory property update failureSourabh Jain2023-02-151-1/+2
| | | | | | | | | | | | Print the FDT error description along with the error message if failed to set the "linux,drconf-usable-memory" property in the kdump kernel's FDT. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216122708.182154-1-sourabhjain@linux.ibm.com
* powerpc/pseries: Pass PLPKS password on kexecRussell Currey2023-02-121-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | Before interacting with the PLPKS, we ask the hypervisor to generate a password for the current boot, which is then required for most further PLPKS operations. If we kexec into a new kernel, the new kernel will try and fail to generate a new password, as the password has already been set. Pass the password through to the new kernel via the device tree, in /chosen/ibm,plpks-pw. Check for the presence of this property before trying to generate a new password - if it exists, use the existing password and remove it from the device tree. This only works with the kexec_file_load() syscall, not the older kexec_load() syscall, however if you're using Secure Boot then you want to be using kexec_file_load() anyway. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-24-ajd@linux.ibm.com
* powerpc/kexec_file: fix implicit decl errorRandy Dunlap2023-02-061-0/+1
| | | | | | | | | | | | | | | | | | | kexec (PPC64) code calls memory_hotplug_max(). Add the header declaration for it from <asm/mmzone.h>. Using <linux/mmzone.h> does not work since the #include for <asm/mmzone.h> depends on CONFIG_NUMA=y, which is not always set. Fixes this build error/warning: arch/powerpc/kexec/file_load_64.c: In function 'kexec_extra_fdt_size_ppc64': arch/powerpc/kexec/file_load_64.c:993:33: error: implicit declaration of function 'memory_hotplug_max' 993 | usm_entries = ((memory_hotplug_max() / drmem_lmb_size()) + | ^~~~~~~~~~~~~~~~~~ Fixes: fc546faa5595 ("powerpc/kexec_file: Count hot-pluggable memory in FDT estimate") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230204172206.7662-1-rdunlap@infradead.org
* powerpc/kexec_file: Count hot-pluggable memory in FDT estimateSourabh Jain2023-02-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | On Systems where online memory is lesser compared to max memory, the kexec_file_load system call may fail to load the kdump kernel with the below errors: "Failed to update fdt with linux,drconf-usable-memory property" "Error setting up usable-memory property for kdump kernel" This happens because the size estimation for usable memory properties for the kdump kernel's FDT is based on the online memory whereas the usable memory properties include max memory. In short, the hot-pluggable memory is not accounted for while estimating the size of the usable memory properties. The issue is addressed by calculating usable memory property size using max hotplug address instead of the last online memory address. Fixes: 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump kernel") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230131030615.729894-1-sourabhjain@linux.ibm.com
* powerpc/kexec_file: Fix division by zero in extra size estimationMichael Ellerman2023-01-311-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kexec_extra_fdt_size_ppc64() there's logic to estimate how much extra space will be needed in the device tree for some memory related properties. That logic uses the size of RAM divided by drmem_lmb_size() to do the estimation. However drmem_lmb_size() can be zero if the machine has no hotpluggable memory configured, which is the case when booting with qemu and no maxmem=x parameter is passed (the default). The division by zero is reported by UBSAN, and can also lead to an overflow and a warning from kvmalloc, and kdump kernel loading fails: WARNING: CPU: 0 PID: 133 at mm/util.c:596 kvmalloc_node+0x15c/0x160 Modules linked in: CPU: 0 PID: 133 Comm: kexec Not tainted 6.2.0-rc5-03455-g07358bd97810 #223 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-dd0dca pSeries NIP: c00000000041ff4c LR: c00000000041fe58 CTR: 0000000000000000 REGS: c0000000096ef750 TRAP: 0700 Not tainted (6.2.0-rc5-03455-g07358bd97810) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24248242 XER: 2004011e CFAR: c00000000041fed0 IRQMASK: 0 ... NIP kvmalloc_node+0x15c/0x160 LR kvmalloc_node+0x68/0x160 Call Trace: kvmalloc_node+0x68/0x160 (unreliable) of_kexec_alloc_and_setup_fdt+0xb8/0x7d0 elf64_load+0x25c/0x4a0 kexec_image_load_default+0x58/0x80 sys_kexec_file_load+0x5c0/0x920 system_call_exception+0x128/0x330 system_call_vectored_common+0x15c/0x2ec To fix it, skip the calculation if drmem_lmb_size() is zero. Fixes: 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump kernel") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230130014707.541110-1-mpe@ellerman.id.au
* Merge tag 'powerpc-6.2-1' of ↵Linus Torvalds2022-12-191-1/+58
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add powerpc qspinlock implementation optimised for large system scalability and paravirt. See the merge message for more details - Enable objtool to be built on powerpc to generate mcount locations - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is restricted to the patching CPU - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI - Sanitise user registers on interrupt entry on 64-bit Book3S - Many other small features and fixes Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang. * tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits) powerpc/code-patching: Fix oops with DEBUG_VM enabled powerpc/qspinlock: Fix 32-bit build powerpc/prom: Fix 32-bit build powerpc/rtas: mandate RTAS syscall filtering powerpc/rtas: define pr_fmt and convert printk call sites powerpc/rtas: clean up includes powerpc/rtas: clean up rtas_error_log_max initialization powerpc/pseries/eeh: use correct API for error log size powerpc/rtas: avoid scheduling in rtas_os_term() powerpc/rtas: avoid device tree lookups in rtas_os_term() powerpc/rtasd: use correct OF API for event scan rate powerpc/rtas: document rtas_call() powerpc/pseries: unregister VPA when hot unplugging a CPU powerpc/pseries: reset the RCU watchdogs after a LPM powerpc: Take in account addition CPU node when building kexec FDT powerpc: export the CPU node count powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state powerpc/dts/fsl: Fix pca954x i2c-mux node names cxl: Remove unnecessary cxl_pci_window_alignment() selftests/powerpc: Fix resource leaks ...
| * powerpc: Take in account addition CPU node when building kexec FDTLaurent Dufour2022-12-071-1/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a system with a large number of CPUs, the creation of the FDT for a kexec kernel may fail because the allocated FDT is not large enough. When this happens, such a message is displayed on the console: Unable to add ibm,processor-vadd-size property: FDT_ERR_NOSPACE The property's name may change depending when the buffer overwrite is detected. Obviously the created FDT is missing information, and it is expected that system dump or kexec kernel failed to run properly. When the FDT is allocated, the size of the FDT the kernel received at boot time is used and an extra size can be applied. Currently, only memory added after boot time is taken in account, not the CPU nodes. The extra size should take in account these additional CPU nodes and compute the required extra space. To achieve that, the size of a CPU node, including its subnode is computed once and multiplied by the number of additional CPU nodes. The assumption is that the size of the CPU node is _same_ for all the node, the only variable part should be the name "PowerPC,POWERxx@##" where "##" may vary a little. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> [mpe: Don't shadow function name w/variable, minor coding style changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221110180619.15796-3-ldufour@linux.ibm.com
* | kexec: replace crash_mem_range with rangeLi Chen2022-11-181-1/+1
|/ | | | | | | | | | | | | | | | | | | | We already have struct range, so just use it. Link: https://lkml.kernel.org/r/20220929042936.22012-4-bhe@redhat.com Signed-off-by: Li Chen <lchen@ambarella.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Chen Lifu <chenlifu@huawei.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jianglei Nie <niejianglei2021@163.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* powerpc/kexec: Fix build failure from uninitialised variableRussell Currey2022-08-101-5/+5
| | | | | | | | | | | | | clang 14 won't build because ret is uninitialised and can be returned if both prop and fdtprop are NULL. Drop the ret variable and return an error in that failure case. Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220810054331.373761-1-ruscur@russell.cc
* powerpc/64e: Fix kexec build errorMichael Ellerman2022-08-031-0/+1
| | | | | | | | | | | | | | | When building ppc64_book3e_allmodconfig the build fails with: arch/powerpc/kexec/file_load_64.c:1063:14: error: implicit declaration of function ‘firmware_has_feature’ 1063 | if (!firmware_has_feature(FW_FEATURE_LPAR)) | ^~~~~~~~~~~~~~~~~~~~ Add a direct include of asm/firmware.h to fix the error. Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220803063152.1249270-1-mpe@ellerman.id.au
* pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-windowAlexey Kardashevskiy2022-07-281-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pseries platform uses 32bit default DMA window (always 4K pages) and optional 64bit DMA window available via DDW ("Dynamic DMA Windows"), 64K or 2M pages. For ages the default one was not removed and a huge window was created in addition. Things changed with SRIOV-enabled PowerVM which creates a default-and-bigger DMA window in 64bit space (still using 4K pages) for IOV VFs so certain OSes do not need to use the DDW API in order to utilize all available TCE budget. Linux on the other hand removes the default window and creates a bigger one (with more TCEs or/and a bigger page size - 64K/2M) in a bid to map the entire RAM, and if the new window size is smaller than that - it still uses this new bigger window. The result is that the default window is removed but the "ibm,dma-window" property is not. When kdump is invoked, the existing code tries reusing the existing 64bit DMA window which location and parameters are stored in the device tree but this fails as the new property does not make it to the kdump device tree blob. So the code falls back to the default window which does not exist anymore although the device tree says that it does. The result of that is that PCI devices become unusable and cannot be used for kdumping. This preserves the DMA64 and DIRECT64 properties in the device tree blob for the crash kernel. Since the crash kernel setup is done after device drivers are loaded and probed, the proper DMA config is stored at least for boot time devices. Because DDW window is optional and the code configures the default window first, the existing code creates an IOMMU table descriptor for the non-existing default DMA window. It is harmless for kdump as it does not touch the actual window (only reads what is mapped and marks those IO pages as used) but it is bad for kexec which clears it thinking it is a smaller default window rather than a bigger DDW window. This removes the "ibm,dma-window" property from the device tree after a bigger window is created and the crash kernel setup picks it up. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220629060614.1680476-1-aik@ozlabs.ru
* powerpc/kexec_file: Add of_node_put() before gotoWan Jiabing2021-10-221-0/+1
| | | | | | | | | | | | | | | Fix following coccicheck warning: ./arch/powerpc/kexec/file_load_64.c:698:1-22: WARNING: Function for_each_node_by_type should have of_node_put() before goto Early exits from for_each_node_by_type should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211018015418.10182-1-wanjiabing@vivo.com
* powerpc/kexec_file: Use current CPU info while setting up FDTSourabh Jain2021-05-041-0/+92
| | | | | | | | | | | | | | | | | | | | | | | kexec_file_load() uses initial_boot_params in setting up the device tree for the kernel to be loaded. Though initial_boot_params holds info about CPUs at the time of boot, it doesn't account for hot added CPUs. So, kexec'ing with kexec_file_load() syscall leaves the kexec'ed kernel with inaccurate CPU info. If kdump kernel is loaded with kexec_file_load() syscall and the system crashes on a hot added CPU, the capture kernel hangs failing to identify the boot CPU, with no output. To avoid this from happening, extract current CPU info from of_root device node and use it for setting up the fdt in kexec_file_load case. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210429060256.199714-1-sourabhjain@linux.ibm.com
* powerpc/kexec_file: Restore FDT size estimation for kdump kernelThiago Jung Bauermann2021-03-111-18/+8
| | | | | | | | | | | | | | | | | | | | | | kexec_fdt_totalsize_ppc64() includes the base FDT size in its size calculation, but commit 3c985d31ad66 ("powerpc: Use common of_kexec_alloc_and_setup_fdt()") changed the kexec code to use the generic function of_kexec_alloc_and_setup_fdt() which already includes the base FDT size. That change made the code overestimate the size a bit by counting twice the space required for the kernel command line and /chosen properties. Therefore change kexec_fdt_totalsize_ppc64() to calculate just the extra space needed by the kdump kernel, and change the function name so that it better reflects what the function is now doing. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [robh: reword commit msg as no longer a fix from merging to branches] Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210220005204.1417200-1-bauerman@linux.ibm.com
* powerpc: Move arch independent ima kexec functions to drivers/of/kexec.cLakshmi Ramasubramanian2021-03-081-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | The functions defined in "arch/powerpc/kexec/ima.c" handle setting up and freeing the resources required to carry over the IMA measurement list from the current kernel to the next kernel across kexec system call. These functions do not have architecture specific code, but are currently limited to powerpc. Move remove_ima_buffer() and setup_ima_buffer() calls into of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c". Move the remaining architecture independent functions from "arch/powerpc/kexec/ima.c" to "drivers/of/kexec.c". Delete "arch/powerpc/kexec/ima.c" and "arch/powerpc/include/asm/ima.h". Remove references to the deleted files and functions in powerpc and in ima. Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com> Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Tested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210221174930.27324-11-nramas@linux.microsoft.com
* powerpc: Use common of_kexec_alloc_and_setup_fdt()Rob Herring2021-03-081-0/+3
| | | | | | | | | | | | | | | The code for setting up the /chosen node in the device tree and updating the memory reservation for the next kernel has been moved to of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c". Use the common of_kexec_alloc_and_setup_fdt() to setup the device tree and update the memory reservation for kexec for powerpc. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210221174930.27324-8-nramas@linux.microsoft.com
* powerpc: Use ELF fields defined in 'struct kimage'Lakshmi Ramasubramanian2021-03-081-7/+7
| | | | | | | | | | | | | | | ELF related fields elf_headers, elf_headers_sz, and elfcorehdr_addr have been moved from 'struct kimage_arch' to 'struct kimage' as elf_headers, elf_headers_sz, and elf_load_addr respectively. Use the ELF fields defined in 'struct kimage'. Suggested-by: Rob Herring <robh@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210221174930.27324-4-nramas@linux.microsoft.com
* powerpc/kexec_file: fix FDT size estimation for kdump kernelHari Bathini2021-02-111-0/+35
| | | | | | | | | | | | | | | | | | | On systems with large amount of memory, loading kdump kernel through kexec_file_load syscall may fail with the below error: "Failed to update fdt with linux,drconf-usable-memory property" This happens because the size estimation for kdump kernel's FDT does not account for the additional space needed to setup usable memory properties. Fix it by accounting for the space needed to include linux,usable-memory & linux,drconf-usable-memory properties while estimating kdump kernel's FDT size. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161243826811.119001.14083048209224609814.stgit@hbathini
* arch, drivers: replace for_each_membock() with for_each_mem_range()Mike Rapoport2020-10-131-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several occurrences of the following pattern: for_each_memblock(memory, reg) { start = __pfn_to_phys(memblock_region_memory_base_pfn(reg); end = __pfn_to_phys(memblock_region_memory_end_pfn(reg)); /* do something with start and end */ } Using for_each_mem_range() iterator is more appropriate in such cases and allows simpler and cleaner code. [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build] [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring] Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memblock: reduce number of parameters in for_each_mem_range()Mike Rapoport2020-10-131-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently for_each_mem_range() and for_each_mem_range_rev() iterators are the most generic way to traverse memblock regions. As such, they have 8 parameters and they are hardly convenient to users. Most users choose to utilize one of their wrappers and the only user that actually needs most of the parameters is memblock itself. To avoid yet another naming for memblock iterators, rename the existing for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add a new for_each_mem_range[_rev]() wrappers with only index, start and end parameters. The new wrapper nicely fits into init_unavailable_mem() and will be used in upcoming changes to simplify memblock traversals. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS] Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc/kexec_file: Enable early kernel OPAL callsHari Bathini2020-07-291-0/+20
| | | | | | | | | | | | Kernels built with CONFIG_PPC_EARLY_DEBUG_OPAL enabled expects r8 & r9 to be filled with OPAL base & entry addresses respectively. Setting these registers allows the kernel to perform OPAL calls before the device tree is parsed. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602303975.575379.5032301944162937479.stgit@hbathini
* powerpc/kexec_file: Fix kexec load failure with lack of memory holeHari Bathini2020-07-291-19/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | The kexec purgatory has to run in real mode. Only the first memory block maybe accessible in real mode. And, unlike the case with panic kernel, no memory is set aside for regular kexec load. Another thing to note is, the memory for crashkernel is reserved at an offset of 128MB. So, when crashkernel memory is reserved, the memory ranges to load kexec segments shrink further as the generic code only looks for memblock free memory ranges and in all likelihood only a tiny bit of memory from 0 to 128MB would be available to load kexec segments. With kdump being used by default in general, kexec file load is likely to fail almost always. This can be fixed by changing the memory hole lookup logic for regular kexec to use the same method as kdump. This would mean that most kexec segments will overlap with crashkernel memory region. That should still be ok as the pages, whose destination address isn't available while loading, are placed in an intermediate location till a flush to the actual destination address happens during kexec boot sequence. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602302326.575379.14038896654942043093.stgit@hbathini
* powerpc/kexec_file: Add appropriate regions for memory reserve mapHari Bathini2020-07-291-5/+53
| | | | | | | | | | | | | While initrd, elfcorehdr and backup regions are already added to the reserve map, there are a few missing regions that need to be added to the memory reserve map. Add them here. And now that all the changes to load panic kernel are in place, claim likewise. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602300473.575379.4218568032039284448.stgit@hbathini
* powerpc/kexec_file: Prepare elfcore header for crashing kernelHari Bathini2020-07-291-0/+165
| | | | | | | | | | | | | | | Prepare elf headers for the crashing kernel's core file using crash_prepare_elf64_headers() and pass on this info to kdump kernel by updating its command line with elfcorehdr parameter. Also, add elfcorehdr location to reserve map to avoid it from being stomped on while booting. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> [mpe: Ensure cmdline is nul terminated] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602298855.575379.15819225623219909517.stgit@hbathini
* powerpc/kexec_file: Setup backup region for kdump kernelHari Bathini2020-07-291-3/+90
| | | | | | | | | | | | | | Though kdump kernel boots from loaded address, the first 64KB of it is copied down to real 0. So, setup a backup region and let purgatory copy the first 64KB of crashed kernel into this backup region before booting into kdump kernel. Update reserve map with backup region and crashed kernel's memory to avoid kdump kernel from accidentially using that memory. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602294718.575379.16216507537038008623.stgit@hbathini
* powerpc/kexec_file: Restrict memory usage of kdump kernelHari Bathini2020-07-291-1/+385
| | | | | | | | | | | | | | | | | Kdump kernel, used for capturing the kernel core image, is supposed to use only specific memory regions to avoid corrupting the image to be captured. The regions are crashkernel range - the memory reserved explicitly for kdump kernel, memory used for the tce-table, the OPAL region and RTAS region as applicable. Restrict kdump kernel memory to use only these regions by setting up usable-memory DT property. Also, tell the kdump kernel to run at the loaded address by setting the magic word at 0x5c. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602284284.575379.6962016255404325493.stgit@hbathini
* powerpc/kexec_file: Avoid stomping memory used by special regionsHari Bathini2020-07-291-2/+335
| | | | | | | | | | | | | | crashkernel region could have an overlap with special memory regions like OPAL, RTAS, TCE table & such. These regions are referred to as excluded memory ranges. Setup these ranges during image probe in order to avoid them while finding the buffer for different kdump segments. Override arch_kexec_locate_mem_hole() to locate a memory hole taking these ranges into account. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602281047.575379.6636807148335160795.stgit@hbathini
* powerpc/kexec_file: Mark PPC64 specific codeHari Bathini2020-07-291-0/+87
Some of the kexec_file_load code isn't PPC64 specific. Move PPC64 specific code from kexec/file_load.c to kexec/file_load_64.c. Also, rename purgatory/trampoline.S to purgatory/trampoline_64.S in the same spirit. No functional changes. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602276920.575379.10390965946438306388.stgit@hbathini