linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'modules-6.6-rc1' of ↵	Linus Torvalds	2023-08-29	17	-70/+55
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Summary of the changes worth highlighting from most interesting to boring below: - Christoph Hellwig's symbol_get() fix to Nvidia's efforts to circumvent the protection he put in place in year 2020 to prevent proprietary modules from using GPL only symbols, and also ensuring proprietary modules which export symbols grandfather their taint. That was done through year 2020 commit 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE"). Christoph's new fix is done by clarifing __symbol_get() was only ever intended to prevent module reference loops by Linux kernel modules and so making it only find symbols exported via EXPORT_SYMBOL_GPL(). The circumvention tactic used by Nvidia was to use symbol_get() to purposely swift through proprietary module symbols and completely bypass our traditional EXPORT_SYMBOL() annotations and community agreed upon restrictions. A small set of preamble patches fix up a few symbols which just needed adjusting for this on two modules, the rtc ds1685 and the networking enetc module. Two other modules just needed some build fixing and removal of use of __symbol_get() as they can't ever be modular, as was done by Arnd on the ARM pxa module and Christoph did on the mmc au1xmmc driver. This is a good reminder to us that symbol_get() is just a hack to address things which should be fixed through Kconfig at build time as was done in the later patches, and so ultimately it should just go. - Extremely late minor fix for old module layout 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") by James Morse for arm64. Note that this layout thing is old, it is not* Song Liu's commit ac3b43283923 ("module: replace module_layout with module_memory"). The issue however is very odd to run into and so there was no hurry to get this in fast. - Although the fix did not go through the modules tree I'd like to highlight the fix by Peter Zijlstra in commit 54097309620e ("x86/static_call: Fix __static_call_fixup()") now merged in your tree which came out of what was originally suspected to be a fallout of the the newer module layout changes by Song Liu commit ac3b43283923 ("module: replace module_layout with module_memory") instead of module_init_section()"). Thanks to the report by Christian Bricart and the debugging by Song Liu & Peter that turned to be noted as a kernel regression in place since v5.19 through commit ee88d363d156 ("x86,static_call: Use alternative RET encoding"). I highlight this to reflect and clarify that we haven't seen more fallout from ac3b43283923 ("module: replace module_layout with module_memory"). - RISC-V toolchain got mapping symbol support which prefix symbols with "$" to help with alignment considerations for disassembly. This is used to differentiate between incompatible instruction encodings when disassembling. RISC-V just matches what ARM/AARCH64 did for alignment considerations and Palmer Dabbelt extended is_mapping_symbol() to accept these symbols for RISC-V. We already had support for this for all architectures but it also checked for the second character, the RISC-V check Dabbelt added was just for the "$". After a bit of testing and fallout on linux-next and based on feedback from Masahiro Yamada it was decided to simplify the check and treat the first char "$" as unique for all architectures, and so we no make is_mapping_symbol() for all archs if the symbol starts with "$". The most relevant commit for this for RISC-V on binutils was: https://sourceware.org/pipermail/binutils/2021-July/117350.html - A late fix by Andrea Righi (today) to make module zstd decompression use vmalloc() instead of kmalloc() to account for large compressed modules. I suspect we'll see similar things for other decompression algorithms soon. - samples/hw_breakpoint minor fixes by Rong Tao, Arnd Bergmann and Chen Jiahao" * tag 'modules-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module/decompress: use vmalloc() for zstd decompression workspace kallsyms: Add more debug output for selftest ARM: module: Use module_init_layout_section() to spot init sections arm64: module: Use module_init_layout_section() to spot init sections module: Expose module_init_layout_section() modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index mmc: au1xmmc: force non-modular build and remove symbol_get usage ARM: pxa: remove use of symbol_get() samples/hw_breakpoint: mark sample_hbp as static samples/hw_breakpoint: fix building without module unloading samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000' modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols kernel: params: Remove unnecessary ‘0’ values from err module: Ignore RISC-V mapping symbols too
\| *	module/decompress: use vmalloc() for zstd decompression workspace	Andrea Righi	2023-08-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using kmalloc() to allocate the decompression workspace for zstd may trigger the following warning when large modules are loaded (i.e., xfs): [ 2.961884] WARNING: CPU: 1 PID: 254 at mm/page_alloc.c:4453 __alloc_pages+0x2c3/0x350 ... [ 2.989033] Call Trace: [ 2.989841] <TASK> [ 2.990614] ? show_regs+0x6d/0x80 [ 2.991573] ? __warn+0x89/0x160 [ 2.992485] ? __alloc_pages+0x2c3/0x350 [ 2.993520] ? report_bug+0x17e/0x1b0 [ 2.994506] ? handle_bug+0x51/0xa0 [ 2.995474] ? exc_invalid_op+0x18/0x80 [ 2.996469] ? asm_exc_invalid_op+0x1b/0x20 [ 2.997530] ? module_zstd_decompress+0xdc/0x2a0 [ 2.998665] ? __alloc_pages+0x2c3/0x350 [ 2.999695] ? module_zstd_decompress+0xdc/0x2a0 [ 3.000821] __kmalloc_large_node+0x7a/0x150 [ 3.001920] __kmalloc+0xdb/0x170 [ 3.002824] module_zstd_decompress+0xdc/0x2a0 [ 3.003857] module_decompress+0x37/0xc0 [ 3.004688] init_module_from_file+0xd0/0x100 [ 3.005668] idempotent_init_module+0x11c/0x2b0 [ 3.006632] __x64_sys_finit_module+0x64/0xd0 [ 3.007568] do_syscall_64+0x59/0x90 [ 3.008373] ? ksys_read+0x73/0x100 [ 3.009395] ? exit_to_user_mode_prepare+0x30/0xb0 [ 3.010531] ? syscall_exit_to_user_mode+0x37/0x60 [ 3.011662] ? do_syscall_64+0x68/0x90 [ 3.012511] ? do_syscall_64+0x68/0x90 [ 3.013364] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 However, continuous physical memory does not seem to be required in module_zstd_decompress(), so use vmalloc() instead, to prevent the warning and avoid potential failures at loading compressed modules. Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	kallsyms: Add more debug output for selftest	Kees Cook	2023-08-24	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While debugging a recent kallsyms_selftest failure[1], I needed more details on what specifically was failing. This adds those details for each failure state that is checked. [1] https://lore.kernel.org/all/202308232200.1c932a90-oliver.sang@intel.com/ Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Yonghong Song <yhs@meta.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	ARM: module: Use module_init_layout_section() to spot init sections	James Morse	2023-08-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Today module_frob_arch_sections() spots init sections from their 'init' prefix, and uses this to keep the init PLTs separate from the rest. get_module_plt() uses within_module_init() to determine if a location is in the init text or not, but this depends on whether core code thought this was an init section. Naturally the logic is different. module_init_layout_section() groups the init and exit text together if module unloading is disabled, as the exit code will never run. The result is kernels with this configuration can't load all their modules because there are not enough PLTs for the combined init+exit section. A previous patch exposed module_init_layout_section(), use that so the logic is the same. Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	arm64: module: Use module_init_layout_section() to spot init sections	James Morse	2023-08-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Today module_frob_arch_sections() spots init sections from their 'init' prefix, and uses this to keep the init PLTs separate from the rest. module_emit_plt_entry() uses within_module_init() to determine if a location is in the init text or not, but this depends on whether core code thought this was an init section. Naturally the logic is different. module_init_layout_section() groups the init and exit text together if module unloading is disabled, as the exit code will never run. The result is kernels with this configuration can't load all their modules because there are not enough PLTs for the combined init+exit section. This results in the following: \| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc \| Modules linked in: crct10dif_common \| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208 \| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 \| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) \| pc : module_emit_plt_entry+0x184/0x1cc \| lr : module_emit_plt_entry+0x94/0x1cc \| sp : ffffffc0803bba60 [...] \| Call trace: \| module_emit_plt_entry+0x184/0x1cc \| apply_relocate_add+0x2bc/0x8e4 \| load_module+0xe34/0x1bd4 \| init_module_from_file+0x84/0xc0 \| __arm64_sys_finit_module+0x1b8/0x27c \| invoke_syscall.constprop.0+0x5c/0x104 \| do_el0_svc+0x58/0x160 \| el0_svc+0x38/0x110 \| el0t_64_sync_handler+0xc0/0xc4 \| el0t_64_sync+0x190/0x194 A previous patch exposed module_init_layout_section(), use that so the logic is the same. Reported-by: Adam Johnston <adam.johnston@arm.com> Tested-by: Adam Johnston <adam.johnston@arm.com> Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: <stable@vger.kernel.org> # 5.15.x: 60a0aab7463ee69 arm64: module-plts: inline linux/moduleloader.h Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	module: Expose module_init_layout_section()	James Morse	2023-08-03	2	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	module_init_layout_section() choses whether the core module loader considers a section as init or not. This affects the placement of the exit section when module unloading is disabled. This code will never run, so it can be free()d once the module has been initialised. arm and arm64 need to count the number of PLTs they need before applying relocations based on the section name. The init PLTs are stored separately so they can be free()d. arm and arm64 both use within_module_init() to decide which list of PLTs to use when applying the relocation. Because within_module_init()'s behaviour changes when module unloading is disabled, both architecture would need to take this into account when counting the PLTs. Today neither architecture does this, meaning when module unloading is disabled there are insufficient PLTs in the init section to load some modules, resulting in warnings: \| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc \| Modules linked in: crct10dif_common \| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208 \| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 \| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) \| pc : module_emit_plt_entry+0x184/0x1cc \| lr : module_emit_plt_entry+0x94/0x1cc \| sp : ffffffc0803bba60 [...] \| Call trace: \| module_emit_plt_entry+0x184/0x1cc \| apply_relocate_add+0x2bc/0x8e4 \| load_module+0xe34/0x1bd4 \| init_module_from_file+0x84/0xc0 \| __arm64_sys_finit_module+0x1b8/0x27c \| invoke_syscall.constprop.0+0x5c/0x104 \| do_el0_svc+0x58/0x160 \| el0_svc+0x38/0x110 \| el0t_64_sync_handler+0xc0/0xc4 \| el0t_64_sync+0x190/0x194 Instead of duplicating module_init_layout_section()s logic, expose it. Reported-by: Adam Johnston <adam.johnston@arm.com> Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules	Christoph Hellwig	2023-08-02	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It has recently come to my attention that nvidia is circumventing the protection added in 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE") by importing exports from their proprietary modules into an allegedly GPL licensed module and then rexporting them. Given that symbol_get was only ever intended for tightly cooperating modules using very internal symbols it is logical to restrict it to being used on EXPORT_SYMBOL_GPL and prevent nvidia from costly DMCA Circumvention of Access Controls law suites. All symbols except for four used through symbol_get were already exported as EXPORT_SYMBOL_GPL, and the remaining four ones were switched over in the preparation patches. Fixes: 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff	Christoph Hellwig	2023-08-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ds1685_rtc_poweroff is only used externally via symbol_get, which was only ever intended for very internal symbols like this one. Use EXPORT_SYMBOL_GPL for it so that symbol_get can enforce only being used on EXPORT_SYMBOL_GPL symbols. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Joshua Kinard <kumba@gentoo.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index	Christoph Hellwig	2023-08-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enetc_phc_index is only used via symbol_get, which was only ever intended for very internal symbols like this one. Use EXPORT_SYMBOL_GPL for it so that symbol_get can enforce only being used on EXPORT_SYMBOL_GPL symbols. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	mmc: au1xmmc: force non-modular build and remove symbol_get usage	Christoph Hellwig	2023-08-02	4	-35/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	au1xmmc is split somewhat awkwardly into the main mmc subsystem driver, and callbacks in platform_data that sit under arch/mips/ and are always built in. The latter than call mmc_detect_change through symbol_get. Remove the use of symbol_get by requiring the driver to be built in. In the future the interrupt handlers for card insert/eject detection should probably be moved into the main driver, and which point it can be built modular again. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Manuel Lauss <manuel.lauss@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> [mcgrof: squashed in depends on MMC=y suggested by Arnd] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	ARM: pxa: remove use of symbol_get()	Arnd Bergmann	2023-08-02	2	-15/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The spitz board file uses the obscure symbol_get() function to optionally call a function from sharpsl_pm.c if that is built. However, the two files are always built together these days, and have been for a long time, so this can be changed to a normal function call. Link: https://lore.kernel.org/lkml/20230731162639.GA9441@lst.de/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	samples/hw_breakpoint: mark sample_hbp as static	Chen Jiahao	2023-07-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a sparse warning shown as below: samples/hw_breakpoint/data_breakpoint.c:24:19: warning: symbol 'sample_hbp' was not declared. Should it be static? Since 'sample_hbp' is only called within data_breakpoint.c, mark it as static to fix the warning. Fixes: 44ee63587dce ("percpu: Add __percpu sparse annotations to hw_breakpoint") Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	samples/hw_breakpoint: fix building without module unloading	Arnd Bergmann	2023-07-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	__symbol_put() is really meant as an internal helper and is not available when module unloading is disabled, unlike the previously used symbol_put(): samples/hw_breakpoint/data_breakpoint.c: In function 'hw_break_module_exit': samples/hw_breakpoint/data_breakpoint.c:73:9: error: implicit declaration of function '__symbol_put'; did you mean '__symbol_get'? [-Werror=implicit-function-declaration] The hw_break_module_exit() function is not actually used when module unloading is disabled, but it still causes the build failure for an undefined identifier. Enclose this one call in an appropriate #ifdef to clarify what the requirement is. Leaving out the entire exit function would also work but feels less clar in this case. Fixes: 910e230d5f1bb ("samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000'") Fixes: d8a84d33a4954 ("samples/hw_breakpoint: drop use of kallsyms_lookup_name()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000'	Rong Tao	2023-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Macro symbol_put() is defined as __symbol_put(__stringify(x)) ksym_name = "jiffies" symbol_put(ksym_name) will be resolved as __symbol_put("ksym_name") which is clearly wrong. So symbol_put must be replaced with __symbol_put. When we uninstall hw_breakpoint.ko (rmmod), a kernel bug occurs with the following error: [11381.854152] kernel BUG at kernel/module/main.c:779! [11381.854159] invalid opcode: 0000 [#2] PREEMPT SMP PTI [11381.854163] CPU: 8 PID: 59623 Comm: rmmod Tainted: G D OE 6.2.9-200.fc37.x86_64 #1 [11381.854167] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B360M-HDV, BIOS P3.20 10/23/2018 [11381.854169] RIP: 0010:__symbol_put+0xa2/0xb0 [11381.854175] Code: 00 e8 92 d2 f7 ff 65 8b 05 c3 2f e6 78 85 c0 74 1b 48 8b 44 24 30 65 48 2b 04 25 28 00 00 00 75 12 48 83 c4 38 c3 cc cc cc cc <0f> 0b 0f 1f 44 00 00 eb de e8 c0 df d8 00 90 90 90 90 90 90 90 90 [11381.854178] RSP: 0018:ffffad8ec6ae7dd0 EFLAGS: 00010246 [11381.854181] RAX: 0000000000000000 RBX: ffffffffc1fd1240 RCX: 000000000000000c [11381.854184] RDX: 000000000000006b RSI: ffffffffc02bf7c7 RDI: ffffffffc1fd001c [11381.854186] RBP: 000055a38b76e7c8 R08: ffffffff871ccfe0 R09: 0000000000000000 [11381.854188] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [11381.854190] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [11381.854192] FS: 00007fbf7c62c740(0000) GS:ffff8c5badc00000(0000) knlGS:0000000000000000 [11381.854195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11381.854197] CR2: 000055a38b7793f8 CR3: 0000000363e1e001 CR4: 00000000003726e0 [11381.854200] DR0: ffffffffb3407980 DR1: 0000000000000000 DR2: 0000000000000000 [11381.854202] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [11381.854204] Call Trace: [11381.854207] <TASK> [11381.854212] s_module_exit+0xc/0xff0 [symbol_getput] [11381.854219] __do_sys_delete_module.constprop.0+0x198/0x2f0 [11381.854225] do_syscall_64+0x58/0x80 [11381.854231] ? exit_to_user_mode_prepare+0x180/0x1f0 [11381.854237] ? syscall_exit_to_user_mode+0x17/0x40 [11381.854241] ? do_syscall_64+0x67/0x80 [11381.854245] ? syscall_exit_to_user_mode+0x17/0x40 [11381.854248] ? do_syscall_64+0x67/0x80 [11381.854252] ? exc_page_fault+0x70/0x170 [11381.854256] entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Rong Tao <rongtao@cestc.cn> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols	Palmer Dabbelt	2023-07-24	3	-16/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trying to restrict the '$'-prefix change to RISC-V caused some fallout, so let's just treat all those symbols as special. Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too") Link: https://lore.kernel.org/all/20230712015747.77263-1-wangkefeng.wang@huawei.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	kernel: params: Remove unnecessary ‘0’ values from err	Li zeming	2023-07-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	err is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
\| *	module: Ignore RISC-V mapping symbols too	Palmer Dabbelt	2023-07-10	3	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RISC-V has an extended form of mapping symbols that we use to encode the ISA when it changes in the middle of an ELF. This trips up modpost as a build failure, I haven't yet verified it yet but I believe the kallsyms difference should result in stacks looking sane again. Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/9d9e2902-5489-4bf0-d9cb-556c8e5d71c2@infradead.org/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
* \|	Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵	Linus Torvalds	2023-08-29	205	-1392/+2853
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
\| * \|	document while_each_thread(), change first_tid() to use for_each_thread()	Oleg Nesterov	2023-08-24	2	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a group leader and thus can't exit before t, t or another sub-thread can exec and remove g from the thread_group list. The only lockless user of while_each_thread() is first_tid() and it is fine in that it can't loop forever, yet for_each_thread() looks better and I am going to change while_each_thread/next_thread. Link: https://lkml.kernel.org/r/20230823170806.GA11724@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	drivers/char/mem.c: shrink character device's devlist[] array	Alexey Dobriyan	2023-08-24	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge padding, shrinking "struct memdev" from 32 bytes to 24 bytes on 64-bit. Link: https://lkml.kernel.org/r/fe4d62ab-2427-4635-b9f4-467853fb63e3@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	x86/crash: optimize CPU changes	Eric DeVolder	2023-08-24	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	crash_prepare_elf64_headers() writes into the elfcorehdr an ELF PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs (ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr. The kimage->file_mode term covers kdump images loaded via the kexec_file_load() syscall. Since crash_prepare_elf64_headers() wrote the initial elfcorehdr, no update to the elfcorehdr is needed for CPU changes. The kimage->elfcorehdr_updated term covers kdump images loaded via the kexec_load() syscall. At least one memory or CPU change must occur to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr. Afterwards, no update to the elfcorehdr is needed for CPU changes. This code is intentionally NOT hoisted into crash_handle_hotplug_event() as it would prevent the arch-specific handler from running for CPU changes. This would break PPC, for example, which needs to update other information besides the elfcorehdr, on CPU changes. Link: https://lkml.kernel.org/r/20230814214446.6659-9-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()	Eric DeVolder	2023-08-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function crash_prepare_elf64_headers() generates the elfcorehdr which describes the CPUs and memory in the system for the crash kernel. In particular, it writes out ELF PT_NOTEs for memory regions and the CPUs in the system. With respect to the CPUs, the current implementation utilizes for_each_present_cpu() which means that as CPUs are added and removed, the elfcorehdr must again be updated to reflect the new set of CPUs. The reasoning behind the move to use for_each_possible_cpu(), is: - At kernel boot time, all percpu crash_notes are allocated for all possible CPUs; that is, crash_notes are not allocated dynamically when CPUs are plugged/unplugged. Thus the crash_notes for each possible CPU are always available. - The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU. Changing to for_each_possible_cpu() is valid as the crash_notes pointed to by each CPU PT_NOTE are present and always valid. Furthermore, examining a common crash processing path of: kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer elfcorehdr /proc/vmcore vmcore reveals how the ELF CPU PT_NOTEs are utilized: - Upon panic, each CPU is sent an IPI and shuts itself down, recording its state in its crash_notes. When all CPUs are shutdown, the crash kernel is launched with a pointer to the elfcorehdr. - The crash kernel via linux/fs/proc/vmcore.c does not examine or use the contents of the PT_NOTEs, it exposes them via /proc/vmcore. - The makedumpfile utility uses /proc/vmcore and reads the CPU PT_NOTEs to craft a nr_cpus variable, which is reported in a header but otherwise generally unused. Makedumpfile creates the vmcore. - The 'crash' dump analyzer does not appear to reference the CPU PT_NOTEs. Instead it looks-up the cpu_[possible\|present\|onlin]_mask symbols and directly examines those structure contents from vmcore memory. From that information it is able to determine which CPUs are present and online, and locate the corresponding crash_notes. Said differently, it appears that 'crash' analyzer does not rely on the ELF PT_NOTEs for CPUs; rather it obtains the information directly via kernel symbols and the memory within the vmcore. (There maybe other vmcore generating and analysis tools that do use these PT_NOTEs, but 'makedumpfile' and 'crash' seems to be the most common solution.) This results in the benefit of having all CPUs described in the elfcorehdr, and therefore reducing the need to re-generate the elfcorehdr on CPU changes, at the small expense of an additional 56 bytes per PT_NOTE for not-present-but-possible CPUs. On systems where kexec_file_load() syscall is utilized, all the above is valid. On systems where kexec_load() syscall is utilized, there may be the need for the elfcorehdr to be regenerated once. The reason being that some archs only populate the 'present' CPUs from the /sys/devices/system/cpus entries, which the userspace 'kexec' utility uses to generate the userspace-supplied elfcorehdr. In this situation, one memory or CPU change will rewrite the elfcorehdr via the crash_prepare_elf64_headers() function and now all possible CPUs will be described, just as with kexec_file_load() syscall. Link: https://lkml.kernel.org/r/20230814214446.6659-8-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	crash: hotplug support for kexec_load()	Eric DeVolder	2023-08-24	8	-6/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hotplug support for kexec_load() requires changes to the userspace kexec-tools and a little extra help from the kernel. Given a kdump capture kernel loaded via kexec_load(), and a subsequent hotplug event, the crash hotplug handler finds the elfcorehdr and rewrites it to reflect the hotplug change. That is the desired outcome, however, at kernel panic time, the purgatory integrity check fails (because the elfcorehdr changed), and the capture kernel does not boot and no vmcore is generated. Therefore, the userspace kexec-tools/kexec must indicate to the kernel that the elfcorehdr can be modified (because the kexec excluded the elfcorehdr from the digest, and sized the elfcorehdr memory buffer appropriately). To facilitate hotplug support with kexec_load(): - a new kexec flag KEXEC_UPATE_ELFCOREHDR indicates that it is safe for the kernel to modify the kexec_load()'d elfcorehdr - the /sys/kernel/crash_elfcorehdr_size node communicates the preferred size of the elfcorehdr memory buffer - The sysfs crash_hotplug nodes (ie. /sys/devices/system/[cpu\|memory]/crash_hotplug) dynamically take into account kexec_file_load() vs kexec_load() and KEXEC_UPDATE_ELFCOREHDR. This is critical so that the udev rule processing of crash_hotplug is all that is needed to determine if the userspace unload-then-load of the kdump image is to be skipped, or not. The proposed udev rule change looks like: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" The table below indicates the behavior of kexec_load()'d kdump image updates (with the new udev crash_hotplug rule in place): Kernel \|Kexec -------+-----+---- Old \|Old \|New \| a \| a -------+-----+---- New \| a \| b -------+-----+---- where kexec 'old' and 'new' delineate kexec-tools has the needed modifications for the crash hotplug feature, and kernel 'old' and 'new' delineate the kernel supports this crash hotplug feature. Behavior 'a' indicates the unload-then-reload of the entire kdump image. For the kexec 'old' column, the unload-then-reload occurs due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel (with 'new' kexec) does not present the crash_hotplug sysfs node, which leads to the unload-then-reload of the kdump image. Behavior 'b' indicates the desired optimized behavior of the kernel directly modifying the elfcorehdr and avoiding the unload-then-reload of the kdump image. If the udev rule is not updated with crash_hotplug node check, then no matter any combination of kernel or kexec is new or old, the kdump image continues to be unload-then-reload on hotplug changes. To fully support crash hotplug feature, there needs to be a rollout of kernel, kexec-tools and udev rule changes. However, the order of the rollout of these pieces does not matter; kexec_load()'d kdump images still function for hotplug as-is. Link: https://lkml.kernel.org/r/20230814214446.6659-7-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	x86/crash: add x86 crash hotplug support	Eric DeVolder	2023-08-24	3	-7/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CPU or memory is hot un/plugged, or off/onlined, the crash elfcorehdr, which describes the CPUs and memory in the system, must also be updated. A new elfcorehdr is generated from the available CPUs and memory and replaces the existing elfcorehdr. The segment containing the elfcorehdr is identified at run-time in crash_core:crash_handle_hotplug_event(). No modifications to purgatory (see 'kexec: exclude elfcorehdr from the segment digest') or boot_params (as the elfcorehdr= capture kernel command line parameter pointer remains unchanged and correct) are needed, just elfcorehdr. For kexec_file_load(), the elfcorehdr segment size is based on NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a growing number of CPU and memory resources. For kexec_load(), the userspace kexec utility needs to size the elfcorehdr segment in the same/similar manner. To accommodate kexec_load() syscall in the absence of kexec_file_load() syscall support, prepare_elf_headers() and dependents are moved outside of CONFIG_KEXEC_FILE. [eric.devolder@oracle.com: correct unused function build error] Link: https://lkml.kernel.org/r/20230821182644.2143-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230814214446.6659-6-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	crash: memory and CPU hotplug sysfs attributes	Eric DeVolder	2023-08-24	7	-0/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="00000051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="8000000" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	kexec: exclude elfcorehdr from the segment digest	Eric DeVolder	2023-08-24	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a crash kernel is loaded via the kexec_file_load() syscall, the kernel places the various segments (ie crash kernel, crash initrd, boot_params, elfcorehdr, purgatory, etc) in memory. For those architectures that utilize purgatory, a hash digest of the segments is calculated for integrity checking. The digest is embedded into the purgatory image prior to placing in memory. Updates to the elfcorehdr in response to CPU and memory changes would cause the purgatory integrity checking to fail (at crash time, and no vmcore created). Therefore, the elfcorehdr segment is explicitly excluded from the purgatory digest, enabling updates to the elfcorehdr while also avoiding the need to recompute the hash digest and reload purgatory. Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Suggested-by: Baoquan He <bhe@redhat.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	crash: add generic infrastructure for crash hotplug support	Eric DeVolder	2023-08-24	5	-0/+197
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To support crash hotplug, a mechanism is needed to update the crash elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/ onlining). The crash elfcorehdr describes the CPUs and memory to be written into the vmcore. To track CPU changes, callbacks are registered with the cpuhp mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The crash hotplug elfcorehdr update has no explicit ordering requirement (relative to other cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a new state for crash hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE group, just prior to the STARTING group, which is very close to the CPU starting up in a plug/online situation, or stopping in a unplug/ offline situation. This minimizes the window of time during an actual plug/online or unplug/offline situation in which the elfcorehdr would be inaccurate. Note that for a CPU being unplugged or offlined, the CPU will still be present in the list of CPUs generated by crash_prepare_elf64_headers(). However, there is no need to explicitly omit the CPU, see justification in 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'. To track memory changes, a notifier is registered to capture the memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier(). The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event() which performs needed tasks and then dispatches the event to the architecture specific arch_crash_handle_hotplug_event() to update the elfcorehdr with the current state of CPUs and memory. During the process, the kexec_lock is held. Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	crash: move a few code bits to setup support of crash hotplug	Eric DeVolder	2023-08-24	5	-233/+238
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28. Once the kdump service is loaded, if changes to CPUs or memory occur, either by hot un/plug or off/onlining, the crash elfcorehdr must also be updated. The elfcorehdr describes to kdump the CPUs and memory in the system, and any inaccuracies can result in a vmcore with missing CPU context or memory regions. The current solution utilizes udev to initiate an unload-then-reload of the kdump image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. In the original post I outlined the significant performance problems related to offloading this activity to userspace. This patchset introduces a generic crash handler that registers with the CPU and memory notifiers. Upon CPU or memory changes, from either hot un/plug or off/onlining, this generic handler is invoked and performs important housekeeping, for example obtaining the appropriate lock, and then invokes an architecture specific handler to do the appropriate elfcorehdr update. Note the description in patch 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that enables further optimizations related to CPU plug/unplug/online/offline performance of elfcorehdr updates. In the case of x86_64, the arch specific handler generates a new elfcorehdr, and overwrites the old one in memory; thus no involvement with userspace needed. To realize the benefits/test this patchset, one must make a couple of minor changes to userspace: - Prevent udev from updating kdump crash kernel on hot un/plug changes. Add the following as the first lines to the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" With this changeset applied, the two rules evaluate to false for CPU and memory change events and thus skip the userspace unload-then-reload of kdump. - Change to the kexec_file_load for loading the kdump kernel: Eg. on RHEL: in /usr/bin/kdumpctl, change to: standard_kexec_args="-p -d -s" which adds the -s to select kexec_file_load() syscall. This kernel patchset also supports kexec_load() with a modified kexec userspace utility. A working changeset to the kexec userspace utility is posted to the kexec-tools mailing list here: http://lists.infradead.org/pipermail/kexec/2023-May/027049.html To use the kexec-tools patch, apply, build and install kexec-tools, then change the kdumpctl's standard_kexec_args to replace the -s with --hotplug. The removal of -s reverts to the kexec_load syscall and the addition of --hotplug invokes the changes put forth in the kexec-tools patch. This patch (of 8): The crash hotplug support leans on the work for the kexec_file_load() syscall. To also support the kexec_load() syscall, a few bits of code need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are moved out of kexec_file.c and into a common location crash_core.c. In addition, struct crash_mem and crash_notes were moved to new locales so that PROC_KCORE, which sets CRASH_CORE alone, builds correctly. No functionality change intended. Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	kstrtox: consistently use _tolower()	Andy Shevchenko	2023-08-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already use _tolower() in other places, so convert the one which open codes it. Link: https://lkml.kernel.org/r/20230817145919.543251-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	kill do_each_thread()	Oleg Nesterov	2023-08-21	4	-13/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eric has pointed out that we still have 3 users of do_each_thread(). Change them to use for_each_process_thread() and kill this helper. There is a subtle change, after do_each_thread/while_each_thread g == t == &init_task, while after for_each_process_thread() they both point to nowhere, but this doesn't matter. > Why is for_each_process_thread() better than do_each_thread()? Say, for_each_process_thread() is rcu safe, do_each_thread() is not. And certainly for_each_process_thread(p, t) { do_something(p, t); } looks better than do_each_thread(p, t) { do_something(p, t); } while_each_thread(p, t); And again, there are only 3 users of this awkward helper left. It should have been killed years ago and in fact I thought it had already been killed. It uses while_each_thread() which needs some changes. Link: https://lkml.kernel.org/r/20230817163708.GA8248@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jiri Slaby <jirislaby@kernel.org> # tty/serial Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse	Ryusuke Konishi	2023-08-21	2	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A syzbot stress test using a corrupted disk image reported that mark_buffer_dirty() called from __nilfs_mark_inode_dirty() or nilfs_palloc_commit_alloc_entry() may output a kernel warning, and can panic if the kernel is booted with panic_on_warn. This is because nilfs2 keeps buffer pointers in local structures for some metadata and reuses them, but such buffers may be forcibly discarded by nilfs_clear_dirty_page() in some critical situations. This issue is reported to appear after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only"), but the issue has potentially existed before. Fix this issue by checking the uptodate flag when attempting to reuse an internally held buffer, and reloading the metadata instead of reusing the buffer if the flag was lost. Link: https://lkml.kernel.org/r/20230818131804.7758-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+cdfcae656bac88ba0e2d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000003da75f05fdeffd12@google.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/bloat-o-meter: count weak symbol sizes	Geert Uytterhoeven	2023-08-21	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, bloat-o-meter does not take into account weak symbols, and thus ignores any size changes in code or data marked __weak. Fix this by handling weak code ("w"/"W") and data ("v"/"V"). Link: https://lkml.kernel.org/r/a1e7abd2571c3bbfe75345d6ee98b276d2d5c39d.1692200010.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	treewide: drop CONFIG_EMBEDDED	Randy Dunlap	2023-08-21	78	-85/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is only one Kconfig user of CONFIG_EMBEDDED and it can be switched to EXPERT or "if !ARCH_MULTIPLATFORM" (suggested by Arnd). Link: https://lkml.kernel.org/r/20230816055010.31534-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [RISC-V] Acked-by: Greg Ungerer <gerg@linux-m68k.org> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Russell King <linux@armlinux.org.uk> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Brian Cain <bcain@quicinc.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	lockdep: fix static memory detection even more	Helge Deller	2023-08-21	2	-40/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On the parisc architecture, lockdep reports for all static objects which are in the __initdata section (e.g. "setup_done" in devtmpfs, "kthreadd_done" in init/main.c) this warning: INFO: trying to register non-static key. The warning itself is wrong, because those objects are in the __initdata section, but the section itself is on parisc outside of range from _stext to _end, which is why the static_obj() functions returns a wrong answer. While fixing this issue, I noticed that the whole existing check can be simplified a lot. Instead of checking against the _stext and _end symbols (which include code areas too) just check for the .data and .bss segments (since we check a data object). This can be done with the existing is_kernel_core_data() macro. In addition objects in the __initdata section can be checked with init_section_contains(), and is_kernel_rodata() allows keys to be in the _ro_after_init section. This partly reverts and simplifies commit bac59d18c701 ("x86/setup: Fix static memory detection"). Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100 Fixes: bac59d18c701 ("x86/setup: Fix static memory detection") Signed-off-by: Helge Deller <deller@gmx.de> Cc: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	lib/vsprintf: declare no_hash_pointers in sprintf.h	Andy Shevchenko	2023-08-21	3	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sparse is not happy to see non-static variable without declaration: lib/vsprintf.c:61:6: warning: symbol 'no_hash_pointers' was not declared. Should it be static? Declare respective variable in the sprintf.h. With this, add a comment to discourage its use if no real need. Link: https://lkml.kernel.org/r/20230814163344.17429-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Marco Elver <elver@google.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	lib/vsprintf: split out sprintf() and friends	Andy Shevchenko	2023-08-21	4	-29/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch series "lib/vsprintf: Rework header inclusions", v3. Some patches that reduce the mess with the header inclusions related to vsprintf.c module. Each patch has its own description, and has no dependencies to each other, except the collisions over modifications of the same places. Hence the series. This patch (of 2): kernel.h is being used as a dump for all kinds of stuff for a long time. sprintf() and friends are used in many drivers without need of the full kernel.h dependency train with it. Here is the attempt on cleaning it up by splitting out sprintf() and friends. Link: https://lkml.kernel.org/r/20230814163344.17429-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20230814163344.17429-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	kernel/fork: stop playing lockless games for exe_file replacement	Mateusz Guzik	2023-08-21	2	-15/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for exe_file serialization"). While the commit message does not explain why the change, I found the original submission [1] which ultimately claims it cleans things up by removing dependency of exe_file on the semaphore. However, fe69d560b5bd ("kernel/fork: always deny write access to current MM exe_file") added a semaphore up/down cycle to synchronize the state of exe_file against fork, defeating the point of the original change. This is on top of semaphore trips already present both in the replacing function and prctl (the only consumer). Normally replacing exe_file does not happen for busy processes, thus write-locking is not an impediment to performance in the intended use case. If someone keeps invoking the routine for a busy processes they are trying to play dirty and that's another reason to avoid any trickery. As such I think the atomic here only adds complexity for no benefit. Just write-lock around the replacement. I also note that replacement races against the mapping check loop as nothing synchronizes actual assignment with with said checks but I am not addressing it in this patch. (Is the loop of any use to begin with?) Link: https://lore.kernel.org/linux-mm/1424979417.10344.14.camel@stgolabs.net/ [1] Link: https://lkml.kernel.org/r/20230814172140.1777161-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	adfs: delete unused "union adfs_dirtail" definition	Alexey Dobriyan	2023-08-21	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	union adfs_dirtail::new stands in the way if Linux++ project: "new" can't be used as member's name because it is a keyword in C++. Link: https://lkml.kernel.org/r/43b0a4c8-a7cf-4ab1-98f7-0f65c096f9e8@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/vmalloc: add vmallocinfo support	Kuan-Ying Lee	2023-08-21	3	-0/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This GDB script shows the vmallocinfo for user to analyze the vmalloc memory usage. Example output: 0xffff800008000000-0xffff800008009000 36864 <start_kernel+372> pages=8 vmalloc 0xffff800008009000-0xffff80000800b000 8192 <gicv2m_init_one+400> phys=0x8020000 ioremap 0xffff80000800b000-0xffff80000800d000 8192 <bpf_prog_alloc_no_stats+72> pages=1 vmalloc 0xffff80000800d000-0xffff80000800f000 8192 <bpf_jit_alloc_exec+16> pages=1 vmalloc 0xffff800008010000-0xffff80000ad30000 47316992 <paging_init+452> phys=0x40210000 vmap 0xffff80000ad30000-0xffff80000c1c0000 21561344 <paging_init+556> phys=0x42f30000 vmap 0xffff80000c1c0000-0xffff80000c370000 1769472 <paging_init+592> phys=0x443c0000 vmap 0xffff80000c370000-0xffff80000de90000 28442624 <paging_init+692> phys=0x44570000 vmap 0xffff80000de90000-0xffff80000f4c1000 23269376 <paging_init+788> phys=0x46090000 vmap 0xffff80000f4c1000-0xffff80000f4c3000 8192 <gen_pool_add_owner+112> pages=1 vmalloc 0xffff80000f4c3000-0xffff80000f4c5000 8192 <gen_pool_add_owner+112> pages=1 vmalloc 0xffff80000f4c5000-0xffff80000f4c7000 8192 <gen_pool_add_owner+112> pages=1 vmalloc Link: https://lkml.kernel.org/r/20230808083020.22254-9-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/slab: add slab support	Kuan-Ying Lee	2023-08-21	3	-0/+340
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add 'lx-slabinfo' and 'lx-slabtrace' support. This GDB scripts print slabinfo and slabtrace for user to analyze slab memory usage. Example output like below: (gdb) lx-slabinfo Pointer \| name \| active_objs \| num_objs \| objsize \| objperslab \| pagesperslab ------------------ \| -------------------- \| ------------ \| ------------ \| -------- \| ----------- \| ------------- 0xffff0000c59df480 \| p9_req_t \| 0 \| 0 \| 280 \| 29 \| 2 0xffff0000c59df280 \| isp1760_qh \| 0 \| 0 \| 160 \| 25 \| 1 0xffff0000c59df080 \| isp1760_qtd \| 0 \| 0 \| 184 \| 22 \| 1 0xffff0000c59dee80 \| isp1760_urb_listite \| 0 \| 0 \| 136 \| 30 \| 1 0xffff0000c59dec80 \| asd_sas_event \| 0 \| 0 \| 256 \| 32 \| 2 0xffff0000c59dea80 \| sas_task \| 0 \| 0 \| 448 \| 36 \| 4 0xffff0000c59de880 \| bio-120 \| 18 \| 21 \| 384 \| 21 \| 2 0xffff0000c59de680 \| io_kiocb \| 0 \| 0 \| 448 \| 36 \| 4 0xffff0000c59de480 \| bfq_io_cq \| 0 \| 0 \| 1504 \| 21 \| 8 0xffff0000c59de280 \| bfq_queue \| 0 \| 0 \| 720 \| 22 \| 4 0xffff0000c59de080 \| mqueue_inode_cache \| 1 \| 28 \| 1152 \| 28 \| 8 0xffff0000c59dde80 \| v9fs_inode_cache \| 0 \| 0 \| 832 \| 39 \| 8 ... (gdb) lx-slabtrace --cache_name kmalloc-1k 63 <tty_register_device_attr+508> waste=16632/264 age=46856/46871/46888 pid=1 cpus=6, 0xffff800008720240 <__kmem_cache_alloc_node+236>: mov x22, x0 0xffff80000862a4fc <kmalloc_trace+64>: mov x21, x0 0xffff8000095d086c <tty_register_device_attr+508>: mov x19, x0 0xffff8000095d0f98 <tty_register_driver+704>: cmn x0, #0x1, lsl #12 0xffff80000c2677e8 <vty_init+620>: Cannot access memory at address 0xffff80000c2677e8 0xffff80000c265a10 <tty_init+276>: Cannot access memory at address 0xffff80000c265a10 0xffff80000c26d3c4 <chr_dev_init+204>: Cannot access memory at address 0xffff80000c26d3c4 0xffff8000080161d4 <do_one_initcall+176>: mov w21, w0 0xffff80000c1c1b58 <kernel_init_freeable+956>: Cannot access memory at address 0xffff80000c1c1b58 0xffff80000acf1334 <kernel_init+36>: bl 0xffff8000081ac040 <async_synchronize_full> 0xffff800008018d00 <ret_from_fork+16>: mrs x28, sp_el0 (gdb) lx-slabtrace --cache_name kmalloc-1k --free 428 <not-available> age=4294958600 pid=0 cpus=0, Link: https://lkml.kernel.org/r/20230808083020.22254-8-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/page_owner: add page owner support	Kuan-Ying Lee	2023-08-21	3	-0/+198
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This GDB script prints page owner information for user to analyze the memory usage or memory corruption issue. Example output from an aarch64 system: (gdb) lx-dump-page-owner --pfn 655360 page_owner tracks the page as allocated Page last allocated via order 0, gfp_mask: 0x8, pid: 1, tgid: 1 ("swapper/0\000\000\000\000\000\000"), ts 1295948880 ns, free_ts 1011852016 ns PFN: 655360, Flags: 0x3fffc0000000000 0xffff8000086ab964 <post_alloc_hook+452>: ldp x19, x20, [sp, #16] 0xffff80000862e4e0 <split_map_pages+344>: cbnz w22, 0xffff80000862e57c <split_map_pages+500> 0xffff8000086370c4 <isolate_freepages_range+556>: mov x0, x27 0xffff8000086bc1cc <alloc_contig_range+808>: mov x24, x0 0xffff80000877d6d8 <cma_alloc+772>: mov w1, w0 0xffff8000082c8d18 <dma_alloc_from_contiguous+104>: ldr x19, [sp, #16] 0xffff8000082ce0e8 <atomic_pool_expand+208>: mov x19, x0 0xffff80000c1e41b4 <__dma_atomic_pool_init+172>: Cannot access memory at address 0xffff80000c1e41b4 0xffff80000c1e4298 <dma_atomic_pool_init+92>: Cannot access memory at address 0xffff80000c1e4298 0xffff8000080161d4 <do_one_initcall+176>: mov w21, w0 0xffff80000c1c1b50 <kernel_init_freeable+952>: Cannot access memory at address 0xffff80000c1c1b50 0xffff80000acf87dc <kernel_init+36>: bl 0xffff8000081ab100 <async_synchronize_full> 0xffff800008018d00 <ret_from_fork+16>: mrs x28, sp_el0 page last free stack trace: 0xffff8000086a6e8c <free_unref_page_prepare+796>: mov w2, w23 0xffff8000086aee1c <free_unref_page+96>: tst w0, #0xff 0xffff8000086af3f8 <__free_pages+292>: ldp x19, x20, [sp, #16] 0xffff80000c1f3214 <init_cma_reserved_pageblock+220>: Cannot access memory at address 0xffff80000c1f3214 0xffff80000c20363c <cma_init_reserved_areas+1284>: Cannot access memory at address 0xffff80000c20363c 0xffff8000080161d4 <do_one_initcall+176>: mov w21, w0 0xffff80000c1c1b50 <kernel_init_freeable+952>: Cannot access memory at address 0xffff80000c1c1b50 0xffff80000acf87dc <kernel_init+36>: bl 0xffff8000081ab100 <async_synchronize_full> 0xffff800008018d00 <ret_from_fork+16>: mrs x28, sp_el0 Link: https://lkml.kernel.org/r/20230808083020.22254-7-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/stackdepot: add stackdepot support	Kuan-Ying Lee	2023-08-21	3	-0/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for printing the backtrace of stackdepot handle. This is the preparation patch for dumping page_owner, slabtrace usage. Link: https://lkml.kernel.org/r/20230808083020.22254-6-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/aarch64: add aarch64 page operation helper commands and configs	Kuan-Ying Lee	2023-08-21	4	-204/+626
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Move page table debugging from mm.py to pgtable.py. 2. Add aarch64 kernel config and memory constants value. 3. Add below aarch64 page operation helper commands. page_to_pfn, page_to_phys, pfn_to_page, page_address, virt_to_phys, sym_to_pfn, pfn_to_kaddr, virt_to_page. 4. Only support CONFIG_SPARSEMEM_VMEMMAP=y now. Link: https://lkml.kernel.org/r/20230808083020.22254-5-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/utils: add common type usage	Kuan-Ying Lee	2023-08-21	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we often use 'unsigned long', 'size_t', 'usigned int' and 'struct page', we add these common types to utils. Link: https://lkml.kernel.org/r/20230808083020.22254-4-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/modules: add get module text support	Kuan-Ying Lee	2023-08-21	1	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we get an text address from coredump and we cannot find this address in vmlinux, it might located in kernel module. We want to know which kernel module it located in. This GDB scripts can help us to find the target kernel module. (gdb) lx-getmod-by-textaddr 0xffff800002d305ac 0xffff800002d305ac is in kasan_test.ko Link: https://lkml.kernel.org/r/20230808083020.22254-3-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	scripts/gdb/symbols: add specific ko module load command	Kuan-Ying Lee	2023-08-21	1	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch series "Add GDB memory helper commands", v2. I've created some GDB commands I think useful when I debug some memory issues and kernel module issue. For memory issue, we would like to get slabinfo, slabtrace, page_owner and vmallocinfo to debug the memory issues. For module issue, we would like to query kernel module name when we get a module text address and load module symbol by specific path. Patch 1-2: - Add kernel module related command. Patch 3-5: - Prepares for the memory-related command. Patch 6-8: - Add memory-related commands. This patch (of 8): Add lx-symbols <ko_path> command to support add specific ko module. Example output like below: (gdb) lx-symbols mm/kasan/kasan_test.ko loading @0xffff800002d30000: mm/kasan/kasan_test.ko Link: https://lkml.kernel.org/r/20230808083020.22254-1-Kuan-Ying.Lee@mediatek.com Link: https://lkml.kernel.org/r/20230808083020.22254-2-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	checkpatch: reword long-line warning about commit-msg	Jim Cromie	2023-08-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reword the warning to complain about line length 1st, since thats whats actually tested. Link: https://lkml.kernel.org/r/20230808033019.21911-3-jim.cromie@gmail.com Cc: apw@canonical.com Cc: joe@perches.com Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	checkpatch: special case extern struct in .c	Jim Cromie	2023-08-21	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"externs should be avoided in .c files" needs an exception for linker symbols, like those that mark the start, stop of many kernel sections. Since checkpatch already checks REALNAME to avoid looking at fragments changing vmlinux.lds.h, add a new else-if block to look at them instead. As a simple heuristic, treat all words (in the patch-line) as possible symbols, to screen later warnings. For my test case, the possible-symbols included BOUNDED_BY (a macro), which is extra, but not troublesome - these are just to screen WARNINGS that might be issued on later fragments (changing .c files) Where the WARN is issued, precede it with an else-if block to catch one common extern-in-c use case: "extern struct foo bar[]". Here we can at least issue a softer warning, after checking for a match with a maybe-linker-symbol parsed earlier from the patch. Though heuristic, it worked for my test-case, allowing both start__, stop__ $symbol's (wo the prefixes specifically named). I've coded it narrowly, it can be expanded later to cover any other expressions. It does require that the externs in .c's have the additions to vmlinux.lds.h in the same patch. And requires vmlinux.lds.h before .c fragments. Link: https://lkml.kernel.org/r/20230808033019.21911-2-jim.cromie@gmail.com Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	x86/kernel: increase kcov coverage under arch/x86/kernel folder	Pengfei Xu	2023-08-18	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently kcov instrument is disabled for object files under arch/x86/kernel folder. For object files under arch/x86/kernel, actually just disabling the kcov instrument of files:"head32.o or head64.o and sev.o" could achieve successful booting and provide kcov coverage for object files that do not disable kcov instrument. The additional kcov coverage collected from arch/x86/kernel folder helps kernel fuzzing efforts to find bugs. Link to related improvement discussion is below: https://groups.google.com/g/syzkaller/c/Dsl-RYGCqs8/m/x-tfpTyFBAAJ Related ticket is as follow: https://bugzilla.kernel.org/show_bug.cgi?id=198443 Link: https://lkml.kernel.org/r/06c0bb7b5f61e5884bf31180e8c122648c752010.1690771380.git.pengfei.xu@intel.com Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Pengfei Xu <pengfei.xu@intel.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: <heng.su@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kees Cook <keescook@google.com>, Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| * \|	fs: ocfs2: namei: check return value of ocfs2_add_entry()	Artem Chernyshev	2023-08-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Process result of ocfs2_add_entry() in case we have an error value. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20230803145417.177649-1-artem.chernyshev@red-soft.ru Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Artem Chernyshev <artem.chernyshev@red-soft.ru> Cc: Joel Becker <jlbec@evilplan.org> Cc: Kurt Hackel <kurt.hackel@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>