summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'v6.5-rc1-sysctl-next' of ↵Linus Torvalds2023-06-2815-434/+351
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "The changes for sysctl are in line with prior efforts to stop usage of deprecated routines which incur recursion and also make it hard to remove the empty array element in each sysctl array declaration. The most difficult user to modify was parport which required a bit of re-thinking of how to declare shared sysctls there, Joel Granados has stepped up to the plate to do most of this work and eventual removal of register_sysctl_table(). That work ended up saving us about 1465 bytes according to bloat-o-meter. Since we gained a few bloat-o-meter karma points I moved two rather small sysctl arrays from kernel/sysctl.c leaving us only two more sysctl arrays to move left. Most changes have been tested on linux-next for about a month. The last straggler patches are a minor parport fix, changes to the sysctl kernel selftest so to verify correctness and prevent regressions for the future change he made to provide an alternative solution for the special sysctl mount point target which was using the now deprecated sysctl child element. This is all prep work to now finally be able to remove the empty array element in all sysctl declarations / registrations which is expected to save us a bit of bytes all over the kernel. That work will be tested early after v6.5-rc1 is out" * tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: replace child with an enumeration sysctl: Remove debugging dump_stack test_sysclt: Test for registering a mount point test_sysctl: Add an option to prevent test skip test_sysctl: Add an unregister sysctl test test_sysctl: Group node sysctl test under one func test_sysctl: Fix test metadata getters parport: plug a sysctl register leak sysctl: move security keys sysctl registration to its own file sysctl: move umh sysctl registration to its own file signal: move show_unhandled_signals sysctl to its own file sysctl: remove empty dev table sysctl: Remove register_sysctl_table sysctl: Refactor base paths registrations sysctl: stop exporting register_sysctl_table parport: Removed sysctl related defines parport: Remove register_sysctl_table from parport_default_proc_register parport: Remove register_sysctl_table from parport_device_proc_register parport: Remove register_sysctl_table from parport_proc_register parport: Move magic number "15" to a define
| * sysctl: replace child with an enumerationJoel Granados2023-06-183-66/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part of the effort to remove the empty element at the end of ctl_table structs. "child" was a deprecated elem in this struct and was being used to differentiate between two types of ctl_tables: "normal" and "permanently emtpy". What changed?: * Replace "child" with an enumeration that will have two values: the default (0) and the permanently empty (1). The latter is left at zero so when struct ctl_table is created with kzalloc or in a local context, it will have the zero value by default. We document the new enum with kdoc. * Remove the "empty child" check from sysctl_check_table * Remove count_subheaders function as there is no longer a need to calculate how many headers there are for every child * Remove the recursive call to unregister_sysctl_table as there is no need to traverse down the child tree any longer * Add a new SYSCTL_PERM_EMPTY_DIR binary flag * Remove the last remanence of child from partport/procfs.c Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * sysctl: Remove debugging dump_stackJoel Granados2023-06-181-1/+0
| | | | | | | | | | | | | | Remove unneeded dump_stack in __register_sysctl_table Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * test_sysclt: Test for registering a mount pointJoel Granados2023-06-182-6/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | Test that target gets created by register_sysctl_mount_point and that no additional target can be created "on top" of a permanently empty sysctl table. Create a mount point target (mnt) in the sysctl test driver; try to create another on top of that (mnt_error). Output an error if "mnt_error" is present when we run the sysctl selftests. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * test_sysctl: Add an option to prevent test skipJoel Granados2023-06-181-22/+44
| | | | | | | | | | | | | | | | | | | | Tests were being skipped because the target was not present. Add a flag that controls whether to skip a test based on the presence of the target. Actually skip tests in the test_case function with a "return" instead of a "continue". Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * test_sysctl: Add an unregister sysctl testJoel Granados2023-06-182-0/+46
| | | | | | | | | | | | | | | | Add a test that checks that the unregistered directory is removed from /proc/sys/debug Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * test_sysctl: Group node sysctl test under one funcJoel Granados2023-06-181-3/+17
| | | | | | | | | | | | | | | | Preparation commit to add a new type of test to test_sysctl.c. We want to differentiate between node and (sub)directory tests. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * test_sysctl: Fix test metadata gettersJoel Granados2023-06-181-6/+13
| | | | | | | | | | | | | | | | | | | | | | The functions get_test_{count,enabled,target} use awk to get the N'th field in the ALL_TESTS variable. A variable with leading zeros (e.g. 0009) is misinterpreted as an entire line instead of the N'th field. Remove the leading zeros so this does not happen. We can now use the helper in tests 6, 7 and 8. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * parport: plug a sysctl register leakJoel Granados2023-06-181-10/+12
| | | | | | | | | | | | | | | | | | | | parport registers two sysctl directories in the parport_proc_register function but only one of them was getting unregistered in parport_proc_unregister. Keep track of both sysctl table headers and handle them together when (un)registering. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * sysctl: move security keys sysctl registration to its own fileLuis Chamberlain2023-06-083-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The security keys sysctls are already declared on its own file, just move the sysctl registration to its own file to help avoid merge conflicts on sysctls.c, and help with clearing up sysctl.c further. This creates a small penalty of 23 bytes: ./scripts/bloat-o-meter vmlinux.1 vmlinux.2 add/remove: 2/0 grow/shrink: 0/1 up/down: 49/-26 (23) Function old new delta init_security_keys_sysctls - 33 +33 __pfx_init_security_keys_sysctls - 16 +16 sysctl_init_bases 85 59 -26 Total: Before=21256937, After=21256960, chg +0.00% But soon we'll be saving tons of bytes anyway, as we modify the sysctl registrations to use ARRAY_SIZE and so we get rid of all the empty array elements so let's just clean this up now. Reviewed-by: Paul Moore <paul@paul-moore.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * sysctl: move umh sysctl registration to its own fileLuis Chamberlain2023-06-083-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the umh sysctl registration to its own file, the array is already there. We do this to remove the clutter out of kernel/sysctl.c to avoid merge conflicts. This also lets the sysctls not be built at all now when CONFIG_SYSCTL is not enabled. This has a small penalty of 23 bytes but soon we'll be removing all the empty entries on sysctl arrays so just do this cleanup now: ./scripts/bloat-o-meter vmlinux.base vmlinux.1 add/remove: 2/0 grow/shrink: 0/1 up/down: 49/-26 (23) Function old new delta init_umh_sysctls - 33 +33 __pfx_init_umh_sysctls - 16 +16 sysctl_init_bases 111 85 -26 Total: Before=21256914, After=21256937, chg +0.00% Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * signal: move show_unhandled_signals sysctl to its own fileLuis Chamberlain2023-05-302-14/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The show_unhandled_signals sysctl is the only sysctl for debug left on kernel/sysctl.c. We've been moving the syctls out from kernel/sysctl.c so to help avoid merge conflicts as the shared array gets out of hand. This change incurs simplifies sysctl registration by localizing it where it should go for a penalty in size of increasing the kernel by 23 bytes, we accept this given recent cleanups have actually already saved us 1465 bytes in the prior commits. ./scripts/bloat-o-meter vmlinux.3-remove-dev-table vmlinux.4-remove-debug-table add/remove: 3/1 grow/shrink: 0/1 up/down: 177/-154 (23) Function old new delta signal_debug_table - 128 +128 init_signal_sysctls - 33 +33 __pfx_init_signal_sysctls - 16 +16 sysctl_init_bases 85 59 -26 debug_table 128 - -128 Total: Before=21256967, After=21256990, chg +0.00% Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * sysctl: remove empty dev tableLuis Chamberlain2023-05-301-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that all the dev sysctls have been moved out we can remove the dev sysctl base directory. We don't need to create base directories, they are created for you as if using 'mkdir -p' with register_syctl() and register_sysctl_init(). For details refer to sysctl_mkdir_p() usage. We save 90 bytes with this changes: ./scripts/bloat-o-meter vmlinux.2.remove-sysctl-table vmlinux.3-remove-dev-table add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-90 (-90) Function old new delta sysctl_init_bases 111 85 -26 dev_table 64 - -64 Total: Before=21257057, After=21256967, chg -0.00% The empty dev table has been in place since the v2.5.0 days because back then ordering was essentialy. But later commit 7ec66d06362d ("sysctl: Stop requiring explicit management of sysctl directories"), merged as of v3.4-rc1, the entire ordering of directories was replaced by allowing sysctl directory autogeneration. This new mechanism introduced on v3.4 allows for sysctl directories to automatically be created for sysctl tables when they are needed and automatically removes them when no sysctl tables use them. That commit also added a dedicated struct ctl_dir as a new type for these autogenerated directories. Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * sysctl: Remove register_sysctl_tableJoel Granados2023-05-232-169/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. After removing all the calling functions, we remove both the register_sysctl_table function and the documentation check that appeared in check-sysctl-docs awk script. We save 595 bytes with this change: ./scripts/bloat-o-meter vmlinux.1.refactor-base-paths vmlinux.2.remove-sysctl-table add/remove: 2/8 grow/shrink: 1/0 up/down: 1154/-1749 (-595) Function old new delta count_subheaders - 983 +983 unregister_sysctl_table 29 184 +155 __pfx_count_subheaders - 16 +16 __pfx_unregister_sysctl_table.part 16 - -16 __pfx_register_leaf_sysctl_tables.constprop 16 - -16 __pfx_count_subheaders.part 16 - -16 __pfx___register_sysctl_base 16 - -16 unregister_sysctl_table.part 136 - -136 __register_sysctl_base 478 - -478 register_leaf_sysctl_tables.constprop 524 - -524 count_subheaders.part 547 - -547 Total: Before=21257652, After=21257057, chg -0.00% [mcgrof: remove register_leaf_sysctl_tables and append_path too and add bloat-o-meter stats] Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org>
| * sysctl: Refactor base paths registrationsJoel Granados2023-05-233-47/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. The old way of doing this through register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a call to register_sysctl_init. The 5 base paths affected are: "kernel", "vm", "debug", "dev" and "fs". We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE macro since they are no longer needed. In order to quickly acertain that the paths did not actually change I executed `find /proc/sys/ | sha1sum` and made sure that the sha was the same before and after the commit. We end up saving 563 bytes with this change: ./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563) Function old new delta sysctl_init_bases 55 111 +56 init_fs_sysctls 12 33 +21 vm_base_table 128 - -128 kernel_base_table 128 - -128 fs_base_table 128 - -128 dev_base_table 128 - -128 debug_base_table 128 - -128 Total: Before=21258215, After=21257652, chg -0.00% [mcgrof: modified to use register_sysctl_init() over register_sysctl() and add bloat-o-meter stats] Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christian Brauner <brauner@kernel.org>
| * sysctl: stop exporting register_sysctl_tableJoel Granados2023-05-232-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We make register_sysctl_table static because the only function calling it is in fs/proc/proc_sysctl.c (__register_sysctl_base). We remove it from the sysctl.h header and modify the documentation in both the header and proc_sysctl.c files to mention "register_sysctl" instead of "register_sysctl_table". This plus the commits that remove register_sysctl_table from parport save 217 bytes: ./scripts/bloat-o-meter .bsysctl/vmlinux.old .bsysctl/vmlinux.new add/remove: 0/1 grow/shrink: 5/1 up/down: 458/-675 (-217) Function old new delta __register_sysctl_base 8 286 +278 parport_proc_register 268 379 +111 parport_device_proc_register 195 247 +52 kzalloc.constprop 598 608 +10 parport_default_proc_register 62 69 +7 register_sysctl_table 291 - -291 parport_sysctl_template 1288 904 -384 Total: Before=8603076, After=8602859, chg -0.00% Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * parport: Removed sysctl related definesJoel Granados2023-05-231-7/+0
| | | | | | | | | | | | | | | | | | | | The partport driver used to rely on defines to include different directories in sysctl. Now that we have made the transition to register_sysctl from regsiter_sysctl_table, they are no longer needed. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * parport: Remove register_sysctl_table from parport_default_proc_registerJoel Granados2023-05-231-17/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. Simply change the full path "dev/parport/default" to point to an already existing set of table entries (vars). We also remove the unused elements from parport_default_table. To make sure the resulting directory structure did not change we made sure that `find /proc/sys/dev/ | sha1sum` was the same before and after the change. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * parport: Remove register_sysctl_table from parport_device_proc_registerJoel Granados2023-05-231-24/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. We use a temp allocation to include both port and device name in proc. Allocated mem is freed at the end. The unused parport_device_sysctl_template struct elements that are not used are removed. Signed-off-by: Joel Granados <j.granados@samsung.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202305150948.pHgIh7Ql-lkp@intel.com/ Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * parport: Remove register_sysctl_table from parport_proc_registerJoel Granados2023-05-231-31/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. Register dev/parport/PORTNAME and dev/parport/PORTNAME/devices. Temporary allocation for name is freed at the end of the function. Remove all the struct elements that are no longer used in the parport_device_sysctl_template struct. Add parport specific defines that hide the base path sizes. To make sure the resulting directory structure did not change we made sure that `find /proc/sys/dev/ | sha1sum` was the same before and after the change. Signed-off-by: Joel Granados <j.granados@samsung.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202305150948.pHgIh7Ql-lkp@intel.com/ Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * parport: Move magic number "15" to a defineJoel Granados2023-05-232-1/+3
| | | | | | | | | | | | | | | | | | | | Put the size of a parport name behind a define so we can use it in other files. This is a preparation patch to be able to use this size in parport/procfs.c. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
* | Merge tag 'v6.5-rc1-modules-next' of ↵Linus Torvalds2023-06-288-152/+49
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The changes queued up for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes" [ There was a mis-communication about the concurrent module load changes that I had expected to come through Luis despite me authoring the patch. So some of the module updates were left hanging in the email ether, and I just committed them separately. It's my bad - I should have made it more clear that I expected my own patches to come through the module tree too. Now they missed linux-next, but hopefully that won't cause any issues - Linus ] * tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: make kallsyms_show_value() as generic function kallsyms: move kallsyms_show_value() out of kallsyms.c kallsyms: remove unsed API lookup_symbol_attrs kallsyms: remove unused arch_get_kallsym() helper module: Remove preempt_disable() from module reference counting.
| * | kallsyms: make kallsyms_show_value() as generic functionManinder Singh2023-06-082-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change makes function kallsyms_show_value() as generic function without dependency on CONFIG_KALLSYMS. Now module address will be displayed with lsmod and /proc/modules. Earlier: ======= / # insmod test.ko / # lsmod test 12288 0 - Live 0x0000000000000000 (O) // No Module Load address / # With change: ========== / # insmod test.ko / # lsmod test 12288 0 - Live 0xffff800000fc0000 (O) // Module address / # cat /proc/modules test 12288 0 - Live 0xffff800000fc0000 (O) Co-developed-by: Onkarnath <onkarnath.1@samsung.com> Signed-off-by: Onkarnath <onkarnath.1@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * | kallsyms: move kallsyms_show_value() out of kallsyms.cManinder Singh2023-06-083-36/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | function kallsyms_show_value() is used by other parts like modules_open(), kprobes_read() etc. which can work in case of !KALLSYMS also. e.g. as of now lsmod do not show module address if KALLSYMS is disabled. since kallsyms_show_value() defination is not present, it returns false in !KALLSYMS. / # lsmod test 12288 0 - Live 0x0000000000000000 (O) So kallsyms_show_value() can be made generic without dependency on KALLSYMS. Thus moving out function to a new file ksyms_common.c. With this patch code is just moved to new file and no functional change. Co-developed-by: Onkarnath <onkarnath.1@samsung.com> Signed-off-by: Onkarnath <onkarnath.1@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * | kallsyms: remove unsed API lookup_symbol_attrsManinder Singh2023-05-264-71/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with commit '7878c231dae0 ("slab: remove /proc/slab_allocators")' lookup_symbol_attrs usage is removed. Thus removing redundant API. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * | kallsyms: remove unused arch_get_kallsym() helperArnd Bergmann2023-05-262-30/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arch_get_kallsym() function was introduced so that x86 could override it, but that override was removed in bf904d2762ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline"), so now this does nothing except causing a warning about a missing prototype: kernel/kallsyms.c:662:12: error: no previous prototype for 'arch_get_kallsym' [-Werror=missing-prototypes] 662 | int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value, Restore the old behavior before d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines") to simplify the code and avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Alan Maguire <alan.maguire@oracle.com> [mcgrof: fold in bpf selftest fix] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * | module: Remove preempt_disable() from module reference counting.Sebastian Andrzej Siewior2023-05-231-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The preempt_disable() section in module_put() was added in commit e1783a240f491 ("module: Use this_cpu_xx to dynamically allocate counters") while the per-CPU counter were switched to another API. The API requires that during the RMW operation the CPU remained the same. This counting API was later replaced with atomic_t in commit 2f35c41f58a97 ("module: Replace module_ref with atomic_t refcnt") Since this atomic_t replacement there is no need to keep preemption disabled while the reference counter is modified. Remove preempt_disable() from module_put(), __module_get() and try_module_get(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
* | | modules: catch concurrent module loads, treat them as idempotentLinus Torvalds2023-06-281-2/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the new-and-improved attempt at avoiding huge memory load spikes when the user space boot sequence tries to load hundreds (or even thousands) of redundant duplicate modules in parallel. See commit 9828ed3f695a ("module: error out early on concurrent load of the same module file") for background and an earlier failed attempt that was reverted. That earlier attempt just said "concurrently loading the same module is silly, just open the module file exclusively and return -ETXTBSY if somebody else is already loading it". While it is true that concurrent module loads of the same module is silly, the reason that earlier attempt then failed was that the concurrently loaded module would often be a prerequisite for another module. Thus failing to load the prerequisite would then cause cascading failures of the other modules, rather than just short-circuiting that one unnecessary module load. At the same time, we still really don't want to load the contents of the same module file hundreds of times, only to then wait for an eventually successful load, and have everybody else return -EEXIST. As a result, this takes another approach, and treats concurrent module loads from the same file as "idempotent" in the inode. So if one module load is ongoing, we don't start a new one, but instead just wait for the first one to complete and return the same return value as it did. So unlike the first attempt, this does not return early: the intent is not to speed up the boot, but to avoid a thundering herd problem in allocating memory (both physical and virtual) for a module more than once. Also note that this does change behavior: it used to be that when you had concurrent loads, you'd have one "winner" that would return success, and everybody else would return -EEXIST. In contrast, this idempotent logic goes all Oprah on the problem, and says "You are a winner! And you are a winner! We are ALL winners". But since there's no possible actual real semantic difference between "you loaded the module" and "somebody else already loaded the module", this is more of a feel-good change than an actual honest-to-goodness semantic change. Of course, any true Johnny-come-latelies that don't get caught in the concurrency filter will still return -EEXIST. It's no different from not even getting a seat at an Oprah taping. That's life. See the long thread on the kernel mailing list about this all, which includes some numbers for memory use before and after the patch. Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/ Reviewed-by: Johan Hovold <johan@kernel.org> Tested-by: Johan Hovold <johan@kernel.org> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Rudi Heitbaum <rudi@heitbaum..com> Tested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | module: split up 'finit_module()' into init_module_from_file() helperLinus Torvalds2023-06-281-15/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will simplify the next step, where we can then key off the inode to do one idempotent module load. Let's do the obvious re-organization in one step, and then the new code in another. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'mmc-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds2023-06-2832-333/+1067
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MMC updates from Ulf Hansson: "MMC core: - Allow synchronous detection of (e)MMC/SD/SDIO cards - Fixup error check for ioctls for SPI hosts - Disable broken SD-Cache support for Kingston Canvas Go Plus from 2019 - Disable broken eMMC-Trim support for Kingston EMMC04G-M627 - Disable broken eMMC-Trim support for Micron MTFC4GACAJCN-1M MMC host: - bcm2835: Convert DT bindings to YAML - mmci: - Enable asynchronous probe - Transform the ux500 HW-busy detection into a proper state machine - Add support for SW busy-end timeouts for the ux500 variants - mmci_stm32: - Add support for sdm32 variant revision v3.0 used on STM32MP25 - Improve the tuning sequence - mtk-sd: Tune polling-period to improve performance - sdhci: Fixup DMA configuration for 64-bit DMA mode - sdhci-bcm-kona: Convert DT bindings to YAML - sdhci-msm: - Switch to use the new ICE API - Add support for the SC8280XP/IPQ6018/QDU1000/QRU1000 variants - sdhci-pci-gli: - Add support SD Express cards for GL9767 - Add support for the Genesys Logic GL9767 variant" * tag 'mmc-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (42 commits) dt-bindings: mmc: fsl-imx-esdhc: Add imx6ul support mmc: mmci: Add support for SW busy-end timeouts mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019 mmc: core: disable TRIM on Kingston EMMC04G-M627 mmc: mmci: stm32: add delay block support for STM32MP25 mmc: mmci: stm32: prepare other delay block support mmc: mmci: stm32: manage block gap hardware flow control mmc: mmci: Add support for sdmmc variant revision v3.0 mmc: mmci: add stm32_idmabsize_align parameter dt-bindings: mmc: mmci: Add st,stm32mp25-sdmmc2 compatible mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M mmc: mmci: Break out a helper function mmc: mmci: Use a switch statement machine mmc: mmci: Use state machine state as exit condition mmc: mmci: Retry the busy start condition mmc: mmci: Make busy complete state machine explicit mmc: mmci: Break out error check in busy detect mmc: mmci: Stash status while waiting for busy mmc: mmci: Unwind big if() clause mmc: mmci: Clear busy_status when starting command ...
| * | | dt-bindings: mmc: fsl-imx-esdhc: Add imx6ul supportOleksij Rempel2023-06-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the 'fsl,imx6ul-usdhc' value to the compatible properties list in the fsl-imx-esdhc.yaml file. This is required to match the compatible strings present in the 'mmc@2190000' node of 'imx6ul-prti6g.dtb'. This commit addresses the following dtbs_check warning: imx6ul-prti6g.dtb:0:0: /soc/bus@2100000/mmc@2190000: failed to match any schema with compatible: ['fsl,imx6ul-usdhc', 'fsl,imx6sx-usdhc'] Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230621093245.78130-2-o.rempel@pengutronix.de Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: mmci: Add support for SW busy-end timeoutsUlf Hansson2023-06-223-7/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ux500 variant doesn't have a HW based timeout to use for busy-end IRQs. To avoid hanging and waiting for the card to stop signaling busy, let's schedule a delayed work, according to the corresponding cmd->busy_timeout for the command. If the work gets to run, let's kick the IRQ handler to complete the currently running request/command. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20230620091113.33393-1-ulf.hansson@linaro.org
| * | | mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019Marek Vasut2023-06-204-8/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This microSD card never clears Flush Cache bit after cache flush has been started in sd_flush_cache(). This leads e.g. to failure to mount file system. Add a quirk which disables the SD cache for this specific card from specific manufacturing date of 11/2019, since on newer dated cards from 05/2023 the cache flush works correctly. Fixes: 08ebf903af57 ("mmc: core: Fixup support for writeback-cache for eMMC and SD") Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20230620102713.7701-1-marex@denx.de Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: core: disable TRIM on Kingston EMMC04G-M627Robert Marko2023-06-201-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems that Kingston EMMC04G-M627 despite advertising TRIM support does not work when the core is trying to use REQ_OP_WRITE_ZEROES. We are seeing I/O errors in OpenWrt under 6.1 on Zyxel NBG7815 that we did not previously have and tracked it down to REQ_OP_WRITE_ZEROES. Trying to use fstrim seems to also throw errors like: [93010.835112] I/O error, dev loop0, sector 16902 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2 Disabling TRIM makes the error go away, so lets add a quirk for this eMMC to disable TRIM. Signed-off-by: Robert Marko <robimarko@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230619193621.437358-1-robimarko@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: mmci: stm32: add delay block support for STM32MP25Yann Gautier2023-06-201-1/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On STM32MP25, the delay block is inside the SoC, and configured through the SYSCFG registers. The algorithm is also different from what was in STM32MP1 chip. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20230619115120.64474-7-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: mmci: stm32: prepare other delay block supportYann Gautier2023-06-201-13/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create an sdmmc_tuning_ops struct to ease support for another delay block peripheral. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20230619115120.64474-6-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: mmci: stm32: manage block gap hardware flow controlYann Gautier2023-06-202-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In stm32 sdmmc variant revision v3.0, a block gap hardware flow control should be used with bus speed modes SDR104 and HS200. It is enabled by writing a non-null value to the new added register MMCI_STM32_FIFOTHRR. The threshold will be 2^(N-1) bytes, so we can use the ffs() function to compute the value N to be written to the register. The threshold used should be the data block size, but must not be bigger than the FIFO size. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20230619115120.64474-5-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: mmci: Add support for sdmmc variant revision v3.0Yann Gautier2023-06-201-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an update of the SDMMC revision v2.2, with just an increased FIFO size, from 64B to 1kB. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20230619115120.64474-4-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: mmci: add stm32_idmabsize_align parameterYann Gautier2023-06-203-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The alignment for the IDMA size depends on the peripheral version, it should then be configurable. Add stm32_idmabsize_align in the variant structure. And remove now unused (and wrong) MMCI_STM32_IDMABNDT_* macros. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20230619115120.64474-3-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | dt-bindings: mmc: mmci: Add st,stm32mp25-sdmmc2 compatibleYann Gautier2023-06-201-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For STM32MP25, we'll need to distinguish how is managed the delay block. This is done through a new comptible dedicated for this SoC, as the delay block registers are located in SYSCFG peripheral. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230619115120.64474-2-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: core: disable TRIM on Micron MTFC4GACAJCN-1MRobert Marko2023-06-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems that Micron MTFC4GACAJCN-1M despite advertising TRIM support does not work when the core is trying to use REQ_OP_WRITE_ZEROES. We are seeing the following errors in OpenWrt under 6.1 on Qnap Qhora 301W that we did not previously have and tracked it down to REQ_OP_WRITE_ZEROES: [ 18.085950] I/O error, dev loop0, sector 596 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2 Disabling TRIM makes the error go away, so lets add a quirk for this eMMC to disable TRIM. Signed-off-by: Robert Marko <robimarko@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230530213259.1776512-1-robimarko@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: Merge branch fixes into nextUlf Hansson2023-06-1913-18/+23
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge the mmc fixes for v6.4-rc[n] into the next branch, to allow them to get tested together with the new mmc changes that are targeted for v6.5. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Break out a helper functionLinus Walleij2023-06-191-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These four lines clearing, masking and resetting the state of the busy detect state machine is repeated five times in the code so break this out to a small helper so things are easier to read. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-9-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Use a switch statement machineLinus Walleij2023-06-191-12/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As is custom, use a big switch statement to transition between the edges of the state machine inside the ux500 ->busy_complete callback. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-8-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Use state machine state as exit conditionLinus Walleij2023-06-191-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return true if and only if we reached the state MMCI_BUSY_DONE in the ux500 ->busy_complete() callback. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-7-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Retry the busy start conditionLinus Walleij2023-06-191-12/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the ux500 ->busy_complete() callback re-read the status register 10 times while waiting for the busy signal to assert in the status register. If this does not happen, we bail out regarding the command completed already, i.e. before we managed to start to check the busy status. There is a comment in the code about this, let's just implement it to be certain that we can catch this glitch if it happens. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-6-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Make busy complete state machine explicitLinus Walleij2023-06-192-18/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This refactors the ->busy_complete() callback currently only used by Ux500 and STM32 to handle busy detection on hardware where one and the same IRQ is fired whether we get a start or an end signal on busy detect. The code is currently using the cached status from the command IRQ in ->busy_status as a state to select what to do next: if this state is non-zero we are waiting for IRQs and if it is zero we treat the state as the starting point for a busy detect wait cycle. Make this explicit by creating a state machine where the ->busy_complete callback moves between three states. The Ux500 busy detect code currently assumes this order: we enable the busy detect IRQ, get a busy start IRQ, then a busy end IRQ, and then we clear and mask this IRQ and proceed. We insert debug prints for unexpected states. This works as before on most cards, however on a problematic card that is not working with busy detect, and which I have been debugging, the following happens a lot: [ 3.380554] mmci-pl18x 80005000.mmc: no busy signalling in time [ 3.387420] mmci-pl18x 80005000.mmc: no busy signalling in time [ 3.394561] mmci-pl18x 80005000.mmc: lost busy status when waiting for busy start IRQ This probably means that the busy detect start IRQ has already occurred when we start executing the ->busy_complete() callbacks, and the busy detect end IRQ is counted as the start IRQ, and this is what is causing the card to not be detected properly. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-5-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Break out error check in busy detectLinus Walleij2023-06-191-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The busy detect callback for Ux500 checks for an error in the status in the first if() clause. The only practical reason is that if an error occurs, the if()-clause is not executed, and the code falls through to the last if()-clause if (host->busy_status) which will clear and disable the irq. Make this explicit instead: it is easier to read. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-4-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Stash status while waiting for busyLinus Walleij2023-06-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some interesting flags can arrive while we are waiting for the first busy detect IRQ so OR then onto the stashed flags so they are not missed. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-3-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | | mmc: mmci: Unwind big if() clauseLinus Walleij2023-06-191-7/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does two things: firsr replace the hard-to-read long if-expression: if (!host->busy_status && !(status & err_msk) && (readl(base + MMCISTATUS) & host->variant->busy_detect_flag)) { With the more readable: if (!host->busy_status && !(status & err_msk)) { status = readl(base + MMCISTATUS); if (status & host->variant->busy_detect_flag) { Second notice that the re-read MMCISTATUS register is now stored into the status variable, using logic OR because what if something else changed too? While we are at it, explain what the function is doing. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230405-pl180-busydetect-fix-v7-2-69a7164f2a61@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>