summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* modpost: use fnmatch() to simplify match()Masahiro Yamada2022-06-051-61/+13
| | | | | | | | | Replace the own implementation for wildcard (glob) matching with a function call to fnmatch(). Also, change the return type to 'bool'. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* modpost: simplify mod->name allocationMasahiro Yamada2022-06-051-13/+12
| | | | | | | | | | | | | | | | | | | mod->name is set to the ELF filename with the suffix ".o" stripped. The current code calls strdup() and free() to manipulate the string, but a simpler approach is to pass new_module() with the name length subtracted by 2. Also, check if the passed filename ends with ".o" before stripping it. The current code blindly chops the suffix: tmp[strlen(tmp) - 2] = '\0' It will cause buffer under-run if strlen(tmp) < 2; Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* kbuild: factor out the common objtool argumentsMasahiro Yamada2022-06-054-97/+52
| | | | | | | | | | | | | | | | scripts/Makefile.build and scripts/link-vmlinux.sh have similar setups for the objtool arguments. It was difficult to factor out them because all the vmlinux build rules were written in a shell script. It is somewhat tedious to touch the two files every time a new objtool option is supported. To reduce the code duplication, move the objtool for vmlinux.o into scripts/Makefile.vmlinux_o. Then, move the common macros to Makefile.lib so they are shared between Makefile.build and Makefile.vmlinux_o. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_oMasahiro Yamada2022-06-052-40/+62
| | | | | | | This is a preparation for moving the objtool rule in the next commit. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: clean .tmp_* pattern by make cleanMasahiro Yamada2022-06-053-8/+5
| | | | | | | | | | | | | Change the "make clean" rule to remove all the .tmp_* files. .tmp_objdiff is the only exception, which should be removed by "make mrproper". Rename the record directory of objdiff, .tmp_objdiff to .objdiff to avoid the removal by "make clean". Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: remove redundant cleanups in scripts/link-vmlinux.shMasahiro Yamada2022-06-011-2/+0
| | | | | | | | | | These are cleaned by the top Makefile. vmlinux.o and .vmlinux.d matches the '*.[aios]' and '.*.d' patterns respectively. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: rebuild multi-object modules when objtool is updatedMasahiro Yamada2022-06-011-3/+8
| | | | | | | | | | | | | | | When CONFIG_LTO_CLANG or CONFIG_X86_KERNEL_IBT is enabled, objtool for multi-object modules is postponed until the objects are linked together. Make sure to re-run objtool and re-link multi-object modules when objtool is updated. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: add cmd_and_savecmd macroMasahiro Yamada2022-06-011-2/+4
| | | | | | | | | | | | | | | | | | Separate out the command execution part of if_changed, as we did for if_changed_dep. This allows us to reuse it in if_changed_rule. define rule_foo $(call cmd_and_savecmd,foo) $(call cmd,bar) endef Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: make *.mod rule robust against too long argument errorMasahiro Yamada2022-06-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Like built-in.a, the command length of the *.mod rule scales with the depth of the directory times the number of objects in the Makefile. Add $(obj)/ by the shell command (awk) instead of by Make's builtin function. In-tree modules still have some room to the limit (ARG_MAX=2097152), but this is more future-proof for big modules in a deep directory. For example, you can build i915 as a module (CONFIG_DRM_I915=m) and compare drivers/gpu/drm/i915/.i915.mod.cmd with/without this commit. The issue is more critical for external modules because the M= path can be very long as Jeff Johnson reported before [1]. [1] https://lore.kernel.org/linux-kbuild/4c02050c4e95e4cb8cc04282695f8404@codeaurora.org/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: make built-in.a rule robust against too long argument errorMasahiro Yamada2022-06-011-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kbuild runs at the top of objtree instead of changing the working directory to subdirectories. I think this design is nice overall but some commands have a scalability issue. The build command of built-in.a is one of them whose length scales with: O(D * N) Here, D is the length of the directory path (i.e. $(obj)/ prefix), N is the number of objects in the Makefile, O() is the big O notation. The deeper directory the Makefile directory is located, the more easily it will hit the too long argument error. We can make it better. Trim the $(obj)/ by Make's builtin function, and restore it by a shell command (sed). With this, the command length scales with: O(D + N) In-tree modules still have some room to the limit (ARG_MAX=2097152), but this is more future-proof for big modules in a deep directory. For example, you can build i915 as builtin (CONFIG_DRM_I915=y) and compare drivers/gpu/drm/i915/.built-in.a.cmd with/without this commit. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* kbuild: check static EXPORT_SYMBOL* by script instead of modpostMasahiro Yamada2022-06-013-27/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'static' specifier and EXPORT_SYMBOL() are an odd combination. Commit 15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions") tried to detect it, but this check has false negatives. Here is the sample code. Makefile: obj-y += foo1.o foo2.o foo1.c: #include <linux/export.h> static void foo(void) {} EXPORT_SYMBOL(foo); foo2.c: void foo(void) {} foo1.c exports the static symbol 'foo', but modpost cannot catch it because it is fooled by foo2.c, which has a global symbol with the same name. s->is_static is cleared if a global symbol with the same name is found somewhere, but EXPORT_SYMBOL() and the global symbol do not necessarily belong to the same compilation unit. This check should be done per compilation unit, but I do not know how to do it in modpost. modpost runs against vmlinux.o or modules, which merges multiple objects, then forgets their origin. modpost cannot parse individual objects because they may not be ELF but LLVM IR when CONFIG_LTO_CLANG=y. Add a simple bash script to parse the output from ${NM}. This works for CONFIG_LTO_CLANG=y because llvm-nm can dump symbols of LLVM IR files. Revert 15bfc2348d54. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
* parisc: remove arch/parisc/nmMasahiro Yamada2022-05-292-7/+0
| | | | | | | | | | | | | | | | | | | | | | | Parisc overrides 'nm' with a shell script. I was hit by a false-positive error of $(NM) because this script returns the exit status of grep instead of ${CROSS_COMPILE}nm. (grep returns 1 if no lines were selected) I tried to fix it, but in the code review, Helge suggested to remove it entirely. [1] This script was added in 2003. [2] Presumably, it was a workaround for old toolchains (but even the parisc maintainer does not know the detail any more). Hopefully, recent tools should work fine. [1]: https://lore.kernel.org/all/1c12cd26-d8aa-4498-f4c0-29478b9578fe@gmx.de/ [2]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=36eaa6e4c0e0b6950136b956b72fd08155b92ca3 Suggested-by: Helge Deller <deller@gmx.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Helge Deller <deller@gmx.de>
* kbuild: do not create *.prelink.o for Clang LTO or IBTMasahiro Yamada2022-05-296-67/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_LTO_CLANG=y, additional intermediate *.prelink.o is created for each module. Also, objtool is postponed until LLVM IR is converted to ELF. CONFIG_X86_KERNEL_IBT works in a similar way to postpone objtool until objects are merged together. This commit stops generating *.prelink.o, so the build flow will look similar with/without LTO. The following figures show how the LTO build currently works, and how this commit is changing it. Current build flow ================== [1] single-object module $(LD) $(CC) +objtool $(LD) foo.c --------------------> foo.o -----> foo.prelink.o -----> foo.ko (LLVM IR) (ELF) | (ELF) | foo.mod.o --/ (LLVM IR) [2] multi-object module $(LD) $(CC) $(AR) +objtool $(LD) foo1.c -----> foo1.o -----> foo.o -----> foo.prelink.o -----> foo.ko | (archive) (ELF) | (ELF) foo2.c -----> foo2.o --/ | (LLVM IR) foo.mod.o --/ (LLVM IR) One confusion is that foo.o in multi-object module is an archive despite of its suffix. New build flow ============== [1] single-object module Since there is only one object, there is no need to keep the LLVM IR. Use $(CC)+$(LD) to generate an ELF object in one build rule. When LTO is disabled, $(LD) is unneeded because $(CC) produces an ELF object. $(CC)+$(LD)+objtool $(LD) foo.c ----------------------------> foo.o ---------> foo.ko (ELF) | (ELF) | foo.mod.o --/ (LLVM IR) [2] multi-object module Previously, $(AR) was used to combine LLVM IR files into an archive, but there was no technical reason to do so. Use $(LD) to merge them into a single ELF object. $(LD) $(CC) +objtool $(LD) foo1.c ---------> foo1.o ---------> foo.o ---------> foo.ko | (ELF) | (ELF) foo2.c ---------> foo2.o ----/ | (LLVM IR) foo.mod.o --/ (LLVM IR) Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64) Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
* kbuild: replace $(linked-object) with CONFIG optionsMasahiro Yamada2022-05-291-2/+1
| | | | | | | | | | | | | | | *.prelink.o is created when CONFIG_LTO_CLANG or CONFIG_X86_KERNEL_IBT is enabled. Replace $(linked-object) with $(CONFIG_LTO_CLANG)$(CONFIG_X86_KERNEL_IBT) so you will get a quick idea of when the --link option is passed. No functional change is intended. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14
* kbuild: do not try to parse *.cmd files for objects provided by compilerMasahiro Yamada2022-05-291-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Guenter Roeck reported the build breakage for parisc and csky. I confirmed nios2 and openrisc are broken as well. The reason is that they borrow libgcc.a from the toolchains. For example, see this line in arch/parisc/Makefile: LIBGCC := $(shell $(CC) -print-libgcc-file-name) Some objects in libgcc.a are linked to vmlinux.o, but they do not have .*.cmd files. Obviously, there is no EXPORT_SYMBOL in external objects. Ignore them. (Most of the architectures import library code into the kernel tree. Perhaps those 4 architectures can do similar, but I do not know how challenging it is.) Fixes: f292d875d0dc ("modpost: extract symbol versions from *.cmd files") Link: https://lore.kernel.org/linux-kbuild/20220528224745.GA2501857@roeck-us.net/T/#mac65c20c71c3e272db0350ecfba53fcd8905b0a0 Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net>
* kbuild: replace $(if A,A,B) with $(or A,B) in scripts/Makefile.modpostMasahiro Yamada2022-05-271-1/+1
| | | | | | | | Similar cleanup to commit 5c8166419acf ("kbuild: replace $(if A,A,B) with $(or A,B)"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* modpost: squash if...else-if in find_elf_symbol2()Masahiro Yamada2022-05-271-7/+3
| | | | | | | | | | | | | | | | | | | | | if ((addr - sym->st_value) < distance) { distance = addr - sym->st_value; near = sym; } else if ((addr - sym->st_value) == distance) { near = sym; } is equivalent to: if (addr - sym->st_value <= distance) { distance = addr - sym->st_value; near = sym; } (The else-if block can overwrite 'distance' with the same value). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* modpost: reuse ARRAY_SIZE() macro for section_mismatch()Masahiro Yamada2022-05-273-6/+6
| | | | | | | | | | Move ARRAY_SIZE() from file2alias.c to modpost.h to reuse it in section_mismatch(). Also, move the variable 'check' inside the for-loop. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* modpost: remove the unused argument of check_sec_ref()Masahiro Yamada2022-05-271-3/+2
| | | | | | | check_sec_ref() does not use the first parameter 'mod'. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* modpost: fix undefined behavior of is_arm_mapping_symbol()Masahiro Yamada2022-05-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value of is_arm_mapping_symbol() is unpredictable when "$" is passed in. strchr(3) says: The strchr() and strrchr() functions return a pointer to the matched character or NULL if the character is not found. The terminating null byte is considered part of the string, so that if c is specified as '\0', these functions return a pointer to the terminator. When str[1] is '\0', strchr("axtd", str[1]) is not NULL, and str[2] is referenced (i.e. buffer overrun). Test code --------- char str1[] = "abc"; char str2[] = "ab"; strcpy(str1, "$"); strcpy(str2, "$"); printf("test1: %d\n", is_arm_mapping_symbol(str1)); printf("test2: %d\n", is_arm_mapping_symbol(str2)); Result ------ test1: 0 test2: 1 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* modpost: fix removing numeric suffixesAlexander Lobakin2022-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the `-z unique-symbol` linker flag or any similar mechanism, it is possible to trigger the following: ERROR: modpost: "param_set_uint.0" [vmlinux] is a static EXPORT_SYMBOL The reason is that for now the condition from remove_dot(): if (m && (s[n + m] == '.' || s[n + m] == 0)) which was designed to test if it's a dot or a '\0' after the suffix is never satisfied. This is due to that `s[n + m]` always points to the last digit of a numeric suffix, not on the symbol next to it (from a custom debug print added to modpost): param_set_uint.0, s[n + m] is '0', s[n + m + 1] is '\0' So it's off-by-one and was like that since 2014. Fix this for the sake of any potential upcoming features, but don't bother stable-backporting, as it's well hidden -- apart from that LD flag, it can be triggered only with GCC LTO which never landed upstream. Fixes: fcd38ed0ff26 ("scripts: modpost: fix compilation warning") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* scripts/kallsyms: update usage message of the kallsyms programYuntao Wang2022-05-271-1/+1
| | | | | | | | The kallsyms program supports --absolute-percpu option but does not display it in the usage message, fix it. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* kbuild: Fix include path in scripts/Makefile.modpostJing Leng2022-05-271-2/+1
| | | | | | | | | | | | | | | | | | | When building an external module, if users don't need to separate the compilation output and source code, they run the following command: "make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)" and "$(src)" are the same. If they need to separate them, they run "make -C $(KERNEL_SRC_DIR) O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to prevent compilation failure. So the kernel should change the included path to avoid the copy operation. Signed-off-by: Jing Leng <jleng@ambarella.com> [masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way, but this patch is a nice clean up anyway.] Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* Merge tag 'for-5.19/dm-changes' of ↵Linus Torvalds2022-05-2615-287/+409
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set. This change improves DM's hipri bio polling (REQ_POLLED) performance by 7 - 20% depending on the system. - Update DM core to use jump_labels to further reduce cost of unlikely branches for zoned block devices, dm-stats and swap_bios throttling. - Various DM core changes to reduce bio-based DM overhead and simplify IO accounting. - Fundamental DM core improvements to dm_io reference counting and the elimination of using bio_split()+bio_chain() -- instead DM's bio-based IO accounting is updated to account that a split occurred. - Improve DM core's abnormal bio processing to do less work. - Improve DM core's hipri polling support to use a single list rather than an hlist. - Update DM core to pass NULL bdev to bio_alloc_clone() so that initialization that isn't useful for DM can be elided. - Add cond_resched to DM stats' various loops that loop over all entries. - Fix incorrect error code return from DM integrity's constructor. - Make DM crypt's printing of the key constant-time. - Update bio-based DM multipath to provide high-resolution timer to the Historical Service Time (HST) path selector. * tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm: pass NULL bdev to bio_alloc_clone dm cache metadata: remove unnecessary variable in __dump_mapping dm mpath: provide high-resolution timer to HST for bio-based dm crypt: make printing of the key constant-time dm integrity: fix error code in dm_integrity_ctr() dm stats: add cond_resched when looping over entries dm: improve abnormal bio processing dm: simplify bio-based IO accounting further dm: put all polled dm_io instances into a single list dm: improve dm_io reference counting dm: don't grab target io reference in dm_zone_map_bio dm: improve bio splitting and associated IO accounting dm: switch to bdev based IO accounting interfaces dm: pass dm_io instance to dm_io_acct directly dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct dm: use bio_sectors in dm_aceept_partial_bio dm: simplify basic targets dm: conditionally enable branching for less used features dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio dm: move hot dm_io members to same cacheline as dm_target_io ...
| * dm: pass NULL bdev to bio_alloc_cloneMike Snitzer2022-05-111-16/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most DM targets will remap the clone bio passed to their ->map function using bio_set_bdev(). So this change to pass NULL bdev to bio_alloc_clone avoids clone-time work that sets up resources for a bdev association that will not be used in practice (e.g. clone issued to underlying device will not use DM device's blk-cgroups resources). But clone->bi_bdev is still initialized following bio_alloc_clone to preserve DM target expectations that clone->bi_bdev will be set. Follow-up work is needed to audit DM targets to remove accesses to a clone->bi_bdev that the target didn't initialize with bio_set_dev(). Depends-on: 7ecc56c62b27 ("block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm cache metadata: remove unnecessary variable in __dump_mappingGuo Zhengkui2022-05-091-2/+1
| | | | | | | | | | | | | | | | | | | | Fix the following coccicheck warning: drivers/md/dm-cache-metadata.c:1512:5-6: Unneeded variable: "r". Return "0" on line 1520. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm mpath: provide high-resolution timer to HST for bio-basedGabriel Krisman Bertazi2022-05-093-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The precision loss of reading IO start_time with jiffies_to_nsecs instead of using a high resolution timer degrades HST path prediction for BIO-based mpath on high load workloads. Below, I show the utilization percentage of a 10 disk multipath with asymmetrical disk access cost, while being exercised by a randwrite FIO benchmark with high submission queue depth (depth=64). It is possible to see that the HST path selection degrades heavily for high-iops in BIO-mpath, underutilizing the slower paths way beyond expected. This seems to be caused by the start_time truncation, which makes some IO to seem much slower than it actually is. In this scenario ST outperforms HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns(). The third column shows utilization with this patch applied. It is easy to see that now HST prediction is much closer to the ideal distribution (calculated considering the real cost of each path). | | ST | HST (orig) | HST(ktime) | Best | | sdd | 0.17 | 0.20 | 0.17 | 0.18 | | sde | 0.17 | 0.20 | 0.17 | 0.18 | | sdf | 0.17 | 0.20 | 0.17 | 0.18 | | sdg | 0.06 | 0.00 | 0.06 | 0.04 | | sdh | 0.03 | 0.00 | 0.03 | 0.02 | | sdi | 0.03 | 0.00 | 0.03 | 0.02 | | sdj | 0.02 | 0.00 | 0.01 | 0.01 | | sdk | 0.02 | 0.00 | 0.01 | 0.01 | | sdl | 0.17 | 0.20 | 0.17 | 0.18 | | sdm | 0.17 | 0.20 | 0.17 | 0.18 | This issue was originally discussed [1] when we first merged HST, and this patch was left as a low hanging fruit to be solved later. Regarding the implementation, as suggested by Mike in that mail thread, in order to avoid the overhead of ktime_get_ns for other selectors, this patch adds a flag for the selector code to request the high-resolution timer. I tested this using the same benchmark used in the original HST submission. Full test and benchmark scripts are available here: https://people.collabora.com/~krisman/HST-BIO-MPATH/ [1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/ Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> [snitzer: cleaned up various implementation details] Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm crypt: make printing of the key constant-timeMikulas Patocka2022-05-091-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to report the current key to userspace. However, this is not a constant-time operation and it may leak information about the key via timing, via cache access patterns or via the branch predictor. Change dm-crypt's key printing to use "%c" instead of "%02x". Also introduce hex2asc() that carefully avoids any branching or memory accesses when converting a number in the range 0 ... 15 to an ascii character. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm integrity: fix error code in dm_integrity_ctr()Dan Carpenter2022-05-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "r" variable shadows an earlier "r" that has function scope. It means that we accidentally return success instead of an error code. Smatch has a warning for this: drivers/md/dm-integrity.c:4503 dm_integrity_ctr() warn: missing error code 'r' Fixes: 7eada909bfd7 ("dm: add integrity target") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm stats: add cond_resched when looping over entriesMikulas Patocka2022-05-091-0/+8
| | | | | | | | | | | | | | | | | | | | dm-stats can be used with a very large number of entries (it is only limited by 1/4 of total system memory), so add rescheduling points to the loops that iterate over the entries. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: improve abnormal bio processingMike Snitzer2022-05-051-31/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | Read/write/flush are the most common operations, optimize switch in is_abnormal_io() for those cases. Follows same pattern established in block perf-wip commit ("block: optimise blk_may_split for normal rw") Also, push is_abnormal_io() check and blk_queue_split() down from dm_submit_bio() to dm_split_and_process_bio() and set new 'is_abnormal_io' flag in clone_info. Optimize __split_and_process_bio and __process_abnormal_io by leveraging ci.is_abnormal_io flag. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: simplify bio-based IO accounting furtherMike Snitzer2022-05-052-34/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that io splitting is recorded prior to, or during, ->map IO accounting can happen immediately rather than defer until after bio splitting in dm_split_and_process_bio(). Remove the DM_IO_START_ACCT flag and also remove dm_io's map_task member because there is no longer any need to wait for splitting to occur before accounting. Also move dm_io struct's 'flags' member to consolidate struct holes. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: put all polled dm_io instances into a single listMing Lei2022-05-052-26/+28
| | | | | | | | | | | | | | | | | | | | | | | | Now that bio_split() isn't used by DM's bio splitting, it is a bit overkill to link dm_io into an hlist given there is only single dm_io in the list. Convert to using a single list for holding all dm_io instances associated with this bio. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: improve dm_io reference countingMing Lei2022-05-051-14/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently each dm_io's reference counter is grabbed before calling __map_bio(), this way isn't efficient since we can move this grabbing to initialization time inside alloc_io(). Meantime it becomes typical async io reference counter model: one is for submission side, the other is for completion side, and the io won't be completed until both sides are done. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: don't grab target io reference in dm_zone_map_bioMing Lei2022-05-053-18/+6
| | | | | | | | | | | | | | | | | | | | | | | | dm_zone_map_bio() is only called from __map_bio in which the io's reference is grabbed already, and the reference won't be released until the bio is submitted, so not necessary to do it dm_zone_map_bio any more. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: improve bio splitting and associated IO accountingMing Lei2022-05-052-24/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current DM code (ab)uses late assignment of dm_io->orig_bio (after __map_bio() returns and any bio splitting is complete) to indicate the FS bio has been processed and can be accounted. This results in awkward waiting until ->orig_bio is set in dm_submit_bio_remap(). Also the bio splitting was implemented using bio_split()+bio_chain() -- a well-worn pattern but it requires bio cloning purely for the benefit of more natural IO accounting. The bio_split() result was stored in ->orig_bio to represent the mapped part of the original FS bio. DM has switched to the bdev based IO accounting interface. DM's IO accounting can be implemented in terms of the original FS bio (now stored early in ->orig_bio) via access to its sectors/bio_op. And if/when splitting is needed, set a new DM_IO_WAS_SPLIT flag and use new dm_io fields of .sector_offset & .sectors to allow IO accounting for split bios _without_ needing to clone a new bio to store in ->orig_bio. Signed-off-by: Ming Lei <ming.lei@redhat.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: switch to bdev based IO accounting interfacesMing Lei2022-05-051-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | DM splits flush with data into empty flush followed by bio with data payload, switch dm_io_acct() to use bdev_{start,end}_io_acct() to do this accoiunting more naturally (rather than temporarily changing the bio's bi_size). This will allow DM to more easily account bios that are split (in following commit). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: pass dm_io instance to dm_io_acct directlyMing Lei2022-05-051-4/+7
| | | | | | | | | | | | | | | | All the other 4 parameters are retrieved from the 'dm_io' instance, so it's not necessary to pass all four to dm_io_acct(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: don't pass bio to __dm_start_io_acct and dm_end_io_acctMing Lei2022-05-051-11/+8
| | | | | | | | | | | | | | | | dm->orig_bio is always passed to __dm_start_io_acct and dm_end_io_acct, so it isn't necessary to take one bio parameter for the two helpers. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: use bio_sectors in dm_aceept_partial_bioMike Snitzer2022-05-051-5/+5
| | | | | | | | | | | | | | Rename 'bi_size' to 'bio_sectors' given bi_size is being stored in sectors. Also, use bio_sectors() rather than open-coding it. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: simplify basic targetsMike Snitzer2022-05-053-14/+4
| | | | | | | | | | | | | | Remove needless factoring and remap bi_sector regardless of bio_sectors() being non-zero. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: conditionally enable branching for less used featuresMike Snitzer2022-05-054-23/+53
| | | | | | | | | | | | | | Use jump_labels to further reduce cost of unlikely branches for zoned block devices, dm-stats and swap_bios throttling. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bioMike Snitzer2022-05-051-4/+24
| | | | | | | | | | | | | | | | | | If a bio is marked REQ_NOWAIT optimize dm_submit_bio()'s dm_table RCU usage to dm_{get,put}_live_table_fast. DM core offers protection against blocking (via suspend) if REQ_NOWAIT. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: move hot dm_io members to same cacheline as dm_target_ioMike Snitzer2022-05-051-5/+7
| | | | | | | | | | | | | | Just saves some cacheline bouncing for members accessed during cloned bio submission and completion. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: add local variables to clone_endio and __map_bioMike Snitzer2022-05-051-13/+12
| | | | | | | | | | | | Avoid redundant dereferences in both functions. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: mark various branches unlikelyMike Snitzer2022-05-051-3/+3
| | | | | | | | Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: simplify dm_start_io_acctMike Snitzer2022-05-052-13/+11
| | | | | | | | | | | | | | Pull common DM_IO_ACCOUNTED check out to beginning of dm_start_io_acct. Also, use dm_tio_is_normal (and move it to dm-core.h). Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: simplify dm_io access in dm_split_and_process_bioMike Snitzer2022-05-051-6/+8
| | | | | | | | | | | | Use local variable instead of redudant access using ci.io Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: factor out dm_io_set_error and __dm_io_dec_pendingMike Snitzer2022-05-051-28/+36
| | | | | | | | | | | | Also eliminate need to use errno_to_blk_status(). Signed-off-by: Mike Snitzer <snitzer@kernel.org>
| * dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io biosetMike Snitzer2022-05-053-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | A bioset's per-cpu alloc cache may have broader utility in the future but for now constrain it to being tightly coupled to QUEUE_FLAG_POLL. Also change dm_io_complete() to use bio_clear_polled() so that it properly clears all associated bio state on requeue. This commit improves DM's hipri bio polling (REQ_POLLED) perf by 7 - 20% depending on the system. Signed-off-by: Mike Snitzer <snitzer@kernel.org>