summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* bpf: fix possible file descriptor leaks in verifierAnton Protopopov2024-03-291-0/+3
| | | | | | | | | | | | | | | | The resolve_pseudo_ldimm64() function might have leaked file descriptors when BPF_MAP_TYPE_ARENA was used in a program (some error paths missed a corresponding fdput). Add missing fdputs. v2: remove unrelated changes from the fix Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20240329071106.67968-1-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* bpf: support deferring bpf_link dealloc to after RCU grace periodAndrii Nakryiko2024-03-282-5/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BPF link for some program types is passed as a "context" which can be used by those BPF programs to look up additional information. E.g., for multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values. Because of this runtime dependency, when bpf_link refcnt drops to zero there could still be active BPF programs running accessing link data. This patch adds generic support to defer bpf_link dealloc callback to after RCU GP, if requested. This is done by exposing two different deallocation callbacks, one synchronous and one deferred. If deferred one is provided, bpf_link_free() will schedule dealloc_deferred() callback to happen after RCU GP. BPF is using two flavors of RCU: "classic" non-sleepable one and RCU tasks trace one. The latter is used when sleepable BPF programs are used. bpf_link_free() accommodates that by checking underlying BPF program's sleepable flag, and goes either through normal RCU GP only for non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP (taking into account rcu_trace_implies_rcu_gp() optimization), if BPF program is sleepable. We use this for multi-kprobe and multi-uprobe links, which dereference link during program run. We also preventively switch raw_tp link to use deferred dealloc callback, as upcoming changes in bpf-next tree expose raw_tp link data (specifically, cookie value) to BPF program at runtime as well. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* bpf: put uprobe link's path and task in release callbackAndrii Nakryiko2024-03-281-3/+3
| | | | | | | | | | | | | | | | | | There is no need to delay putting either path or task to deallocation step. It can be done right after bpf_uprobe_unregister. Between release and dealloc, there could be still some running BPF programs, but they don't access either task or path, only data in link->uprobes, so it is safe to do. On the other hand, doing path_put() in dealloc callback makes this dealloc sleepable because path_put() itself might sleep. Which is problematic due to the need to call uprobe's dealloc through call_rcu(), which is what is done in the next bug fix patch. So solve the problem by releasing these resources early. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240328052426.3042617-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* Merge tag 'net-6.9-rc2' of ↵Linus Torvalds2024-03-284-11/+56
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, WiFi and netfilter. Current release - regressions: - ipv6: fix address dump when IPv6 is disabled on an interface Current release - new code bugs: - bpf: temporarily disable atomic operations in BPF arena - nexthop: fix uninitialized variable in nla_put_nh_group_stats() Previous releases - regressions: - bpf: protect against int overflow for stack access size - hsr: fix the promiscuous mode in offload mode - wifi: don't always use FW dump trig - tls: adjust recv return with async crypto and failed copy to userspace - tcp: properly terminate timers for kernel sockets - ice: fix memory corruption bug with suspend and rebuild - at803x: fix kernel panic with at8031_probe - qeth: handle deferred cc1 Previous releases - always broken: - bpf: fix bug in BPF_LDX_MEMSX - netfilter: reject table flag and netdev basechain updates - inet_defrag: prevent sk release while still in use - wifi: pick the version of SESSION_PROTECTION_NOTIF - wwan: t7xx: split 64bit accesses to fix alignment issues - mlxbf_gige: call request_irq() after NAPI initialized - hns3: fix kernel crash when devlink reload during pf initialization" * tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) inet: inet_defrag: prevent sk release while still in use Octeontx2-af: fix pause frame configuration in GMP mode net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips net: bcmasp: Remove phy_{suspend/resume} net: bcmasp: Bring up unimac after PHY link up net: phy: qcom: at803x: fix kernel panic with at8031_probe netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c netfilter: nf_tables: skip netdev hook unregistration if table is dormant netfilter: nf_tables: reject table flag and netdev basechain updates netfilter: nf_tables: reject destroy command to remove basechain hooks bpf: update BPF LSM designated reviewer list bpf: Protect against int overflow for stack access size bpf: Check bloom filter map value size bpf: fix warning for crash_kexec selftests: netdevsim: set test timeout to 10 minutes net: wan: framer: Add missing static inline qualifiers mlxbf_gige: call request_irq() after NAPI initialized tls: get psock ref after taking rxlock to avoid leak selftests: tls: add test with a partially invalid iov tls: adjust recv return with async crypto and failed copy to userspace ...
| * bpf: Protect against int overflow for stack access sizeAndrei Matei2024-03-271-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch re-introduces protection against the size of access to stack memory being negative; the access size can appear negative as a result of overflowing its signed int representation. This should not actually happen, as there are other protections along the way, but we should protect against it anyway. One code path was missing such protections (fixed in the previous patch in the series), causing out-of-bounds array accesses in check_stack_range_initialized(). This patch causes the verification of a program with such a non-sensical access size to fail. This check used to exist in a more indirect way, but was inadvertendly removed in a833a17aeac7. Fixes: a833a17aeac7 ("bpf: Fix verification of indirect var-off stack access") Reported-by: syzbot+33f4297b5f927648741a@syzkaller.appspotmail.com Reported-by: syzbot+aafd0513053a1cbf52ef@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/CAADnVQLORV5PT0iTAhRER+iLBTkByCYNBYyvBSgjN1T31K+gOw@mail.gmail.com/ Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Link: https://lore.kernel.org/r/20240327024245.318299-3-andreimatei1@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Check bloom filter map value sizeAndrei Matei2024-03-271-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a missing check to bloom filter creating, rejecting values above KMALLOC_MAX_SIZE. This brings the bloom map in line with many other map types. The lack of this protection can cause kernel crashes for value sizes that overflow int's. Such a crash was caught by syzkaller. The next patch adds more guard-rails at a lower level. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240327024245.318299-2-andreimatei1@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: fix warning for crash_kexecHari Bathini2024-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With [1], crash dump specific code is moved out of CONFIG_KEXEC_CORE and placed under CONFIG_CRASH_DUMP, where it is more appropriate. And since CONFIG_KEXEC & !CONFIG_CRASH_DUMP build option is supported with that, it led to the below warning: "WARN: resolve_btfids: unresolved symbol crash_kexec" Fix it by using the appropriate #ifdef. [1] https://lore.kernel.org/all/20240124051254.67105-1-bhe@redhat.com/ Acked-by: Baoquan He <bhe@redhat.com> Fixes: 02aff8480533 ("crash: split crash dumping code out from kexec_core.c") Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Link: https://lore.kernel.org/r/20240319080152.36987-1-hbathini@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * Merge tag 'for-netdev' of ↵Paolo Abeni2024-03-262-10/+37
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-03-25 The following pull-request contains BPF updates for your *net* tree. We've added 17 non-merge commits during the last 12 day(s) which contain a total of 19 files changed, 184 insertions(+), 61 deletions(-). The main changes are: 1) Fix an arm64 BPF JIT bug in BPF_LDX_MEMSX implementation's offset handling found via test_bpf module, from Puranjay Mohan. 2) Various fixups to the BPF arena code in particular in the BPF verifier and around BPF selftests to match latest corresponding LLVM implementation, from Puranjay Mohan and Alexei Starovoitov. 3) Fix xsk to not assume that metadata is always requested in TX completion, from Stanislav Fomichev. 4) Fix riscv BPF JIT's kfunc parameter incompatibility between BPF and the riscv ABI which requires sign-extension on int/uint, from Pu Lehui. 5) Fix s390x BPF JIT's bpf_plt pointer arithmetic which triggered a crash when testing struct_ops, from Ilya Leoshkevich. 6) Fix libbpf's arena mmap handling which had incorrect u64-to-pointer cast on 32-bit architectures, from Andrii Nakryiko. 7) Fix libbpf to define MFD_CLOEXEC when not available, from Arnaldo Carvalho de Melo. 8) Fix arm64 BPF JIT implementation for 32bit unconditional bswap which resulted in an incorrect swap as indicated by test_bpf, from Artem Savkov. 9) Fix BPF man page build script to use silent mode, from Hangbin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi bpf: verifier: reject addr_space_cast insn without arena selftests/bpf: verifier_arena: fix mmap address for arm64 bpf: verifier: fix addr_space_cast from as(1) to as(0) libbpf: Define MFD_CLOEXEC if not available arm64: bpf: fix 32bit unconditional bswap bpf, arm64: fix bug in BPF_LDX_MEMSX libbpf: fix u64-to-pointer cast on 32-bit arches s390/bpf: Fix bpf_plt pointer arithmetic xsk: Don't assume metadata is always requested in TX completion selftests/bpf: Add arena test case for 4Gbyte corner case selftests/bpf: Remove hard coded PAGE_SIZE macro. libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM bpf: Clarify bpf_arena comments. MAINTAINERS: Update email address for Quentin Monnet scripts/bpf_doc: Use silent mode when exec make cmd bpf: Temporarily disable atomic operations in BPF arena ==================== Link: https://lore.kernel.org/r/20240325213520.26688-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| | * bpf: verifier: reject addr_space_cast insn without arenaPuranjay Mohan2024-03-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The verifier allows using the addr_space_cast instruction in a program that doesn't have an associated arena. This was caught in the form an invalid memory access in do_misc_fixups() when while converting addr_space_cast to a normal 32-bit mov, env->prog->aux->arena was dereferenced to check for BPF_F_NO_USER_CONV flag. Reject programs that include the addr_space_cast instruction but don't have an associated arena. root@rv-tester:~# ./reproducer Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000030 Oops [#1] [<ffffffff8017eeaa>] do_misc_fixups+0x43c/0x1168 [<ffffffff801936d6>] bpf_check+0xda8/0x22b6 [<ffffffff80174b32>] bpf_prog_load+0x486/0x8dc [<ffffffff80176566>] __sys_bpf+0xbd8/0x214e [<ffffffff80177d14>] __riscv_sys_bpf+0x22/0x2a [<ffffffff80d2493a>] do_trap_ecall_u+0x102/0x17c [<ffffffff80d3048c>] ret_from_exception+0x0/0x64 Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.") Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/bpf/CABOYnLz09O1+2gGVJuCxd_24a-7UueXzV-Ff+Fr+h5EKFDiYCQ@mail.gmail.com/ Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240322153518.11555-1-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * bpf: verifier: fix addr_space_cast from as(1) to as(0)Puranjay Mohan2024-03-221-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The verifier currently converts addr_space_cast from as(1) to as(0) that is: BPF_ALU64 | BPF_MOV | BPF_X with off=1 and imm=1 to BPF_ALU | BPF_MOV | BPF_X with imm=1 (32-bit mov) Because of this imm=1, the JITs that have bpf_jit_needs_zext() == true, interpret the converted instruction as BPF_ZEXT_REG(DST) which is a special form of mov32, used for doing explicit zero extension on dst. These JITs will just zero extend the dst reg and will not move the src to dst before the zext. Fix do_misc_fixups() to set imm=0 when converting addr_space_cast to a normal mov32. The JITs that have bpf_jit_needs_zext() == true rely on the verifier to emit zext instructions. Mark dst_reg as subreg when doing cast from as(1) to as(0) so the verifier emits a zext instruction after the mov. Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.") Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240321153939.113996-1-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * bpf: Clarify bpf_arena comments.Alexei Starovoitov2024-03-151-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clarify two bpf_arena comments, use existing SZ_4G #define, improve page_cnt check. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20240315021834.62988-2-alexei.starovoitov@gmail.com
| | * bpf: Temporarily disable atomic operations in BPF arenaPuranjay Mohan2024-03-141-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the x86 JIT handling PROBE_MEM32 tagged accesses is not equipped to handle atomic accesses into PTR_TO_ARENA, as no PROBE_MEM32 tagging is performed and no handling is enabled for them. This will lead to unsafety as the offset into arena will dereferenced directly without turning it into a base + offset access into the arena region. Since the changes to the x86 JIT will be fairly involved, for now, temporarily disallow use of PTR_TO_ARENA as the destination operand for atomics until support is added to the JIT backend. Fixes: 2fe99eb0ccf2 ("bpf: Add x86-64 JIT support for PROBE_MEM32 pseudo instructions.") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Message-ID: <20240314174931.98702-1-puranjay12@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | | Merge tag 'mm-hotfixes-stable-2024-03-27-11-25' of ↵Linus Torvalds2024-03-272-2/+12
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Various hotfixes. About half are cc:stable and the remainder address post-6.8 issues or aren't considered suitable for backporting. zswap figures prominently in the post-6.8 issues - folloup against the large amount of changes we have just made to that code. Apart from that, all over the map" * tag 'mm-hotfixes-stable-2024-03-27-11-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) crash: use macro to add crashk_res into iomem early for specific arch mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices selftests/mm: fix ARM related issue with fork after pthread_create hexagon: vmlinux.lds.S: handle attributes section userfaultfd: fix deadlock warning when locking src and dst VMAs tmpfs: fix race on handling dquot rbtree selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEM mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursion ARM: prctl: reject PR_SET_MDWE on pre-ARMv6 prctl: generalize PR_SET_MDWE support check to be per-arch MAINTAINERS: remove incorrect M: tag for dm-devel@lists.linux.dev mm: zswap: fix kernel BUG in sg_init_one selftests: mm: restore settings from only parent process tools/Makefile: remove cgroup target mm: cachestat: fix two shmem bugs mm: increase folio batch size mm,page_owner: fix recursion mailmap: update entry for Leonard Crestez init: open /initrd.image with O_LARGEFILE selftests/mm: Fix build with _FORTIFY_SOURCE ...
| * | | crash: use macro to add crashk_res into iomem early for specific archBaoquan He2024-03-261-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are regression reports[1][2] that crashkernel region on x86_64 can't be added into iomem tree sometime. This causes the later failure of kdump loading. This happened after commit 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") was merged. Even though, these reported issues are proved to be related to other component, they are just exposed after above commmit applied, I still would like to keep crashk_res and crashk_low_res being added into iomem early as before because the early adding has been always there on x86_64 and working very well. For safety of kdump, Let's change it back. Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY to limit that only ARCH defining the macro can have the early adding crashk_res/_low_res into iomem. Then define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it. Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res handling which was mistakenly added back in commit 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c"). [1] [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data https://lore.kernel.org/all/Zfv8iCL6CT2JqLIC@darkstar.users.ipa.redhat.com/T/#u [2] Question about Address Range Validation in Crash Kernel Allocation https://lore.kernel.org/all/4eeac1f733584855965a2ea62fa4da58@huawei.com/T/#u Link: https://lkml.kernel.org/r/ZgDYemRQ2jxjLkq+@MiWiFi-R3L-srv Fixes: 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Li Huafei <lihuafei1@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | prctl: generalize PR_SET_MDWE support check to be per-archZev Weiss2024-03-261-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "ARM: prctl: Reject PR_SET_MDWE where not supported". I noticed after a recent kernel update that my ARM926 system started segfaulting on any execve() after calling prctl(PR_SET_MDWE). After some investigation it appears that ARMv5 is incapable of providing the appropriate protections for MDWE, since any readable memory is also implicitly executable. The prctl_set_mdwe() function already had some special-case logic added disabling it on PARISC (commit 793838138c15, "prctl: Disable prctl(PR_SET_MDWE) on parisc"); this patch series (1) generalizes that check to use an arch_*() function, and (2) adds a corresponding override for ARM to disable MDWE on pre-ARMv6 CPUs. With the series applied, prctl(PR_SET_MDWE) is rejected on ARMv5 and subsequent execve() calls (as well as mmap(PROT_READ|PROT_WRITE)) can succeed instead of unconditionally failing; on ARMv6 the prctl works as it did previously. [0] https://lore.kernel.org/all/2023112456-linked-nape-bf19@gregkh/ This patch (of 2): There exist systems other than PARISC where MDWE may not be feasible to support; rather than cluttering up the generic code with additional arch-specific logic let's add a generic function for checking MDWE support and allow each arch to override it as needed. Link: https://lkml.kernel.org/r/20240227013546.15769-4-zev@bewilderbeest.net Link: https://lkml.kernel.org/r/20240227013546.15769-5-zev@bewilderbeest.net Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Acked-by: Helge Deller <deller@gmx.de> [parisc] Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: Florent Revest <revest@chromium.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Russell King (Oracle) <linux@armlinux.org.uk> Cc: Sam James <sam@gentoo.org> Cc: Stefan Roesch <shr@devkernel.io> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: <stable@vger.kernel.org> [6.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | Merge tag 'probes-fixes-v6.9-rc1' of ↵Linus Torvalds2024-03-271-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixlet from Masami Hiramatsu: - tracing/probes: initialize a 'val' local variable with zero. This variable is read by FETCH_OP_ST_EDATA in a loop, and is initialized by FETCH_OP_ARG in the same loop. Since this initialization is not obvious, smatch warns about it. Explicitly initializing 'val' with zero fixes this warning. * tag 'probes-fixes-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: probes: Fix to zero initialize a local variable
| * | | | tracing: probes: Fix to zero initialize a local variableMasami Hiramatsu (Google)2024-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to initialize 'val' local variable with zero. Dan reported that Smatch static code checker reports an error that a local 'val' variable needs to be initialized. Actually, the 'val' is expected to be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So initialize it with zero. Link: https://lore.kernel.org/all/171092223833.237219.17304490075697026697.stgit@devnote2/ Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/b010488e-68aa-407c-add0-3e059254aaa0@moroto.mountain/ Fixes: 25f00e40ce79 ("tracing/probes: Support $argN in return probe (kprobe and fprobe)") Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
* | | | | Fix memory leak in posix_clock_open()Linus Torvalds2024-03-271-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the clk ops.open() function returns an error, we don't release the pccontext we allocated for this clock. Re-organize the code slightly to make it all more obvious. Reported-by: Rohit Keshri <rkeshri@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Fixes: 60c6946675fc ("posix-clock: introduce posix_clock_context concept") Cc: Jakub Kicinski <kuba@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linuxfoundation.org>
* | | | | Merge tag 'printk-for-6.9-rc2' of ↵Linus Torvalds2024-03-261-0/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Prevent scheduling in an atomic context when printk() takes over the console flushing duty * tag 'printk-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Update @console_may_schedule in console_trylock_spinning()
| * | | | | printk: Update @console_may_schedule in console_trylock_spinning()John Ogness2024-03-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | console_trylock_spinning() may takeover the console lock from a schedulable context. Update @console_may_schedule to make sure it reflects a trylock acquire. Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Closes: https://lore.kernel.org/lkml/20240222090538.23017-1-quic_mojha@quicinc.com Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/875xybmo2z.fsf@jogness.linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* | | | | | Merge tag 'v6.9-p2' of ↵Linus Torvalds2024-03-251-0/+5
|\ \ \ \ \ \ | |_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a regression that broke iwd as well as a divide by zero in iaa" * tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: iaa - Fix nr_cpus < nr_iaa case Revert "crypto: pkcs7 - remove sha1 support"
| * | | | | Revert "crypto: pkcs7 - remove sha1 support"Eric Biggers2024-03-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 16ab7cb5825fc3425c16ad2c6e53d827f382d7c6 because it broke iwd. iwd uses the KEYCTL_PKEY_* UAPIs via its dependency libell, and apparently it is relying on SHA-1 signature support. These UAPIs are fairly obscure, and their documentation does not mention which algorithms they support. iwd really should be using a properly supported userspace crypto library instead. Regardless, since something broke we have to revert the change. It may be possible that some parts of this commit can be reinstated without breaking iwd (e.g. probably the removal of MODULE_SIG_SHA1), but for now this just does a full revert to get things working again. Reported-by: Karel Balej <balejk@matfyz.cz> Closes: https://lore.kernel.org/r/CZSHRUIJ4RKL.34T4EASV5DNJM@matfyz.cz Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Karel Balej <balejk@matfyz.cz> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | | | | Merge tag 'dma-mapping-6.9-2024-03-24' of ↵Linus Torvalds2024-03-241-12/+33
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: "This has a set of swiotlb alignment fixes for sometimes very long standing bugs from Will. We've been discussion them for a while and they should be solid now" * tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE iommu/dma: Force swiotlb_max_mapping_size on an untrusted device swiotlb: Fix alignment checks when both allocation and DMA masks are present swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc() swiotlb: Enforce page alignment in swiotlb_alloc() swiotlb: Fix double-allocation of slots due to broken alignment handling
| * | | | | | swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZEWill Deacon2024-03-131-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For swiotlb allocations >= PAGE_SIZE, the slab search historically adjusted the stride to avoid checking unaligned slots. This had the side-effect of aligning large mapping requests to PAGE_SIZE, but that was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks"). Since this alignment could be relied upon drivers, reinstate PAGE_SIZE alignment for swiotlb mappings >= PAGE_SIZE. Reported-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | swiotlb: Fix alignment checks when both allocation and DMA masks are presentWill Deacon2024-03-131-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nicolin reports that swiotlb buffer allocations fail for an NVME device behind an IOMMU using 64KiB pages. This is because we end up with a minimum allocation alignment of 64KiB (for the IOMMU to map the buffer safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME page (i.e. preserving the 4KiB page offset from the original allocation). If the original address is not 4KiB-aligned, the allocation will fail because swiotlb_search_pool_area() erroneously compares these unmasked bits with the 64KiB-aligned candidate allocation. Tweak swiotlb_search_pool_area() so that the DMA alignment mask is reduced based on the required alignment of the allocation. Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") Link: https://lore.kernel.org/r/cover.1707851466.git.nicolinc@nvidia.com Reported-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()Will Deacon2024-03-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | core-api/dma-api-howto.rst states the following properties of dma_alloc_coherent(): | The CPU virtual address and the DMA address are both guaranteed to | be aligned to the smallest PAGE_SIZE order which is greater than or | equal to the requested size. However, swiotlb_alloc() passes zero for the 'alloc_align_mask' parameter of swiotlb_find_slots() and so this property is not upheld. Instead, allocations larger than a page are aligned to PAGE_SIZE, Calculate the mask corresponding to the page order suitable for holding the allocation and pass that to swiotlb_find_slots(). Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | swiotlb: Enforce page alignment in swiotlb_alloc()Will Deacon2024-03-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When allocating pages from a restricted DMA pool in swiotlb_alloc(), the buffer address is blindly converted to a 'struct page *' that is returned to the caller. In the unlikely event of an allocation bug, page-unaligned addresses are not detected and slots can silently be double-allocated. Add a simple check of the buffer alignment in swiotlb_alloc() to make debugging a little easier if something has gone wonky. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | swiotlb: Fix double-allocation of slots due to broken alignment handlingWill Deacon2024-03-131-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"), which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment checks"), causes a functional regression with vsock in a virtual machine using bouncing via a restricted DMA SWIOTLB pool. When virtio allocates the virtqueues for the vsock device using dma_alloc_coherent(), the SWIOTLB search can return page-unaligned allocations if 'area->index' was left unaligned by a previous allocation from the buffer: # Final address in brackets is the SWIOTLB address returned to the caller | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800) | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800) | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800) This ends badly (typically buffer corruption and/or a hang) because swiotlb_alloc() is expecting a page-aligned allocation and so blindly returns a pointer to the 'struct page' corresponding to the allocation, therefore double-allocating the first half (2KiB slot) of the 4KiB page. Fix the problem by treating the allocation alignment separately to any additional alignment requirements from the device, using the maximum of the two as the stride to search the buffer slots and taking care to ensure a minimum of page-alignment for buffers larger than a page. This also resolves swiotlb allocation failures occuring due to the inclusion of ~PAGE_MASK in 'iotlb_align_mask' for large allocations and resulting in alignment requirements exceeding swiotlb_max_mapping_size(). Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix") Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | | | | | Merge tag 'timers-urgent-2024-03-23' of ↵Linus Torvalds2024-03-232-3/+20
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two regression fixes for the timer and timer migration code: - Prevent endless timer requeuing which is caused by two CPUs racing out of idle. This happens when the last CPU goes idle and therefore has to ensure to expire the pending global timers and some other CPU come out of idle at the same time and the other CPU wins the race and expires the global queue. This causes the last CPU to chase ghost timers forever and reprogramming it's clockevent device endlessly. Cure this by re-evaluating the wakeup time unconditionally. - The split into local (pinned) and global timers in the timer wheel caused a regression for NOHZ full as it broke the idle tracking of global timers. On NOHZ full this prevents an self IPI being sent which in turn causes the timer to be not programmed and not being expired on time. Restore the idle tracking for the global timer base so that the self IPI condition for NOHZ full is working correctly again" * tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Fix removed self-IPI on global timer's enqueue in nohz_full timers/migration: Fix endless timer requeue after idle interrupts
| * | | | | | | timers: Fix removed self-IPI on global timer's enqueue in nohz_fullFrederic Weisbecker2024-03-191-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While running in nohz_full mode, a task may enqueue a timer while the tick is stopped. However the only places where the timer wheel, alongside the timer migration machinery's decision, may reprogram the next event accordingly with that new timer's expiry are the idle loop or any IRQ tail. However neither the idle task nor an interrupt may run on the CPU if it resumes busy work in userspace for a long while in full dynticks mode. To solve this, the timer enqueue path raises a self-IPI that will re-evaluate the timer wheel on its IRQ tail. This asynchronous solution avoids potential locking inversion. This is supposed to happen both for local and global timers but commit: b2cf7507e186 ("timers: Always queue timers on the local CPU") broke the global timers case with removing the ->is_idle field handling for the global base. As a result, global timers enqueue may go unnoticed in nohz_full. Fix this with restoring the idle tracking of the global timer's base, allowing self-IPIs again on enqueue time. Fixes: b2cf7507e186 ("timers: Always queue timers on the local CPU") Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240318230729.15497-3-frederic@kernel.org
| * | | | | | | timers/migration: Fix endless timer requeue after idle interruptsFrederic Weisbecker2024-03-191-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a CPU is an idle migrator, but another CPU wakes up before it, becomes an active migrator and handles the queue, the initial idle migrator may end up endlessly reprogramming its clockevent, chasing ghost timers forever such as in the following scenario: [GRP0:0] migrator = 0 active = 0 nextevt = T1 / \ 0 1 active idle (T1) 0) CPU 1 is idle and has a timer queued (T1), CPU 0 is active and is the active migrator. [GRP0:0] migrator = NONE active = NONE nextevt = T1 / \ 0 1 idle idle (T1) wakeup = T1 1) CPU 0 is now idle and is therefore the idle migrator. It has programmed its next timer interrupt to handle T1. [GRP0:0] migrator = 1 active = 1 nextevt = KTIME_MAX / \ 0 1 idle active wakeup = T1 2) CPU 1 has woken up, it is now active and it has just handled its own timer T1. 3) CPU 0 gets a timer interrupt to handle T1 but tmigr_handle_remote() realize it is not the migrator anymore. So it early returns without observing that T1 has been expired already and therefore without updating its ->wakeup value. 4) CPU 0 goes into tmigr_cpu_new_timer() which also early returns because it doesn't queue a timer of its own. So ->wakeup is left unchanged and the next timer is programmed to fire now. 5) goto 3) forever This results in timer interrupt storms in idle and also in nohz_full (as observed in rcutorture's TREE07 scenario). Fix this with forcing a re-evaluation of tmc->wakeup while trying remote timer handling when the CPU isn't the migrator anymmore. The check is inherently racy but in the worst case the CPU just races setting the KTIME_MAX value that a remote expiry also tries to set. Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model") Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240318230729.15497-2-frederic@kernel.org
* | | | | | | | Merge tag 'core-entry-2024-03-23' of ↵Linus Torvalds2024-03-231-1/+7
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry fix from Thomas Gleixner: "A single fix for the generic entry code: The trace_sys_enter() tracepoint can modify the syscall number via kprobes or BPF in pt_regs, but that requires that the syscall number is re-evaluted from pt_regs after the tracepoint. A seccomp fix in that area removed the re-evaluation so the change does not take effect as the code just uses the locally cached number. Restore the original behaviour by re-evaluating the syscall number after the tracepoint" * tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Respect changes to system call number by trace_sys_enter()
| * | | | | | | | entry: Respect changes to system call number by trace_sys_enter()André Rösti2024-03-121-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a probe is registered at the trace_sys_enter() tracepoint, and that probe changes the system call number, the old system call still gets executed. This worked correctly until commit b6ec41346103 ("core/entry: Report syscall correctly for trace and audit"), which removed the re-evaluation of the syscall number after the trace point. Restore the original semantics by re-evaluating the system call number after trace_sys_enter(). The performance impact of this re-evaluation is minimal because it only takes place when a trace point is active, and compared to the actual trace point overhead the read from a cache hot variable is negligible. Fixes: b6ec41346103 ("core/entry: Report syscall correctly for trace and audit") Signed-off-by: André Rösti <an.roesti@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240311211704.7262-1-an.roesti@gmail.com
* | | | | | | | | Merge tag 'riscv-for-linus-6.9-mw2' of ↵Linus Torvalds2024-03-222-7/+22
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various vector-accelerated crypto routines - Hibernation is now enabled for portable kernel builds - mmap_rnd_bits_max is larger on systems with larger VAs - Support for fast GUP - Support for membarrier-based instruction cache synchronization - Support for the Andes hart-level interrupt controller and PMU - Some cleanups around unaligned access speed probing and Kconfig settings - Support for ACPI LPI and CPPC - Various cleanus related to barriers - A handful of fixes * tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits) riscv: Fix syscall wrapper for >word-size arguments crypto: riscv - add vector crypto accelerated AES-CBC-CTS crypto: riscv - parallelize AES-CBC decryption riscv: Only flush the mm icache when setting an exec pte riscv: Use kcalloc() instead of kzalloc() riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro ...
| * | | | | | | | | membarrier: riscv: Provide core serializing commandAndrea Parri2024-02-152-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RISC-V uses xRET instructions on return from interrupt and to go back to user-space; the xRET instruction is not core serializing. Use FENCE.I for providing core serialization as follows: - by calling sync_core_before_usermode() on return from interrupt (cf. ipi_sync_core()), - via switch_mm() and sync_core_before_usermode() (respectively, for uthread->uthread and kthread->uthread transitions) before returning to user-space. On RISC-V, the serialization in switch_mm() is activated by resetting the icache_stale_mask of the mm at prepare_sync_core_cmd(). Suggested-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-5-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | locking: Introduce prepare_sync_core_cmd()Andrea Parri2024-02-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce an architecture function that architectures can use to set up ("prepare") SYNC_CORE commands. The function will be used by RISC-V to update its "deferred icache- flush" data structures (icache_stale_mask). Architectures defining prepare_sync_core_cmd() static inline need to select ARCH_HAS_PREPARE_SYNC_CORE_CMD. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-4-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | membarrier: Create Documentation/scheduler/membarrier.rstAndrea Parri2024-02-152-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To gather the architecture requirements of the "private/global expedited" membarrier commands. The file will be expanded to integrate further information about the membarrier syscall (as needed/desired in the future). While at it, amend some related inline comments in the membarrier codebase. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-3-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | membarrier: riscv: Add full memory barrier in switch_mm()Andrea Parri2024-02-151-2/+3
| | |_|_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The membarrier system call requires a full memory barrier after storing to rq->curr, before going back to user-space. The barrier is only needed when switching between processes: the barrier is implied by mmdrop() when switching from kernel to userspace, and it's not needed when switching from userspace to kernel. Rely on the feature/mechanism ARCH_HAS_MEMBARRIER_CALLBACKS and on the primitive membarrier_arch_switch_mm(), already adopted by the PowerPC architecture, to insert the required barrier. Fixes: fab957c11efe2f ("RISC-V: Atomic and Locking Code") Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-2-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* | | | | | | | | Merge tag 'rtc-6.9' of ↵Linus Torvalds2024-03-212-2/+2
|\ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Subsytem: - rtc_class is now const Drivers: - ds1511: cleanup, set date and time range and alarm offset limit - max31335: fix interrupt handler - pcf8523: improve suspend support" * tag 'rtc-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (28 commits) MAINTAINER: Include linux-arm-msm for Qualcomm RTC patches dt-bindings: rtc: zynqmp: Add support for Versal/Versal NET SoCs rtc: class: make rtc_class constant dt-bindings: rtc: abx80x: Improve checks on trickle charger constraints MAINTAINERS: adjust file entry in ARM/Mediatek RTC DRIVER rtc: nct3018y: fix possible NULL dereference rtc: max31335: fix interrupt status reg rtc: mt6397: select IRQ_DOMAIN instead of depending on it dt-bindings: rtc: abx80x: convert to yaml rtc: m41t80: Use the unified property API get the wakeup-source property dt-bindings: at91rm9260-rtt: add sam9x7 compatible dt-bindings: rtc: convert MT7622 RTC to the json-schema dt-bindings: rtc: convert MT2717 RTC to the json-schema rtc: pcf8523: add suspend handlers for alarm IRQ rtc: ds1511: set alarm offset limit rtc: ds1511: set range rtc: ds1511: drop inline/noinline hints rtc: ds1511: rename pdata rtc: ds1511: implement ds1511_rtc_read_alarm properly rtc: ds1511: remove partial alarm support ...
| * | | | | | | | rtc: class: make rtc_class constantRicardo B. Marliere2024-03-082-2/+2
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the rtc_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240305-class_cleanup-abelloni-v1-1-944c026137c8@marliere.net Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
* | | | | | | | Merge tag 'net-6.9-rc1' of ↵Linus Torvalds2024-03-211-0/+3
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from CAN, netfilter, wireguard and IPsec. I'd like to highlight [ lowlight? - Linus ] Florian W stepping down as a netfilter maintainer due to constant stream of bug reports. Not sure what we can do but IIUC this is not the first such case. Current release - regressions: - rxrpc: fix use of page_frag_alloc_align(), it changed semantics and we added a new caller in a different subtree - xfrm: allow UDP encapsulation only in offload modes Current release - new code bugs: - tcp: fix refcnt handling in __inet_hash_connect() - Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets", conflicted with some expectations in BPF uAPI Previous releases - regressions: - ipv4: raw: fix sending packets from raw sockets via IPsec tunnels - devlink: fix devlink's parallel command processing - veth: do not manipulate GRO when using XDP - esp: fix bad handling of pages from page_pool Previous releases - always broken: - report RCU QS for busy network kthreads (with Paul McK's blessing) - tcp/rds: fix use-after-free on netns with kernel TCP reqsk - virt: vmxnet3: fix missing reserved tailroom with XDP Misc: - couple of build fixes for Documentation" * tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) selftests: forwarding: Fix ping failure due to short timeout MAINTAINERS: step down as netfilter maintainer netfilter: nf_tables: Fix a memory leak in nf_tables_updchain net: dsa: mt7530: fix handling of all link-local frames net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports bpf: report RCU QS in cpumap kthread net: report RCU QS on threaded NAPI repolling rcu: add a helper to report consolidated flavor QS ionic: update documentation for XDP support lib/bitmap: Fix bitmap_scatter() and bitmap_gather() kernel doc netfilter: nf_tables: do not compare internal table flags on updates netfilter: nft_set_pipapo: release elements in clone only from destroy path octeontx2-af: Use separate handlers for interrupts octeontx2-pf: Send UP messages to VF only when VF is up. octeontx2-pf: Use default max_active works instead of one octeontx2-pf: Wait till detach_resources msg is complete octeontx2: Detect the mbox up or down message via register devlink: fix port new reply cmd type tcp: Clear req->syncookie in reqsk_alloc(). net/bnx2x: Prevent access to a freed page in page_pool ...
| * | | | | | | | bpf: report RCU QS in cpumap kthreadYan Zhai2024-03-201-0/+3
| | |_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are heavy load, cpumap kernel threads can be busy polling packets from redirect queues and block out RCU tasks from reaching quiescent states. It is insufficient to just call cond_resched() in such context. Periodically raise a consolidated RCU QS before cond_resched fixes the problem. Fixes: 6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP") Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/c17b9f1517e19d813da3ede5ed33ee18496bb5d8.1710877680.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | | | | | Merge tag 'kbuild-v6.9' of ↵Linus Torvalds2024-03-211-2/+1
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig * tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits) kconfig: tests: test dependency after shuffling choices kconfig: tests: add a test for randconfig with dependent choices kconfig: tests: support KCONFIG_SEED for the randconfig runner kbuild: rpm-pkg: add dtb files in kernel rpm kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig() kconfig: check prompt for choice while parsing kconfig: lxdialog: remove unused dialog colors kconfig: lxdialog: fix button color for blackbg theme modpost: fix null pointer dereference kbuild: remove GCC's default -Wpacked-bitfield-compat flag kbuild: unexport abs_srctree and abs_objtree kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 kconfig: remove named choice support kconfig: use linked list in get_symbol_str() to iterate over menus kconfig: link menus to a symbol kbuild: fix inconsistent indentation in top Makefile kbuild: Use -fmin-function-alignment when available alpha: merge two entries for CONFIG_ALPHA_GAMMA alpha: merge two entries for CONFIG_ALPHA_EV4 kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj) ...
| * | | | | | | | kbuild: remove EXPERT and !COMPILE_TEST guarding from TRIM_UNUSED_KSYMSMasahiro Yamada2024-02-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the following two commits: - a555bdd0c58c ("Kbuild: enable TRIM_UNUSED_KSYMS again, with some guarding") - 5cf0fd591f2e ("Kbuild: disable TRIM_UNUSED_KSYMS option") Commit 5e9e95cc9148 ("kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion") solved the build time issue. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* | | | | | | | | Merge tag 'driver-core-6.9-rc1' of ↵Linus Torvalds2024-03-212-2/+2
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core and kernfs changes for 6.9-rc1. Nothing all that crazy here, just some good updates that include: - automatic attribute group hiding from Dan Williams (he fixed up my horrible attempt at doing this.) - kobject lock contention fixes from Eric Dumazet - driver core cleanups from Andy - kernfs rcu work from Tejun - fw_devlink changes to resolve some reported issues - other minor changes, all details in the shortlog All of these have been in linux-next for a long time with no reported issues" * tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits) device: core: Log warning for devices pending deferred probe on timeout driver: core: Use dev_* instead of pr_* so device metadata is added driver: core: Log probe failure as error and with device metadata of: property: fw_devlink: Add support for "post-init-providers" property driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link driver core: Adds flags param to fwnode_link_add() debugfs: fix wait/cancellation handling during remove device property: Don't use "proxy" headers device property: Move enum dev_dma_attr to fwnode.h driver core: Move fw_devlink stuff to where it belongs driver core: Drop unneeded 'extern' keyword in fwnode.h firmware_loader: Suppress warning on FW_OPT_NO_WARN flag sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group. firmware_loader: introduce __free() cleanup hanler platform-msi: Remove usage of the deprecated ida_simple_xx() API sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE() sysfs: Document new "group visible" helpers sysfs: Fix crash on empty group attributes array sysfs: Introduce a mechanism to hide static attribute_groups sysfs: Introduce a mechanism to hide static attribute_groups ...
| * | | | | | | | | Merge 6.8-rc5 into driver-core-nextGreg Kroah-Hartman2024-02-1915-97/+126
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need the driver core changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | | | | | | kobject: make uevent_seqnum atomicEric Dumazet2024-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will soon no longer acquire uevent_sock_mutex for most kobject_uevent_net_broadcast() calls, and also while calling uevent_net_broadcast(). Make uevent_seqnum an atomic64_t to get its own protection. This fixes a race while reading /sys/kernel/uevent_seqnum. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <brauner@kernel.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240214084829.684541-2-edumazet@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | | | | | | workqueue: make wq_subsys constRicardo B. Marliere2024-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the driver core can properly handle constant struct bus_type, move the wq_subsys variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240206-bus_cleanup-workqueue-v1-1-72b10d282d58@marliere.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | | | | | | | Merge tag 'tty-6.9-rc1' of ↵Linus Torvalds2024-03-211-3/+18
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the big set of TTY/Serial driver updates and cleanups for 6.9-rc1. Included in here are: - more tty cleanups from Jiri - loads of 8250 driver cleanups from Andy - max310x driver updates - samsung serial driver updates - uart_prepare_sysrq_char() updates for many drivers - platform driver remove callback void cleanups - stm32 driver updates - other small tty/serial driver updates All of these have been in linux-next for a long time with no reported issues" * tag 'tty-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) dt-bindings: serial: stm32: add power-domains property serial: 8250_dw: Replace ACPI device check by a quirk serial: Lock console when calling into driver before registration serial: 8250_uniphier: Switch to use uart_read_port_properties() serial: 8250_tegra: Switch to use uart_read_port_properties() serial: 8250_pxa: Switch to use uart_read_port_properties() serial: 8250_omap: Switch to use uart_read_port_properties() serial: 8250_of: Switch to use uart_read_port_properties() serial: 8250_lpc18xx: Switch to use uart_read_port_properties() serial: 8250_ingenic: Switch to use uart_read_port_properties() serial: 8250_dw: Switch to use uart_read_port_properties() serial: 8250_bcm7271: Switch to use uart_read_port_properties() serial: 8250_bcm2835aux: Switch to use uart_read_port_properties() serial: 8250_aspeed_vuart: Switch to use uart_read_port_properties() serial: port: Introduce a common helper to read properties serial: core: Add UPIO_UNKNOWN constant for unknown port type serial: core: Move struct uart_port::quirks closer to possible values serial: sh-sci: Call sci_serial_{in,out}() directly serial: core: only stop transmit when HW fifo is empty serial: pch: Use uart_prepare_sysrq_char(). ...
| * | | | | | | | | | serial: Lock console when calling into driver before registrationPeter Collingbourne2024-03-051-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the handoff from earlycon to the real console driver, we have two separate drivers operating on the same device concurrently. In the case of the 8250 driver these concurrent accesses cause problems due to the driver's use of banked registers, controlled by LCR.DLAB. It is possible for the setup(), config_port(), pm() and set_mctrl() callbacks to set DLAB, which can cause the earlycon code that intends to access TX to instead access DLL, leading to missed output and corruption on the serial line due to unintended modifications to the baud rate. In particular, for setup() we have: univ8250_console_setup() -> serial8250_console_setup() -> uart_set_options() -> serial8250_set_termios() -> serial8250_do_set_termios() -> serial8250_do_set_divisor() For config_port() we have: serial8250_config_port() -> autoconfig() For pm() we have: serial8250_pm() -> serial8250_do_pm() -> serial8250_set_sleep() For set_mctrl() we have (for some devices): serial8250_set_mctrl() -> omap8250_set_mctrl() -> __omap8250_set_mctrl() To avoid such problems, let's make it so that the console is locked during pre-registration calls to these callbacks, which will prevent the earlycon driver from running concurrently. Remove the partial solution to this problem in the 8250 driver that locked the console only during autoconfig_irq(), as this would result in a deadlock with the new approach. The console continues to be locked during autoconfig_irq() because it can only be called through uart_configure_port(). Although this patch introduces more locking than strictly necessary (and in particular it also locks during the call to rs485_config() which is not affected by this issue as far as I can tell), it follows the principle that it is the responsibility of the generic console code to manage the earlycon handoff by ensuring that earlycon and real console driver code cannot run concurrently, and not the individual drivers. Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://linux-review.googlesource.com/id/I7cf8124dcebf8618e6b2ee543fa5b25532de55d8 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240304214350.501253-1-pcc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>