summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* tracing: Have trace_event_file have ref countersSteven Rostedt (Google)2023-11-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream. The following can crash the kernel: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&- The above commands: 1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5 The above causes a crash! BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50 What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor. But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug. To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREE flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor. Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: ethtool: Fix documentation of ethtool_sprintf()Andrew Lunn2023-11-281-2/+2
| | | | | | | | | | | | | | | | | commit f55d8e60f10909dbc5524e261041e1d28d7d20d8 upstream. This function takes a pointer to a pointer, unlike sprintf() which is passed a plain pointer. Fix up the documentation to make this clear. Fixes: 7888fe53b706 ("ethtool: Add common function for filling out strings") Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: Justin Stitt <justinstitt@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20231028192511.100001-1-andrew@lunn.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lsm: fix default return value for inode_getsecctxOndrej Mosnacek2023-11-281-1/+1
| | | | | | | | | | | | | | | | | | | | commit b36995b8609a5a8fe5cf259a1ee768fcaed919f8 upstream. -EOPNOTSUPP is the return value that implements a "no-op" hook, not 0. Without this fix having only the BPF LSM enabled (with no programs attached) can cause uninitialized variable reads in nfsd4_encode_fattr(), because the BPF hook returns 0 without touching the 'ctxlen' variable and the corresponding 'contextlen' variable in nfsd4_encode_fattr() remains uninitialized, yet being treated as valid based on the 0 return value. Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lsm: fix default return value for vm_enough_memoryOndrej Mosnacek2023-11-281-1/+1
| | | | | | | | | | | | commit 866d648059d5faf53f1cd960b43fe8365ad93ea7 upstream. 1 is the return value that implements a "no-op" hook, not 0. Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs: add ctime accessors infrastructureJeff Layton2023-11-281-1/+44
| | | | | | | | | | | | | | | | | | | | commit 9b6304c1d53745c300b86f202d0dcff395e2d2db upstream. struct timespec64 has unused bits in the tv_nsec field that can be used for other purposes. In future patches, we're going to change how the inode->i_ctime is accessed in certain inodes in order to make use of them. In order to do that safely though, we'll need to eradicate raw accesses of the inode->i_ctime field from the kernel. Add new accessor functions for the ctime that we use to replace them. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Message-Id: <20230705185812.579118-2-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* torture: Make torture_hrtimeout_ns() take an hrtimer mode parameterPaul E. McKenney2023-11-281-1/+2
| | | | | | | | | | | | | | | | [ Upstream commit a741deac787f0d2d7068638c067db20af9e63752 ] The current torture-test sleeps are waiting for a duration, but there are situations where it is better to wait for an absolute time, for example, when ending a stutter interval. This commit therefore adds an hrtimer mode parameter to torture_hrtimeout_ns(). Why not also the other torture_hrtimeout_*() functions? The theory is that most absolute times will be in nanoseconds, especially not (say) jiffies. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Stable-dep-of: cca42bd8eb1b ("rcutorture: Fix stuttering races and other issues") Signed-off-by: Sasha Levin <sashal@kernel.org>
* torture: Add a kthread-creation callback to _torture_create_kthread()Paul E. McKenney2023-11-281-2/+5
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 67d5404d274376890d6d095a10e6565854918f8e ] This commit adds a kthread-creation callback to the _torture_create_kthread() function, which allows callers of a new torture_create_kthread_cb() macro to specify a function to be invoked after the kthread is created but before it is awakened for the first time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: John Stultz <jstultz@google.com> Stable-dep-of: cca42bd8eb1b ("rcutorture: Fix stuttering races and other issues") Signed-off-by: Sasha Levin <sashal@kernel.org>
* mm: make PR_MDWE_REFUSE_EXEC_GAIN an unsigned longFlorent Revest2023-11-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0da668333fb07805c2836d5d50e26eda915b24a1 upstream. Defining a prctl flag as an int is a footgun because on a 64 bit machine and with a variadic implementation of prctl (like in musl and glibc), when used directly as a prctl argument, it can get casted to long with garbage upper bits which would result in unexpected behaviors. This patch changes the constant to an unsigned long to eliminate that possibilities. This does not break UAPI. I think that a stable backport would be "nice to have": to reduce the chances that users build binaries that could end up with garbage bits in their MDWE prctl arguments. We are not aware of anyone having yet encountered this corner case with MDWE prctls but a backport would reduce the likelihood it happens, since this sort of issues has happened with other prctls. But If this is perceived as a backporting burden, I suppose we could also live without a stable backport. Link: https://lkml.kernel.org/r/20230828150858.393570-5-revest@chromium.org Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl") Signed-off-by: Florent Revest <revest@chromium.org> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ayush Jain <ayush.jain3@amd.com> Cc: Greg Thelen <gthelen@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: Topi Miettinen <toiwoton@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ASoC: soc-dai: add flag to mute and unmute stream during triggerSrinivas Kandagatla2023-11-281-0/+1
| | | | | | | | | | | | | | | | | commit f0220575e65abe09c09cd17826a3cdea76e8d58f upstream. In some setups like Speaker amps which are very sensitive, ex: keeping them unmute without actual data stream for very short duration results in a static charge and results in pop and clicks. To minimize this, provide a way to mute and unmute such codecs during trigger callbacks. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20231027105747.32450-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> [ johan: backport to v6.6.2 ] Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mmc: Add quirk MMC_QUIRK_BROKEN_CACHE_FLUSH for Micron eMMC Q2J54ABean Huo2023-11-281-0/+2
| | | | | | | | | | | | | | | commit ed9009ad300c0f15a3ecfe9613547b1962bde02c upstream. Micron MTFC4GACAJCN eMMC supports cache but requires that flush cache operation be allowed only after a write has occurred. Otherwise, the cache flush command or subsequent commands will time out. Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Rafael Beims <rafael.beims@toradex.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231030224809.59245-1-beanhuo@iokpp.de Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm/damon: implement a function for max nr_accesses safe calculationSeongJae Park2023-11-281-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 35f5d94187a6a3a8df2cba54beccca1c2379edb8 upstream. Patch series "avoid divide-by-zero due to max_nr_accesses overflow". The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Avoid the divide-by-zero by implementing a function that handles the corner case (first patch), and replaces the vulnerable direct max nr_accesses calculations (remaining patches). Note that the patches for the replacements are divided for broken commits, to make backporting on required tres easier. Especially, the last patch is for a patch that not yet merged into the mainline but in mm tree. This patch (of 4): The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Implement a function that handles the corner case. Note that this commit is not fixing the real issue since this is only introducing the safe function that will replaces the problematic divisions. The replacements will be made by followup commits, to make backporting on stable series easier. Link: https://lkml.kernel.org/r/20231019194924.100347-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231019194924.100347-2-sj@kernel.org Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fbdev: stifb: Make the STI next font pointer a 32-bit signed offsetHelge Deller2023-11-281-1/+1
| | | | | | | | | | | | | | | commit 8a32aa17c1cd48df1ddaa78e45abcb8c7a2220d6 upstream. The pointer to the next STI font is actually a signed 32-bit offset. With this change the 64-bit kernel will correctly subract the (signed 32-bit) offset instead of adding a (unsigned 32-bit) offset. It has no effect on 32-bit kernels. This fixes the stifb driver with a 64-bit kernel on qemu. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* proc: sysctl: prevent aliased sysctls from getting passed to initKrister Johansen2023-11-281-0/+6
| | | | | | | | | | | | | | | | | | commit 8001f49394e353f035306a45bcf504f06fca6355 upstream. The code that checks for unknown boot options is unaware of the sysctl alias facility, which maps bootparams to sysctl values. If a user sets an old value that has a valid alias, a message about an invalid parameter will be printed during boot, and the parameter will get passed to init. Fix by checking for the existence of aliased parameters in the unknown boot parameter code. If an alias exists, don't return an error or pass the value to init. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Cc: stable@vger.kernel.org Fixes: 0a477e1ae21b ("kernel/sysctl: support handling command line aliases") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/apic/msi: Fix misconfigured non-maskable MSI quirkKoichiro Den2023-11-282-28/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b56ebe7c896dc78b5865ec2c4b1dae3c93537517 upstream. commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted"), reworked the code so that the x86 specific quirk for affinity setting of non-maskable PCI/MSI interrupts is not longer activated if necessary. This could be solved by restoring the original logic in the core MSI code, but after a deeper analysis it turned out that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows ot remove the NOMASK quirk bit completely as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the momentary non-existing case that the reservation mode would be not set for a maskable MSI interrupt this would not cause any harm as it just would cause msi_set_affinity() to go needlessly through the functionaly equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* perf/core: Fix cpuctx refcountingPeter Zijlstra2023-11-281-5/+8
| | | | | | | | | | | | | | commit 889c58b3155ff4c8e8671c95daef63d6fabbb6b1 upstream. Audit of the refcounting turned up that perf_pmu_migrate_context() fails to migrate the ctx refcount. Fixes: bd2756811766 ("perf: Rewrite core context handling") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20230612093539.085862001@infradead.org Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: do not offload flows with a helper in act_ctXin Long2023-11-281-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7cd5af0e937a197295f3aa3721031f0fbae49cff ] There is no hardware supporting ct helper offload. However, prior to this patch, a flower filter with a helper in the ct action can be successfully set into the HW, for example (eth1 is a bnxt NIC): # tc qdisc add dev eth1 ingress_block 22 ingress # tc filter add block 22 proto ip flower skip_sw ip_proto tcp \ dst_port 21 ct_state -trk action ct helper ipv4-tcp-ftp # tc filter show dev eth1 ingress filter block 22 protocol ip pref 49152 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_port 21 ct_state -trk skip_sw in_hw in_hw_count 1 <---- action order 1: ct zone 0 helper ipv4-tcp-ftp pipe index 2 ref 1 bind 1 used_hw_stats delayed This might cause the flower filter not to work as expected in the HW. This patch avoids this problem by simply returning -EOPNOTSUPP in tcf_ct_offload_act_setup() to not allow to offload flows with a helper in act_ct. Fixes: a21b06e73191 ("net: sched: add helper support in act_ct") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/f8685ec7702c4a448a1371a8b34b43217b583b9d.1699898008.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()Dan Carpenter2023-11-281-2/+2
| | | | | | | | | | | | | | | | | | | | [ Upstream commit c301f0981fdd3fd1ffac6836b423c4d7a8e0eb63 ] The problem is in nft_byteorder_eval() where we are iterating through a loop and writing to dst[0], dst[1], dst[2] and so on... On each iteration we are writing 8 bytes. But dst[] is an array of u32 so each element only has space for 4 bytes. That means that every iteration overwrites part of the previous element. I spotted this bug while reviewing commit caf3ef7468f7 ("netfilter: nf_tables: prevent OOB access in nft_byteorder_eval") which is a related issue. I think that the reason we have not detected this bug in testing is that most of time we only write one element. Fixes: ce1e7989d989 ("netfilter: nft_byteorder: provide 64bit le/be conversion") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpf: handle ldimm64 properly in check_cfg()Andrii Nakryiko2023-11-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3feb263bb516ee7e1da0acd22b15afbb9a7daa19 ] ldimm64 instructions are 16-byte long, and so have to be handled appropriately in check_cfg(), just like the rest of BPF verifier does. This has implications in three places: - when determining next instruction for non-jump instructions; - when determining next instruction for callback address ldimm64 instructions (in visit_func_call_insn()); - when checking for unreachable instructions, where second half of ldimm64 is expected to be unreachable; We take this also as an opportunity to report jump into the middle of ldimm64. And adjust few test_verifier tests accordingly. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reported-by: Hao Sun <sunhao.th@gmail.com> Fixes: 475fb78fbf48 ("bpf: verifier (add branch/goto checks)") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231110002638.4168352-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* SUNRPC: Fix RPC client cleaned up the freed pipefs dentriesfelix2023-11-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit bfca5fb4e97c46503ddfc582335917b0cc228264 ] RPC client pipefs dentries cleanup is in separated rpc_remove_pipedir() workqueue,which takes care about pipefs superblock locking. In some special scenarios, when kernel frees the pipefs sb of the current client and immediately alloctes a new pipefs sb, rpc_remove_pipedir function would misjudge the existence of pipefs sb which is not the one it used to hold. As a result, the rpc_remove_pipedir would clean the released freed pipefs dentries. To fix this issue, rpc_remove_pipedir should check whether the current pipefs sb is consistent with the original pipefs sb. This error can be catched by KASAN: ========================================================= [ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200 [ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503 [ 250.500549] Workqueue: events rpc_free_client_work [ 250.501001] Call Trace: [ 250.502880] kasan_report+0xb6/0xf0 [ 250.503209] ? dget_parent+0x195/0x200 [ 250.503561] dget_parent+0x195/0x200 [ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10 [ 250.504384] rpc_rmdir_depopulate+0x1b/0x90 [ 250.504781] rpc_remove_client_dir+0xf5/0x150 [ 250.505195] rpc_free_client_work+0xe4/0x230 [ 250.505598] process_one_work+0x8ee/0x13b0 ... [ 22.039056] Allocated by task 244: [ 22.039390] kasan_save_stack+0x22/0x50 [ 22.039758] kasan_set_track+0x25/0x30 [ 22.040109] __kasan_slab_alloc+0x59/0x70 [ 22.040487] kmem_cache_alloc_lru+0xf0/0x240 [ 22.040889] __d_alloc+0x31/0x8e0 [ 22.041207] d_alloc+0x44/0x1f0 [ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140 [ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110 [ 22.042459] rpc_create_client_dir+0x34/0x150 [ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0 [ 22.043284] rpc_client_register+0x136/0x4e0 [ 22.043689] rpc_new_client+0x911/0x1020 [ 22.044057] rpc_create_xprt+0xcb/0x370 [ 22.044417] rpc_create+0x36b/0x6c0 ... [ 22.049524] Freed by task 0: [ 22.049803] kasan_save_stack+0x22/0x50 [ 22.050165] kasan_set_track+0x25/0x30 [ 22.050520] kasan_save_free_info+0x2b/0x50 [ 22.050921] __kasan_slab_free+0x10e/0x1a0 [ 22.051306] kmem_cache_free+0xa5/0x390 [ 22.051667] rcu_core+0x62c/0x1930 [ 22.051995] __do_softirq+0x165/0x52a [ 22.052347] [ 22.052503] Last potentially related work creation: [ 22.052952] kasan_save_stack+0x22/0x50 [ 22.053313] __kasan_record_aux_stack+0x8e/0xa0 [ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0 [ 22.054209] dentry_free+0xb2/0x140 [ 22.054540] __dentry_kill+0x3be/0x540 [ 22.054900] shrink_dentry_list+0x199/0x510 [ 22.055293] shrink_dcache_parent+0x190/0x240 [ 22.055703] do_one_tree+0x11/0x40 [ 22.056028] shrink_dcache_for_umount+0x61/0x140 [ 22.056461] generic_shutdown_super+0x70/0x590 [ 22.056879] kill_anon_super+0x3a/0x60 [ 22.057234] rpc_kill_sb+0x121/0x200 Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines") Signed-off-by: felix <fuzhen5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* sched/core: Optimize in_task() and in_interrupt() a bitFinn Thain2023-11-281-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 87c3a5893e865739ce78aa7192d36011022e0af7 ] Except on x86, preempt_count is always accessed with READ_ONCE(). Repeated invocations in macros like irq_count() produce repeated loads. These redundant instructions appear in various fast paths. In the one shown below, for example, irq_count() is evaluated during kernel entry if !tick_nohz_full_cpu(smp_processor_id()). 0001ed0a <irq_enter_rcu>: 1ed0a: 4e56 0000 linkw %fp,#0 1ed0e: 200f movel %sp,%d0 1ed10: 0280 ffff e000 andil #-8192,%d0 1ed16: 2040 moveal %d0,%a0 1ed18: 2028 0008 movel %a0@(8),%d0 1ed1c: 0680 0001 0000 addil #65536,%d0 1ed22: 2140 0008 movel %d0,%a0@(8) 1ed26: 082a 0001 000f btst #1,%a2@(15) 1ed2c: 670c beqs 1ed3a <irq_enter_rcu+0x30> 1ed2e: 2028 0008 movel %a0@(8),%d0 1ed32: 2028 0008 movel %a0@(8),%d0 1ed36: 2028 0008 movel %a0@(8),%d0 1ed3a: 4e5e unlk %fp 1ed3c: 4e75 rts This patch doesn't prevent the pointless btst and beqs instructions above, but it does eliminate 2 of the 3 pointless move instructions here and elsewhere. On x86, preempt_count is per-cpu data and the problem does not arise presumably because the compiler is free to optimize more effectively. This patch was tested on m68k and x86. I was expecting no changes to object code for x86 and mostly that's what I saw. However, there were a few places where code generation was perturbed for some reason. The performance issue addressed here is minor on uniprocessor m68k. I got a 0.01% improvement from this patch for a simple "find /sys -false" benchmark. For architectures and workloads susceptible to cache line bounce the improvement is expected to be larger. The only SMP architecture I have is x86, and as x86 unaffected I have not done any further measurements. Fixes: 15115830c887 ("preempt: Cleanup the macro maze a bit") Signed-off-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/0a403120a682a525e6db2d81d1a3ffcc137c3742.1694756831.git.fthain@linux-m68k.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* pwm: Fix double shift bugDan Carpenter2023-11-281-2/+2
| | | | | | | | | | | | | | | | [ Upstream commit d27abbfd4888d79dd24baf50e774631046ac4732 ] These enums are passed to set/test_bit(). The set/test_bit() functions take a bit number instead of a shifted value. Passing a shifted value is a double shift bug like doing BIT(BIT(1)). The double shift bug doesn't cause a problem here because we are only checking 0 and 1 but if the value was 5 or above then it can lead to a buffer overflow. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* f2fs: fix error path of __f2fs_build_free_nidsZhiguo Niu2023-11-281-0/+1
| | | | | | | | | | | | | | | | | [ Upstream commit a5e80e18f268ea7c7a36bc4159de0deb3b5a2171 ] If NAT is corrupted, let scan_nat_page() return EFSCORRUPTED, so that, caller can set SBI_NEED_FSCK flag into checkpoint for later repair by fsck. Also, this patch introduces a new fscorrupted error flag, and in above scenario, it will persist the error flag into superblock synchronously to avoid it has no luck to trigger a checkpoint to record SBI_NEED_FSCK Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ASoC: SOF: Pass PCI SSID to machine driverRichard Fitzgerald2023-11-282-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ba2de401d32625fe538d3f2c00ca73740dd2d516 ] Pass the PCI SSID of the audio interface through to the machine driver. This allows the machine driver to use the SSID to uniquely identify the specific hardware configuration and apply any platform-specific configuration. struct snd_sof_pdata is passed around inside the SOF code, but it then passes configuration information to the machine driver through struct snd_soc_acpi_mach and struct snd_soc_acpi_mach_params. So SSID information has been added to both snd_sof_pdata and snd_soc_acpi_mach_params. PCI does not define 0x0000 as an invalid value so we can't use zero to indicate that the struct member was not written. Instead a flag is included to indicate that a value has been written to the subsystem_vendor and subsystem_device members. sof_pci_probe() creates the struct snd_sof_pdata. It is passed a struct pci_dev so it can fill in the SSID value. sof_machine_check() finds the appropriate struct snd_soc_acpi_mach. It copies the SSID information across to the struct snd_soc_acpi_mach_params. This done before calling any custom set_mach_params() so that it could be used by the set_mach_params() callback to apply variant params. The machine driver receives the struct snd_soc_acpi_mach as its platform_data. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20230912163207.3498161-3-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ASoC: soc-card: Add storage for PCI SSIDRichard Fitzgerald2023-11-282-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 47f56e38a199bd45514b8e0142399cba4feeaf1a ] Add members to struct snd_soc_card to store the PCI subsystem ID (SSID) of the soundcard. The PCI specification provides two registers to store a vendor-specific SSID that can be read by drivers to uniquely identify a particular "soundcard". This is defined in the PCI specification to distinguish products that use the same silicon (and therefore have the same silicon ID) so that product-specific differences can be applied. PCI only defines 0xFFFF as an invalid value. 0x0000 is not defined as invalid. So the usual pattern of zero-filling the struct and then assuming a zero value unset will not work. A flag is included to indicate when the SSID information has been filled in. Unlike DMI information, which has a free-format entirely up to the vendor, the PCI SSID has a strictly defined format and a registry of vendor IDs. It is usual in Windows drivers that the SSID is used as the sole identifier of the specific end-product and the Windows driver contains tables mapping that to information about the hardware setup, rather than using ACPI properties. This SSID is important information for ASoC components that need to apply hardware-specific configuration on PCI-based systems. As the SSID is a generic part of the PCI specification and is treated as identifying the "soundcard", it is reasonable to include this information in struct snd_soc_card, instead of components inventing their own custom ways to pass this information around. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20230912163207.3498161-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* string.h: add array-wrappers for (v)memdup_user()Philipp Stanner2023-11-281-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 313ebe47d75558511aa1237b6e35c663b5c0ec6f ] Currently, user array duplications are sometimes done without an overflow check. Sometimes the checks are done manually; sometimes the array size is calculated with array_size() and sometimes by calculating n * size directly in code. Introduce wrappers for arrays for memdup_user() and vmemdup_user() to provide a standardized and safe way for duplicating user arrays. This is both for new code as well as replacing usage of (v)memdup_user() in existing code that uses, e.g., n * size to calculate array sizes. Suggested-by: David Airlie <airlied@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Zack Rusin <zackr@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230920123612.16914-3-pstanner@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: read from socket's error queueArseniy Krasnov2023-11-282-0/+18
| | | | | | | | | | | | | | | [ Upstream commit 49dbe25adac42d3e06f65d1420946bec65896222 ] This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: annotate data-races around sk->sk_dst_pending_confirmEric Dumazet2023-11-281-3/+3
| | | | | | | | | | | | [ Upstream commit eb44ad4e635132754bfbcb18103f1dcb7058aedd ] This field can be read or written without socket lock being held. Add annotations to avoid load-store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: annotate data-races around sk->sk_tx_queue_mappingEric Dumazet2023-11-281-4/+16
| | | | | | | | | | | | [ Upstream commit 0bb4d124d34044179b42a769a0c76f389ae973b6 ] This field can be read or written without socket lock being held. Add annotations to avoid load-store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* wifi: mt76: mt7921e: Support MT7992 IP in Xiaomi Redmibook 15 Pro (2023)Ingo Rohloff2023-11-281-0/+2
| | | | | | | | | | | | | | | | | | | [ Upstream commit fce9c967820a72f600abbf061d7077861685a14d ] In the Xiaomi Redmibook 15 Pro (2023) laptop I have got, a wifi chip is used, which according to its PCI Vendor ID is from "ITTIM Technology". This chip works flawlessly with the mt7921e module. The driver doesn't bind to this PCI device, because the Vendor ID from "ITTIM Technology" is not recognized. This patch adds the PCI Vendor ID from "ITTIM Technology" to the list of PCI Vendor IDs and lets the mt7921e driver bind to the mentioned wifi chip. Signed-off-by: Ingo Rohloff <lundril@gmx.de> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ACPI: APEI: Fix AER info corruption when error status data has multiple sectionsShiju Jose2023-11-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e2abc47a5a1a9f641e7cacdca643fdd40729bf6e ] ghes_handle_aer() passes AER data to the PCI core for logging and recovery by calling aer_recover_queue() with a pointer to struct aer_capability_regs. The problem was that aer_recover_queue() queues the pointer directly without copying the aer_capability_regs data. The pointer was to the ghes->estatus buffer, which could be reused before aer_recover_work_func() reads the data. To avoid this problem, allocate a new aer_capability_regs structure from the ghes_estatus_pool, copy the AER data from the ghes->estatus buffer into it, pass a pointer to the new struct to aer_recover_queue(), and free it after aer_recover_work_func() has processed it. Reported-by: Bjorn Helgaas <helgaas@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* workqueue: Provide one lock class key per work_on_cpu() callsiteFrederic Weisbecker2023-11-281-7/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 265f3ed077036f053981f5eea0b5b43e7c5b39ff ] All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/generic-radix-tree.c: Don't overflow in peek()Kent Overstreet2023-11-281-0/+7
| | | | | | | | | | | [ Upstream commit 9492261ff2460252cf2d8de89cdf854c7e2b28a0 ] When we started spreading new inode numbers throughout most of the 64 bit inode space, that triggered some corner case bugs, in particular some integer overflows related to the radix tree code. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/sched: act_ct: Always fill offloading tuple iifidxVlad Buslov2023-11-201-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9bc64bd0cd765f696fcd40fc98909b1f7c73b2ba ] Referenced commit doesn't always set iifidx when offloading the flow to hardware. Fix the following cases: - nf_conn_act_ct_ext_fill() is called before extension is created with nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with unspecified iifidx when connection is offloaded after only single original-direction packet has been processed by tc data path. Always fill the new nf_conn_act_ct_ext instance after creating it in nf_conn_act_ct_ext_add(). - Offloading of unidirectional UDP NEW connections is now supported, but ct flow iifidx field is not updated when connection is promoted to bidirectional which can result reply-direction iifidx to be zero when refreshing the connection. Fill in the extension and update flow iifidx before calling flow_offload_refresh(). Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx") Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Fix termination state for idr_for_each_entry_ul()NeilBrown2023-11-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e8ae8ad479e2d037daa33756e5e72850a7bd37a9 ] The comment for idr_for_each_entry_ul() states after normal termination @entry is left with the value NULL This is not correct in the case where UINT_MAX has an entry in the idr. In that case @entry will be non-NULL after termination. No current code depends on the documentation being correct, but to save future code we should fix it. Also fix idr_for_each_entry_continue_ul(). While this is not documented as leaving @entry as NULL, the mellanox driver appears to depend on it doing so. So make that explicit in the documentation as well as in the code. Fixes: e33d2b74d805 ("idr: fix overflow case for idr_for_each_entry_ul()") Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mi <chrism@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* inet: shrink struct flowi_commonEric Dumazet2023-11-201-1/+1
| | | | | | | | | | | | | | | | | | | [ Upstream commit 1726483b79a72e0150734d5367e4a0238bf8fcff ] I am looking at syzbot reports triggering kernel stack overflows involving a cascade of ipvlan devices. We can save 8 bytes in struct flowi_common. This patch alone will not fix the issue, but is a start. Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: wenxu <wenxu@ucloud.cn> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231025141037.3448203-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mfd: core: Un-constify mfd_cell.of_regMichał Mirosław2023-11-201-1/+1
| | | | | | | | | | | | | [ Upstream commit 3c70342f1f0045dc827bb2f02d814ce31e0e0d05 ] Enable dynamically filling in the whole mfd_cell structure. All other fields already allow that. Fixes: 466a62d7642f ("mfd: core: Make a best effort attempt to match devices with the correct of_nodes") Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Link: https://lore.kernel.org/r/b73fe4bc4bd6ba1af90940a640ed65fe254c0408.1693253717.git.mirq-linux@rere.qmqm.pl Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* crypto: hisilicon/qm - fix PF queue parameter issueLongfang Liu2023-11-201-0/+7
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5831fc1fd4a578232fea708b82de0c666ed17153 ] If the queue isolation feature is enabled, the number of queues supported by the device changes. When PF is enabled using the current default number of queues, the default number of queues may be greater than the number supported by the device. As a result, the PF fails to be bound to the driver. After modification, if queue isolation feature is enabled, when the default queue parameter is greater than the number supported by the device, the number of enabled queues will be changed to the number supported by the device, so that the PF and driver can be properly bound. Fixes: 8bbecfb402f7 ("crypto: hisilicon/qm - add queue isolation support for Kunpeng930") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* hwrng: bcm2835 - Fix hwrng throughput regressionStefan Wahren2023-11-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b58a36008bfa1aadf55f516bcbfae40c779eb54b ] The last RCU stall fix caused a massive throughput regression of the hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough and usleep_range doesn't allow scheduling. So try to restore the best possible throughput by introducing hwrng_yield which interruptable sleeps for one jiffy. Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig): sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000 cpu_relax ~138025 Bytes / sec hwrng_msleep(1000) ~13 Bytes / sec hwrng_yield ~2510 Bytes / sec Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KEYS: Include linux/errno.h in linux/verification.hHerbert Xu2023-11-201-0/+1
| | | | | | | | | | | | | [ Upstream commit 0a596b0682a7ce37e26c36629816f105c6459d06 ] Add inclusion of linux/errno.h as otherwise the reference to EINVAL may be invalid. Fixes: f3cf4134c5c6 ("bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308261414.HKw1Mrip-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* firmware: tegra: Add suspend hook and reset BPMP IPC early on resumeSumit Gupta2023-11-201-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ea608a01d4ee66f8b51070e623f9adb8684c0dd4 ] Add suspend hook and a 'suspended' field in the 'struct tegra_bpmp' to mark if BPMP is suspended. Also, add a 'flags' field in the 'struct tegra_bpmp_message' whose 'TEGRA_BPMP_MESSAGE_RESET' bit can be set from the Tegra MC driver to signal that the reset of BPMP IPC channels is required before sending MRQ to the BPMP FW. Together both the fields allow us to handle any requests that might be sent too soon as they can cause hang during system resume. One case where we see BPMP requests being sent before the BPMP driver has resumed is the memory bandwidth requests which are triggered by onlining the CPUs during system resume. The CPUs are onlined before the BPMP has resumed and we need to reset the BPMP IPC channels to handle these requests. The additional check for 'flags' is done to avoid any un-intended BPMP IPC reset if the tegra_bpmp_transfer*() API gets called during suspend sequence after the BPMP driver is suspended. Fixes: f41e1442ac5b ("cpufreq: tegra194: add OPP support and set bandwidth") Co-developed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64/arm: xen: enlighten: Fix KPTI checksMark Rutland2023-11-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 20f3b8eafe0ba5d3c69d5011a9b07739e9645132 ] When KPTI is in use, we cannot register a runstate region as XEN requires that this is always a valid VA, which we cannot guarantee. Due to this, xen_starting_cpu() must avoid registering each CPU's runstate region, and xen_guest_init() must avoid setting up features that depend upon it. We tried to ensure that in commit: f88af7229f6f22ce (" xen/arm: do not setup the runstate info page if kpti is enabled") ... where we added checks for xen_kernel_unmapped_at_usr(), which wraps arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit arm. Unfortunately, as xen_guest_init() is an early_initcall, this happens before secondary CPUs are booted and arm64 has finalized the ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as secondary CPUs are onlined. On a big.LITTLE system where the boot CPU does not require KPTI but some secondary CPUs do, this will result in xen_guest_init() intializing features that depend on the runstate region, and xen_starting_cpu() registering the runstate region on some CPUs before KPTI is subsequent enabled, resulting the the problems the aforementioned commit tried to avoid. Handle this more robsutly by deferring the initialization of the runstate region until secondary CPUs have been initialized and the ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is moved into a new hotplug starting function which is registered later when we're certain that KPTI will not be used. Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Bertrand Marquis <bertrand.marquis@arm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* drm: bridge: samsung-dsim: Fix waiting for empty cmd transfer FIFO on older ↵Marek Szyprowski2023-11-201-0/+1
| | | | | | | | | | | | | | | | | | Exynos [ Upstream commit 15f389da11257b806da75a070cfa41ca0cc15aae ] Samsung DSIM used in older Exynos SoCs (like Exynos 4210, 4x12, 3250) doesn't report empty level of packer header FIFO. In case of those SoCs, use the old way of waiting for empty command tranfsfer FIFO, removed recently by commit 14806c641582 ("drm: bridge: samsung-dsim: Drain command transfer FIFO before transfer"). Fixes: 14806c641582 ("drm: bridge: samsung-dsim: Drain command transfer FIFO before transfer") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230809145641.3213210-1-m.szyprowski@samsung.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* clk: linux/clk-provider.h: fix kernel-doc warnings and typosRandy Dunlap2023-11-201-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 84aefafe6b294041b7fa0757414c4a29c1bdeea2 ] Fix spelling of "Structure". Fix multiple kernel-doc warnings: clk-provider.h:269: warning: Function parameter or member 'recalc_rate' not described in 'clk_ops' clk-provider.h:468: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_with_accuracy_parent_data' clk-provider.h:468: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_with_accuracy_parent_data' clk-provider.h:482: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_parent_accuracy' clk-provider.h:482: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_parent_accuracy' clk-provider.h:687: warning: Function parameter or member 'flags' not described in 'clk_divider' clk-provider.h:1164: warning: Function parameter or member 'flags' not described in 'clk_fractional_divider' clk-provider.h:1164: warning: Function parameter or member 'approximation' not described in 'clk_fractional_divider' clk-provider.h:1213: warning: Function parameter or member 'flags' not described in 'clk_multiplier' Fixes: 9fba738a53dd ("clk: add duty cycle support") Fixes: b2476490ef11 ("clk: introduce the common clock framework") Fixes: 2d34f09e79c9 ("clk: fixed-rate: Add support for specifying parents via DT/pointers") Fixes: f5290d8e4f0c ("clk: asm9260: use parent index to link the reference clock") Fixes: 9d9f78ed9af0 ("clk: basic clock hardware types") Fixes: e2d0e90fae82 ("clk: new basic clk type for fractional divider") Fixes: f2e0a53271a4 ("clk: Add a basic multiplier clock") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Michael Turquette <mturquette@baylibre.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: linux-clk@vger.kernel.org Link: https://lore.kernel.org/r/20230930221428.18463-1-rdunlap@infradead.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: add DEV_STATS_READ() helperEric Dumazet2023-11-201-0/+1
| | | | | | | | | | | | | | | [ Upstream commit 0b068c714ca9479d2783cc333fff5bc2d4a6d45c ] Companion of DEV_STATS_INC() & DEV_STATS_ADD(). This is going to be used in the series. Use it in macsec_get_stats64(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: ff672b9ffeb3 ("ipvlan: properly track tx_errors") Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: Make handle of hci_conn be uniqueZiyang Xuan2023-11-201-1/+5
| | | | | | | | | | | | | | [ Upstream commit 181a42edddf51d5d9697ecdf365d72ebeab5afb0 ] The handle of new hci_conn is always HCI_CONN_HANDLE_MAX + 1 if the handle of the first hci_conn entry in hci_dev->conn_hash->list is not HCI_CONN_HANDLE_MAX + 1. Use ida to manage the allocation of hci_conn->handle to make it be unique. Fixes: 9f78191cc9f1 ("Bluetooth: hci_conn: Always allocate unique handles") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: ISO: Pass BIG encryption info through QoSIulia Tanasescu2023-11-202-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1d11d70d1f6b23e7d3fc00396c17b90b876162a4 ] This enables a broadcast sink to be informed if the PA it has synced with is associated with an encrypted BIG, by retrieving the socket QoS and checking the encryption field. After PA sync has been successfully established and the first BIGInfo advertising report is received, a new hcon is added and notified to the ISO layer. The ISO layer sets the encryption field of the socket and hcon QoS according to the encryption parameter of the BIGInfo advertising report event. After that, the userspace is woken up, and the QoS of the new PA sync socket can be read, to inspect the encryption field and follow up accordingly. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 181a42edddf5 ("Bluetooth: Make handle of hci_conn be unique") Signed-off-by: Sasha Levin <sashal@kernel.org>
* Bluetooth: ISO: Use defer setup to separate PA sync and BIG syncIulia Tanasescu2023-11-201-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit fbdc4bc47268953c80853489f696e02d61f9a2c6 ] This commit implements defer setup support for the Broadcast Sink scenario: By setting defer setup on a broadcast socket before calling listen, the user is able to trigger the PA sync and BIG sync procedures separately. This is useful if the user first wants to synchronize to the periodic advertising transmitted by a Broadcast Source, and trigger the BIG sync procedure later on. If defer setup is set, once a PA sync established event arrives, a new hcon is created and notified to the ISO layer. A child socket associated with the PA sync connection will be added to the accept queue of the listening socket. Once the accept call returns the fd for the PA sync child socket, the user should call read on that fd. This will trigger the BIG create sync procedure, and the PA sync socket will become a listening socket itself. When the BIG sync established event is notified to the ISO layer, the bis connections will be added to the accept queue of the PA sync parent. The user should call accept on the PA sync socket to get the final bis connections. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 181a42edddf5 ("Bluetooth: Make handle of hci_conn be unique") Signed-off-by: Sasha Levin <sashal@kernel.org>
* tcp: fix cookie_init_timestamp() overflowsEric Dumazet2023-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 73ed8e03388d16c12fc577e5c700b58a29045a15 ] cookie_init_timestamp() is supposed to return a 64bit timestamp suitable for both TSval determination and setting of skb->tstamp. Unfortunately it uses 32bit fields and overflows after 2^32 * 10^6 nsec (~49 days) of uptime. Generated TSval are still correct, but skb->tstamp might be set far away in the past, potentially confusing other layers. tcp_ns_to_ts() is changed to return a full 64bit value, ts and ts_now variables are changed to u64 type, and TSMASK is removed in favor of shifts operations. While we are at it, change this sequence: ts >>= TSBITS; ts--; ts <<= TSBITS; ts |= options; to: ts -= (1UL << TSBITS); Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* PM: sleep: Fix symbol export for _SIMPLE_ variants of _PM_OPS()Raag Jadav2023-11-201-14/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8d74f1da776da9b0306630b13a3025214fa44618 ] Currently EXPORT_*_SIMPLE_DEV_PM_OPS() use EXPORT_*_DEV_PM_OPS() set of macros to export dev_pm_ops symbol, which export the symbol in case CONFIG_PM=y but don't take CONFIG_PM_SLEEP into consideration. Since _SIMPLE_ variants of _PM_OPS() do not include runtime PM handles and are only used in case CONFIG_PM_SLEEP=y, we should not be exporting dev_pm_ops symbol for them in case CONFIG_PM_SLEEP=n. This can be fixed by having two distinct set of export macros for both _RUNTIME_ and _SIMPLE_ variants of _PM_OPS(), such that the export of dev_pm_ops symbol used in each variant depends on CONFIG_PM and CONFIG_PM_SLEEP respectively. Introduce _DEV_SLEEP_PM_OPS() set of export macros for _SIMPLE_ variants of _PM_OPS(), which export dev_pm_ops symbol only in case CONFIG_PM_SLEEP=y and discard it otherwise. Fixes: 34e1ed189fab ("PM: Improve EXPORT_*_DEV_PM_OPS macros") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* udplite: fix various data-racesEric Dumazet2023-11-202-9/+11
| | | | | | | | | | | | | | | [ Upstream commit 882af43a0fc37e26d85fb0df0c9edd3bed928de4 ] udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>