summaryrefslogtreecommitdiffstats
path: root/kernel/rcu/tasks.h
Commit message (Collapse)AuthorAgeFilesLines
*-. Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a', ↵Neeraj Upadhyay2024-09-091-70/+142
|\ \ | | | | | | | | | 'nocb.09.09.24a', 'rcutorture.14.08.24a', 'rcustall.09.09.24a', 'srcu.12.08.24a', 'rcu.tasks.14.08.24a', 'rcu_scaling_tests.15.08.24a', 'fixes.12.08.24a' and 'misc.11.08.24a' into next.09.09.24a
| | * rcu/tasks: Add rcu_barrier_tasks*() start time to diagnosticsPaul E. McKenney2024-08-141-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds the start time, in jiffies, of the most recently started rcu_barrier_tasks*() operation to the diagnostic output used by rcuscale. This information can be helpful in distinguishing a hung barrier operation from a long series of barrier operations. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
| | * rcu/tasks: Add detailed grace-period and barrier diagnosticsPaul E. McKenney2024-08-141-2/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds rcu_tasks_torture_stats_print(), rcu_tasks_trace_torture_stats_print(), and rcu_tasks_rude_torture_stats_print() functions that provide detailed diagnostics on grace-period, callback, and barrier state. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
| | * rcu/tasks: Mark callbacks not currently participating in barrier operationPaul E. McKenney2024-08-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each Tasks RCU flavor keeps a count of the number of callbacks that the current rcu_barrier_tasks*() is waiting on, but there is currently no easy way to work out which callback is stuck. One way to do this is to mark idle RCU-barrier callbacks by making the ->next pointer point to the callback itself, and this commit does just that. Later commits will use this for debug output. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
| | * rcu/tasks: Update rtp->tasks_gp_seq commentPaul E. McKenney2024-08-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rtp->tasks_gp_seq grace-period sequence number is not a strict count, but rather the usual RCU sequence number with the lower few bits tracking per-grace-period state and the upper bits the count of grace periods since boot, give or take the initial value. This commit therefore adjusts this comment. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
| | * rcu/tasks: Check processor-ID assumptionsPaul E. McKenney2024-08-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current mapping of smp_processor_id() to a CPU processing Tasks-RCU callbacks makes some assumptions about layout. This commit therefore adds a WARN_ON() to check these assumptions. [ neeraj.upadhyay: Replace nr_cpu_ids with rcu_task_cpu_ids. ] Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
| | * rcu-tasks: Fix access non-existent percpu rtpcp variable in ↵Zqiang2024-08-141-29/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rcu_tasks_need_gpcb() For kernels built with CONFIG_FORCE_NR_CPUS=y, the nr_cpu_ids is defined as NR_CPUS instead of the number of possible cpus, this will cause the following system panic: smpboot: Allowing 4 CPUs, 0 hotplug CPUs ... setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:512 nr_node_ids:1 ... BUG: unable to handle page fault for address: ffffffff9911c8c8 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 15 Comm: rcu_tasks_trace Tainted: G W 6.6.21 #1 5dc7acf91a5e8e9ac9dcfc35bee0245691283ea6 RIP: 0010:rcu_tasks_need_gpcb+0x25d/0x2c0 RSP: 0018:ffffa371c00a3e60 EFLAGS: 00010082 CR2: ffffffff9911c8c8 CR3: 000000040fa20005 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x23/0x80 ? page_fault_oops+0xa4/0x180 ? exc_page_fault+0x152/0x180 ? asm_exc_page_fault+0x26/0x40 ? rcu_tasks_need_gpcb+0x25d/0x2c0 ? __pfx_rcu_tasks_kthread+0x40/0x40 rcu_tasks_one_gp+0x69/0x180 rcu_tasks_kthread+0x94/0xc0 kthread+0xe8/0x140 ? __pfx_kthread+0x40/0x40 ret_from_fork+0x34/0x80 ? __pfx_kthread+0x40/0x40 ret_from_fork_asm+0x1b/0x80 </TASK> Considering that there may be holes in the CPU numbers, use the maximum possible cpu number, instead of nr_cpu_ids, for configuring enqueue and dequeue limits. [ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ] Closes: https://lore.kernel.org/linux-input/CALMA0xaTSMN+p4xUXkzrtR5r6k7hgoswcaXx7baR_z9r5jjskw@mail.gmail.com/T/#u Reported-by: Zhixu Liu <zhixu.liu@gmail.com> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
| | * rcu-tasks: Remove RCU Tasks Rude asynchronous APIsPaul E. McKenney2024-08-141-38/+17
| |/ | | | | | | | | | | | | | | | | | | The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs are currently unused. This commit therefore removes their definitions and boot-time self-tests. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
* / rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()Valentin Schneider2024-08-151-1/+1
|/ | | | | | | | | The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
* rcu/tasks: Fix stale task snaphot for Tasks TraceFrederic Weisbecker2024-06-061-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When RCU-TASKS-TRACE pre-gp takes a snapshot of the current task running on all online CPUs, no explicit ordering synchronizes properly with a context switch. This lack of ordering can permit the new task to miss pre-grace-period update-side accesses. The following diagram, courtesy of Paul, shows the possible bad scenario: CPU 0 CPU 1 ----- ----- // Pre-GP update side access WRITE_ONCE(*X, 1); smp_mb(); r0 = rq->curr; RCU_INIT_POINTER(rq->curr, TASK_B) spin_unlock(rq) rcu_read_lock_trace() r1 = X; /* ignore TASK_B */ Either r0==TASK_B or r1==1 is needed but neither is guaranteed. One possible solution to solve this is to wait for an RCU grace period at the beginning of the RCU-tasks-trace grace period before taking the current tasks snaphot. However this would introduce large additional latencies to RCU-tasks-trace grace periods. Another solution is to lock the target runqueue while taking the current task snapshot. This ensures that the update side sees the latest context switch and subsequent context switches will see the pre-grace-period update side accesses. This commit therefore adds runqueue locking to cpu_curr_snapshot(). Fixes: e386b6725798 ("rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* Revert "rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()"Frederic Weisbecker2024-06-031-13/+3
| | | | | | | | | | | | | | | | | | This reverts commit 28319d6dc5e2ffefa452c2377dd0f71621b5bff0. The race it fixed was subject to conditions that don't exist anymore since: 1612160b9127 ("rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks") This latter commit removes the use of SRCU that used to cover the RCU-tasks blind spot on exit between the tasklist's removal and the final preemption disabling. The task is now placed instead into a temporary list inside which voluntary sleeps are accounted as RCU-tasks quiescent states. This would disarm the deadlock initially reported against PID namespace exit. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
*---. Merge branches 'fixes.2024.04.15a', 'misc.2024.04.12a', ↵Uladzislau Rezki (Sony)2024-05-011-5/+29
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'rcu-sync-normal-improve.2024.04.15a', 'rcu-tasks.2024.04.15a' and 'rcutorture.2024.04.15a' into rcu-merge.2024.04.15a fixes.2024.04.15a: RCU fixes misc.2024.04.12a: Miscellaneous fixes rcu-sync-normal-improve.2024.04.15a: Improving synchronize_rcu() call rcu-tasks.2024.04.15a: Tasks RCU updates rcutorture.2024.04.15a: Torture-test updates
| | | * rcutorture: Make rcutorture support print rcu-tasks gp stateZqiang2024-04-161-0/+21
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit make rcu-tasks related rcutorture test support rcu-tasks gp state printing when the writer stall occurs or the at the end of rcutorture test, and generate rcu_ops->get_gp_data() operation to simplify the acquisition of gp state for different types of rcutorture tests. Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
| | * rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflowNikita Kiryushin2024-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a possibility of buffer overflow in show_rcu_tasks_trace_gp_kthread() if counters, passed to sprintf() are huge. Counter numbers, needed for this are unrealistically high, but buffer overflow is still possible. Use snprintf() with buffer size instead of sprintf(). Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: edf3775f0ad6 ("rcu-tasks: Add count for idle tasks on offline CPUs") Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
| | * rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timerZqiang2024-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The synchronize_srcu() has been removed by commit("rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks") in rcu_tasks_postscan. This commit therefore fixes the tasks_rcu_exit_srcu_stall_timer comment. Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
| | * rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()Paul E. McKenney2024-04-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because the Tasks RCU ->rtp_exit_list is initialized at rcu_init() time while there is only one CPU running with interrupts disabled, it is not possible for an exiting task to encounter an uninitialized list. This commit therefore replaces the conditional initialization with a WARN_ON_ONCE(). Reported-by: Frederic Weisbecker <frederic@kernel.org> Closes: https://lore.kernel.org/all/ZdiNXmO3wRvmzPsr@lothringen/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
| | * rcu-tasks: Make Tasks RCU wait idly for grace-period delaysPaul E. McKenney2024-04-091-1/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all waits for grace periods sleep at TASK_UNINTERRUPTIBLE, regardless of RCU flavor. This has worked well, but there have been cases where a longer-than-average Tasks RCU grace period has triggered softlockup splats, many of them, before the Tasks RCU CPU stall warning appears. These softlockup splats unnecessarily consume console bandwidth and complicate diagnosis of the underlying problem. Plus a long but not pathologically long Tasks RCU grace period might trigger a few softlockup splats before completing normally, which generates noise for no good reason. This commit therefore causes Tasks RCU grace periods to sleep at TASK_IDLE priority. If there really is a persistent problem, the eventual Tasks RCU CPU stall warning will flag it, and without the extra noise. Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
* / rcu: Inform KCSAN of one-byte cmpxchg() in rcu_trc_cmpxchg_need_qs()Paul E. McKenney2024-04-151-1/+9
|/ | | | | | | | | | | | | | | | | Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. Therefore, rcu_trc_cmpxchg_need_qs() emulates one using field substitution and a four-byte cmpxchg(), such that the other three bytes are always atomically updated to their old values. This works, but results in false-positive KCSAN failures because as far as KCSAN knows, this cmpxchg() operation is updating all four bytes. This commit therefore encloses the cmpxchg() in a data_race() and adds a single-byte instrument_atomic_read_write(), thus telling KCSAN exactly what is going on so as to avoid the false positives. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Marco Elver <elver@google.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
* rcu-tasks: Maintain real-time response in rcu_tasks_postscan()Paul E. McKenney2024-02-251-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | The current code will scan the entirety of each per-CPU list of exiting tasks in ->rtp_exit_list with interrupts disabled. This is normally just fine, because each CPU typically won't have very many tasks in this state. However, if a large number of tasks block late in do_exit(), these lists could be arbitrarily long. Low probability, perhaps, but it really could happen. This commit therefore occasionally re-enables interrupts while traversing these lists, inserting a dummy element to hold the current place in the list. In kernels built with CONFIG_PREEMPT_RT=y, this re-enabling happens after each list element is processed, otherwise every one-to-two jiffies. [ paulmck: Apply Frederic Weisbecker feedback. ] Link: https://lore.kernel.org/all/ZdeI_-RfdLR8jlsm@localhost.localdomain/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasksPaul E. McKenney2024-02-251-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. In addition, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore eliminates these deadlock by replacing the SRCU-based wait for do_exit() completion with per-CPU lists of tasks currently exiting. A given task will be on one of these per-CPU lists for the same period of time that this task would previously have been in the previous SRCU read-side critical section. These lists enable RCU Tasks to find the tasks that have already been removed from the tasks list, but that must nevertheless be waited upon. The RCU Tasks grace period gathers any of these do_exit() tasks that it must wait on, and adds them to the list of holdouts. Per-CPU locking and get_task_struct() are used to synchronize addition to and removal from these lists. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocksPaul E. McKenney2024-02-251-10/+33
| | | | | | | | | | | | | | | | | | | | | This commit continues the elimination of deadlocks involving do_exit() and RCU tasks by causing exit_tasks_rcu_start() to add the current task to a per-CPU list and causing exit_tasks_rcu_stop() to remove the current task from whatever list it is on. These lists will be used to track tasks that are exiting, while still accounting for any RCU-tasks quiescent states that these tasks pass though. [ paulmck: Apply Frederic Weisbecker feedback. ] Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocksPaul E. McKenney2024-02-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. In addition, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore initializes the data structures that will be needed to rely on these quiescent states and to eliminate these deadlocks. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-tasks: Initialize callback lists at rcu_init() timePaul E. McKenney2024-02-251-6/+18
| | | | | | | | | | | | | | | | | | | | In order for RCU Tasks to reliably maintain per-CPU lists of exiting tasks, those lists must be initialized before it is possible for tasks to exit, especially given that the boot CPU is not necessarily CPU 0 (an example being, powerpc kexec() kernels). And at the time that rcu_init_tasks_generic() is called, a task could potentially exit, unconventional though that sort of thing might be. This commit therefore moves the calls to cblist_init_generic() from functions called from rcu_init_tasks_generic() to a new function named tasks_cblist_init_generic() that is invoked from rcu_init(). This constituted a bug in a commit that never went to mainline, so there is no need for any backporting to -stable. Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocksPaul E. McKenney2024-02-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. In addition, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore adds the data structures that will be needed to rely on these quiescent states and to eliminate these deadlocks. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-tasks: Mark RCU Tasks accesses to current->rcu_tasks_idle_cpuPaul E. McKenney2023-12-121-2/+2
| | | | | | | | | | | | The task_struct structure's ->rcu_tasks_idle_cpu can be concurrently read and written from the RCU Tasks grace-period kthread and from the CPU on which the task_struct structure's task is running. This commit therefore marks the accesses appropriately. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
* rcu/tasks-trace: Handle new PF_IDLE semanticsFrederic Weisbecker2023-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The commit: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") has changed the semantics of what is to be considered an idle task in such a way that the idle task of an offline CPU may not carry the PF_IDLE flag anymore. However RCU-tasks-trace tests the opposite assertion, still assuming that idle tasks carry the PF_IDLE flag during their whole lifecycle. Remove this assumption to avoid spurious warnings but keep the initial test verifying that the idle task is the current task on any offline CPU. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") Suggested-by: Joel Fernandes <joel@joelfernandes.org> Suggested-by: Paul E . McKenney" <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
* rcu/tasks: Handle new PF_IDLE semanticsFrederic Weisbecker2023-11-011-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") has changed the semantics of what is to be considered an idle task in such a way that CPU boot code preceding the actual idle loop is excluded from it. This has however introduced new potential RCU-tasks stalls when either: 1) Grace period is started before init/0 had a chance to set PF_IDLE, keeping it stuck in the holdout list until idle ever schedules. 2) Grace period is started when some possible CPUs have never been online, keeping their idle tasks stuck in the holdout list until the CPU ever boots up. 3) Similar to 1) but with secondary CPUs: Grace period is started concurrently with secondary CPU booting, putting its idle task in the holdout list because PF_IDLE isn't yet observed on it. It stays then stuck in the holdout list until that CPU ever schedules. The effect is mitigated here by the hotplug AP thread that must run to bring the CPU up. Fix this with handling the new semantics of PF_IDLE, keeping in mind that it may or may not be set on an idle task. Take advantage of that to strengthen the coverage of an RCU-tasks quiescent state within an idle task, excluding the CPU boot code from it. Only the code running within the idle loop is now a quiescent state, along with offline CPUs. Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup") Suggested-by: Joel Fernandes <joel@joelfernandes.org> Suggested-by: Paul E . McKenney" <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
*-. Merge branches 'rcu/torture', 'rcu/fixes', 'rcu/docs', 'rcu/refscale', ↵Frederic Weisbecker2023-10-231-3/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'rcu/tasks' and 'rcu/stall' into rcu/next rcu/torture: RCU torture, locktorture and generic torture infrastructure rcu/fixes: Generic and misc fixes rcu/docs: RCU documentation updates rcu/refscale: RCU reference scalability test updates rcu/tasks: RCU tasks updates rcu/stall: Stall detection updates
| | * rcu-tasks: Make rcu_tasks_lazy_ms staticJiapeng Chong2023-09-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_tasks_lazy_ms variable is not used outside the file tasks.h, so this commit marks it static. kernel/rcu/tasks.h:1085:5: warning: symbol 'rcu_tasks_lazy_ms' was not declared. Should it be static? Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6086 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
| | * rcu-tasks: Pull sampling of ->percpu_dequeue_lim out of loopPaul E. McKenney2023-09-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_tasks_need_gpcb() samples ->percpu_dequeue_lim as part of the condition clause of a "for" loop, which is a bit confusing. This commit therefore hoists this sampling out of the loop, using the result loaded in the condition clause. So why does this work in the face of a concurrent switch from single-CPU queueing to per-CPU queueing? o The call_rcu_tasks_generic() that makes the change has already enqueued its callback, which means that all of the other CPU's callback queues are empty. o For the call_rcu_tasks_generic() that first notices the switch to per-CPU queues, the smp_store_release() used to update ->percpu_enqueue_lim pairs with the raw_spin_trylock_rcu_node()'s full barrier that is between the READ_ONCE(rtp->percpu_enqueue_shift) and the rcu_segcblist_enqueue() that enqueues the callback. o Because this CPU's queue is empty (unless it happens to be the original single queue, in which case there is no need for synchronization), this call_rcu_tasks_generic() will do an irq_work_queue() to schedule a handler for the needed rcuwait_wake_up() call. This call will be ordered after the first call_rcu_tasks_generic() function's change to ->percpu_dequeue_lim. o This rcuwait_wake_up() will either happen before or after the set_current_state() in rcuwait_wait_event(). If it happens before, the "condition" argument's call to rcu_tasks_need_gpcb() will be ordered after the original change, and all callbacks on all CPUs will be visible. Otherwise, if it happens after, then the grace-period kthread's state will be set back to running, which will result in a later call to rcuwait_wait_event() and thus to rcu_tasks_need_gpcb(), which will again see the change. So it all works out. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
| | * rcu-tasks: Add printk()s to localize boot-time self-test hangPaul E. McKenney2023-09-111-1/+3
| |/ |/| | | | | | | | | | | | | | | | | | | | | Currently, rcu_tasks_initiate_self_tests() prints a message and then initiates self tests on up to three different RCU Tasks flavors. If one of the flavors has a grace-period hang, it is not easy to work out which of the three hung. This commit therefore prints a message prior to each individual test. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
| * rcu: Dump memory object info if callback function is invalidZhen Lei2023-09-131-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a structure containing an RCU callback rhp is (incorrectly) freed and reallocated after rhp is passed to call_rcu(), it is not unusual for rhp->func to be set to NULL. This defeats the debugging prints used by __call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which expect to identify the offending code using the identity of this function. And in kernels build without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things are even worse, as can be seen from this splat: Unable to handle kernel NULL pointer dereference at virtual address 0 ... ... PC is at 0x0 LR is at rcu_do_batch+0x1c0/0x3b8 ... ... (rcu_do_batch) from (rcu_core+0x1d4/0x284) (rcu_core) from (__do_softirq+0x24c/0x344) (__do_softirq) from (__irq_exit_rcu+0x64/0x108) (__irq_exit_rcu) from (irq_exit+0x8/0x10) (irq_exit) from (__handle_domain_irq+0x74/0x9c) (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98) (gic_handle_irq) from (__irq_svc+0x5c/0x94) (__irq_svc) from (arch_cpu_idle+0x20/0x3c) (arch_cpu_idle) from (default_idle_call+0x4c/0x78) (default_idle_call) from (do_idle+0xf8/0x150) (do_idle) from (cpu_startup_entry+0x18/0x20) (cpu_startup_entry) from (0xc01530) This commit therefore adds calls to mem_dump_obj(rhp) to output some information, for example: slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256 This provides the rough size of the memory block and the offset of the rcu_head structure, which as least provides at least a few clues to help locate the problem. If the problem is reproducible, additional slab debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can provide significantly more information. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
*-. Merge branches 'doc.2023.07.14b', 'fixes.2023.08.16a', ↵Paul E. McKenney2023-08-161-19/+117
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'rcu-tasks.2023.07.24a', 'rcuscale.2023.07.14b', 'refscale.2023.07.14b', 'torture.2023.08.14a' and 'torturescripts.2023.07.20a' into HEAD doc.2023.07.14b: Documentation updates. fixes.2023.08.16a: Miscellaneous fixes. rcu-tasks.2023.07.24a: RCU Tasks updates. rcuscale.2023.07.14b: RCU (updater) scalability test updates. refscale.2023.07.14b: Reference (reader) scalability test updates. torture.2023.08.14a: Other torture-test updates. torturescripts.2023.07.20a: Other torture-test scripting updates.
| | * rcuscale: Add RCU Tasks Rude testingPaul E. McKenney2023-07-141-0/+7
| | | | | | | | | | | | | | | | | | Add a "tasks-rude" option to the rcuscale.scale_type module parameter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * rcuscale: Measure RCU Tasks Trace grace-period kthread CPU timePaul E. McKenney2023-07-141-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit causes RCU Tasks Trace to output the CPU time consumed by its grace-period kthread. The CPU time is whatever is in the designated task's current->stime field, and thus is controlled by whatever CPU-time accounting scheme is in effect. This output appears in microseconds as follows on the console: rcu_scale: Grace-period kthread CPU time: 42367.037 [ paulmck: Apply Willy Tarreau feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * rcuscale: Measure grace-period kthread CPU timePaul E. McKenney2023-07-141-0/+6
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds the ability to output the CPU time consumed by the grace-period kthread for the RCU variant under test. The CPU time is whatever is in the designated task's current->stime field, and thus is controlled by whatever CPU-time accounting scheme is in effect. This output appears in microseconds as follows on the console: rcu_scale: Grace-period kthread CPU time: 42367.037 [ paulmck: Apply feedback from Stephen Rothwell and kernel test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yujie Liu <yujie.liu@intel.com>
| * rcu-tasks: Fix boot-time RCU tasks debug-only deadlockPaul E. McKenney2023-08-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kernels built with CONFIG_PROVE_RCU=y (for example, lockdep kernels), the following sequence of events can occur: o rcu_init_tasks_generic() is invoked just before init is spawned. It invokes rcu_spawn_tasks_kthread() and friends. o rcu_spawn_tasks_kthread() invokes rcu_spawn_tasks_kthread_generic(), which uses kthread_run() to create the needed kthread. o Control returns to rcu_init_tasks_generic(), which, because this is a CONFIG_PROVE_RCU=y kernel, invokes the version of the rcu_tasks_initiate_self_tests() function that actually does something, including invoking synchronize_rcu_tasks(), which in turn invokes synchronize_rcu_tasks_generic(). o synchronize_rcu_tasks_generic() sees that the ->kthread_ptr is still NULL, because the newly spawned kthread has not yet started. o The new kthread starts, preempting synchronize_rcu_tasks_generic() just after its check. This kthread invokes rcu_tasks_one_gp(), which acquires ->tasks_gp_mutex, and, seeing no work, blocks in rcuwait_wait_event(). Note that this step requires either a preemptible kernel or a fault-injection-style sleep at the beginning of mutex_lock(). o synchronize_rcu_tasks_generic() resumes and invokes rcu_tasks_one_gp(). o rcu_tasks_one_gp() attempts to acquire ->tasks_gp_mutex, which is still held by the newly spawned kthread's rcu_tasks_one_gp() function. Deadlock. Because the only reason for ->tasks_gp_mutex is to handle pre-kthread synchronous grace periods, this commit avoids this deadlock by having rcu_tasks_one_gp() momentarily release ->tasks_gp_mutex while invoking rcuwait_wait_event(). This allows the call to rcu_tasks_one_gp() from synchronize_rcu_tasks_generic() proceed. Note that it is not necessary to release the mutex anywhere else in rcu_tasks_one_gp() because rcuwait_wait_event() is the only function that can block indefinitely. Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Roy Hopkins <rhopkins@suse.de> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Roy Hopkins <rhopkins@suse.de>
| * rcu-tasks: Permit use of debug-objects with RCU Tasks flavorsPaul E. McKenney2023-07-311-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cblist_init_generic() holds a raw spinlock when invoking INIT_WORK(). This fails in kernels built with CONFIG_DEBUG_OBJECTS=y due to memory allocation being forbidden while holding a raw spinlock. But the only reason for holding the raw spinlock is to synchronize with early boot calls to call_rcu_tasks(), call_rcu_tasks_rude, and, last but not least, call_rcu_tasks_trace(). These calls also invoke cblist_init_generic() in order to support early boot queueing of callbacks. Except that there are no early boot calls to either of these three functions, and the BPF guys confirm that they have no plans to add any such calls. This commit therefore removes the synchronization and adds a WARN_ON_ONCE() to catch the case of now-prohibited early boot RCU Tasks callback queueing. If early boot queueing is needed, an "initialized" flag may be added to the rcu_tasks structure. Then queueing a callback before this flag is set would initialize the callback list (if needed) and queue the callback. The decision as to where to queue the callback given the possibility of non-zero boot CPUs is left as an exercise for the reader. Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu-tasks: Cancel callback laziness if too many callbacksPaul E. McKenney2023-07-141-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The various RCU Tasks flavors now do lazy grace periods when there are only asynchronous grace period requests. By default, the system will let 250 milliseconds elapse after the first call_rcu_tasks*() callbacki is queued before starting a grace period. In contrast, synchronous grace period requests such as synchronize_rcu_tasks*() will start a grace period immediately. However, invoking one of the call_rcu_tasks*() functions in a too-tight loop can result in a callback flood, which in turn can exhaust memory if grace periods are delayed for too long. This commit therefore sets a limit so that the grace-period kthread will be awakened when any CPU's callback list expands to contain rcupdate.rcu_task_lazy_lim callbacks elements (defaulting to 32, set to -1 to disable), the grace-period kthread will be awakened, thus cancelling any ongoing laziness and getting out in front of the potential callback flood. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu-tasks: Add kernel boot parameters for callback lazinessPaul E. McKenney2023-07-141-0/+15
| | | | | | | | | | | | | | This commit adds kernel boot parameters for callback laziness, allowing the RCU Tasks flavors to be individually adjusted. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu-tasks: Remove redundant #ifdef CONFIG_TASKS_RCUPaul E. McKenney2023-07-141-2/+0
| | | | | | | | | | | | The kernel/rcu/tasks.h file has a #endif immediately followed by an Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu-tasks: Treat only synchronous grace periods urgentlyPaul E. McKenney2023-07-141-8/+73
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The performance requirements on RCU Tasks, and in particular on RCU Tasks Trace, have evolved over time as the workloads have evolved. The current implementation is designed to provide low grace-period latencies, and also to accommodate short-duration floods of callbacks. However, current workloads can also provide a constant background callback-queuing rate of a few hundred call_rcu_tasks_trace() invocations per second. This results in continuous back-to-back RCU Tasks Trace grace periods, which in turn can consume the better part of 10% of a CPU. One could take the attitude that there are several tens of other CPUs on the systems running such workloads, but energy efficiency is a thing. On these systems, although asynchronous grace-period requests happen every few milliseconds, synchronous grace-period requests are quite rare. This commit therefore arrranges for grace periods to be initiated immediately in response to calls to synchronize_rcu_tasks*() and also to calls to synchronize_rcu_mult() that are passed one of the call_rcu_tasks*() functions. These are recognized by the tell-tale wakeme_after_rcu callback function. In other cases, callbacks are gathered up for up to about 250 milliseconds before a grace period is initiated. This results in more than an order of magnitude reduction in RCU Tasks Trace grace periods, with corresponding reduction in consumption of CPU time. Reported-by: Alexei Starovoitov <ast@kernel.org> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
*-. Merge branches 'doc.2023.05.10a', 'fixes.2023.05.11a', 'kvfree.2023.05.10a', ↵Paul E. McKenney2023-06-071-4/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'nocb.2023.05.11a', 'rcu-tasks.2023.05.10a', 'torture.2023.05.15a' and 'rcu-urgent.2023.06.06a' into HEAD doc.2023.05.10a: Documentation updates fixes.2023.05.11a: Miscellaneous fixes kvfree.2023.05.10a: kvfree_rcu updates nocb.2023.05.11a: Callback-offloading updates rcu-tasks.2023.05.10a: Tasks RCU updates torture.2023.05.15a: Torture-test updates rcu-urgent.2023.06.06a: Urgent SRCU fix
| | * rcu-tasks: Clarify the cblist_init_generic() function's pr_info() outputZqiang2023-05-091-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | This commit uses rtp->name instead of __func__ and outputs the value of rcu_task_cb_adjust, thus reducing console-log output. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()Shigeru Yoshida2023-05-091-1/+4
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pr_info() is called with rtp->cbs_gbl_lock spin lock locked. Because pr_info() calls printk() that might sleep, this will result in BUG like below: [ 0.206455] cblist_init_generic: Setting adjustable number of callback queues. [ 0.206463] [ 0.206464] ============================= [ 0.206464] [ BUG: Invalid wait context ] [ 0.206465] 5.19.0-00428-g9de1f9c8ca51 #5 Not tainted [ 0.206466] ----------------------------- [ 0.206466] swapper/0/1 is trying to lock: [ 0.206467] ffffffffa0167a58 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x327/0x4a0 [ 0.206473] other info that might help us debug this: [ 0.206473] context-{5:5} [ 0.206474] 3 locks held by swapper/0/1: [ 0.206474] #0: ffffffff9eb597e0 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1f0 [ 0.206478] #1: ffffffff9eb579c0 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x7e [ 0.206482] #2: ffffffff9ea77780 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.0+0x111/0x330 [ 0.206485] stack backtrace: [ 0.206486] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-00428-g9de1f9c8ca51 #5 [ 0.206488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 0.206489] Call Trace: [ 0.206490] <TASK> [ 0.206491] dump_stack_lvl+0x6a/0x9f [ 0.206493] __lock_acquire.cold+0x2d7/0x2fe [ 0.206496] ? stack_trace_save+0x46/0x70 [ 0.206497] lock_acquire+0xd1/0x2f0 [ 0.206499] ? serial8250_console_write+0x327/0x4a0 [ 0.206500] ? __lock_acquire+0x5c7/0x2720 [ 0.206502] _raw_spin_lock_irqsave+0x3d/0x90 [ 0.206504] ? serial8250_console_write+0x327/0x4a0 [ 0.206506] serial8250_console_write+0x327/0x4a0 [ 0.206508] console_emit_next_record.constprop.0+0x180/0x330 [ 0.206511] console_unlock+0xf7/0x1f0 [ 0.206512] vprintk_emit+0xf7/0x330 [ 0.206514] _printk+0x63/0x7e [ 0.206516] cblist_init_generic.constprop.0.cold+0x24/0x32 [ 0.206518] rcu_init_tasks_generic+0x5/0xd9 [ 0.206522] kernel_init_freeable+0x15b/0x2a2 [ 0.206523] ? rest_init+0x160/0x160 [ 0.206526] kernel_init+0x11/0x120 [ 0.206527] ret_from_fork+0x1f/0x30 [ 0.206530] </TASK> [ 0.207018] cblist_init_generic: Setting shift to 1 and lim to 1. This patch moves pr_info() so that it is called without rtp->cbs_gbl_lock locked. Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUsPaul E. McKenney2023-05-111-2/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_tasks_invoke_cbs() function relies on queue_work_on() to silently fall back to WORK_CPU_UNBOUND when the specified CPU is offline. However, the queue_work_on() function's silent fallback mechanism relies on that CPU having been online at some time in the past. When queue_work_on() is passed a CPU that has never been online, workqueue lockups ensue, which can be bad for your kernel's general health and well-being. This commit therefore checks whether a given CPU has ever been online, and, if not substitutes WORK_CPU_UNBOUND in the subsequent call to queue_work_on(). Why not simply omit the queue_work_on() call entirely? Because this function is flooding callback-invocation notifications to all CPUs, and must deal with possibilities that include a sparse cpu_possible_mask. This commit also moves the setting of the rcu_data structure's ->beenonline field to rcu_cpu_starting(), which executes on the incoming CPU before that CPU has ever enabled interrupts. This ensures that the required workqueues are present. In addition, because the incoming CPU has not yet enabled its interrupts, there cannot yet have been any softirq handlers running on this CPU, which means that the WARN_ON_ONCE(!rdp->beenonline) within the RCU_SOFTIRQ handler cannot have triggered yet. Fixes: d363f833c6d88 ("rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
*-. Merge branches 'rcu/staging-core', 'rcu/staging-docs' and ↵Joel Fernandes (Google)2023-04-051-0/+2
|\ \ | | | | | | | | | 'rcu/staging-kfree', remote-tracking branches 'paul/srcu-cf.2023.04.04a', 'fbq/rcu/lockdep.2023.03.27a' and 'fbq/rcu/rcutorture.2023.03.20a' into rcu/staging
| | * rcu-tasks: Fix warning for unused tasks_rcu_exit_srcuPaul E. McKenney2023-04-041-0/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tasks_rcu_exit_srcu variable is used only by kernels built with CONFIG_TASKS_RCU=y, but is defined for all kernesl with CONFIG_TASKS_RCU_GENERIC=y. Therefore, in kernels built with CONFIG_TASKS_RCU_GENERIC=y but CONFIG_TASKS_RCU=n, this gives a "defined but not used" warning. This commit therefore moves this variable under CONFIG_TASKS_RCU. Link: https://lore.kernel.org/oe-kbuild-all/202303191536.XzMSyzTl-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* / rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()Neeraj Upadhyay2023-04-051-0/+31
|/ | | | | | | | | | | | | | | | | | The call to synchronize_srcu() from rcu_tasks_postscan() can be stalled by a task getting stuck in do_exit() between that function's calls to exit_tasks_rcu_start() and exit_tasks_rcu_finish(). To ease diagnosis of this situation, print a stall warning message every rcu_task_stall_info period when rcu_tasks_postscan() is stalled. [ paulmck: Adjust to handle CONFIG_SMP=n. ] Acked-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/rcu/20230111212736.GA1062057@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
* rcu-tasks: Handle queue-shrink/callback-enqueue race conditionZqiang2023-01-031-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_tasks_need_gpcb() determines whether or not: (1) There are callbacks needing another grace period, (2) There are callbacks ready to be invoked, and (3) It would be a good time to shrink back down to a single-CPU callback list. This third case is interesting because some other CPU might be adding new callbacks, which might suddenly make this a very bad time to be shrinking. This is currently handled by requiring call_rcu_tasks_generic() to enqueue callbacks under the protection of rcu_read_lock() and requiring rcu_tasks_need_gpcb() to wait for an RCU grace period to elapse before finalizing the transition. This works well in practice. Unfortunately, the current code assumes that a grace period whose end is detected by the poll_state_synchronize_rcu() in the second "if" condition actually ended before the earlier code counted the callbacks queued on CPUs other than CPU 0 (local variable "ncbsnz"). Given the current code, it is possible that a long-delayed call_rcu_tasks_generic() invocation will queue a callback on a non-zero CPU after these CPUs have had their callbacks counted and zero has been stored to ncbsnz. Such a callback would trigger the WARN_ON_ONCE() in the second "if" statement. To see this, consider the following sequence of events: o CPU 0 invokes rcu_tasks_one_gp(), and counts fewer than rcu_task_collapse_lim callbacks. It sees at least one callback queued on some other CPU, thus setting ncbsnz to a non-zero value. o CPU 1 invokes call_rcu_tasks_generic() and loads 42 from ->percpu_enqueue_lim. It therefore decides to enqueue its callback onto CPU 1's callback list, but is delayed. o CPU 0 sees the rcu_task_cb_adjust is non-zero and that the number of callbacks does not exceed rcu_task_collapse_lim. It therefore checks percpu_enqueue_lim, and sees that its value is greater than the value one. CPU 0 therefore starts the shift back to a single callback list. It sets ->percpu_enqueue_lim to 1, but CPU 1 has already read the old value of 42. It also gets a grace-period state value from get_state_synchronize_rcu(). o CPU 0 sees that ncbsnz is non-zero in its second "if" statement, so it declines to finalize the shrink operation. o CPU 0 again invokes rcu_tasks_one_gp(), and counts fewer than rcu_task_collapse_lim callbacks. It also sees that there are no callback queued on any other CPU, and thus sets ncbsnz to zero. o CPU 1 resumes execution and enqueues its callback onto its own list. This invalidates the value of ncbsnz. o CPU 0 sees the rcu_task_cb_adjust is non-zero and that the number of callbacks does not exceed rcu_task_collapse_lim. It therefore checks percpu_enqueue_lim, but sees that its value is already unity. It therefore does not get a new grace-period state value. o CPU 0 sees that rcu_task_cb_adjust is non-zero, ncbsnz is zero, and that poll_state_synchronize_rcu() says that the grace period has completed. it therefore finalizes the shrink operation, setting ->percpu_dequeue_lim to the value one. o CPU 0 does a debug check, scanning the other CPUs' callback lists. It sees that CPU 1's list has a callback, so it (rightly) triggers the WARN_ON_ONCE(). After all, the new value of ->percpu_dequeue_lim says to not bother looking at CPU 1's callback list, which means that this callback will never be invoked. This can result in hangs and maybe even OOMs. Based on long experience with rcutorture, this is an extremely low-probability race condition, but it really can happen, especially in preemptible kernels or within guest OSes. This commit therefore checks for completion of the grace period before counting callbacks. With this change, in the above failure scenario CPU 0 would know not to prematurely end the shrink operation because the grace period would not have completed before the count operation started. [ paulmck: Adjust grace-period end rather than adding RCU reader. ] [ paulmck: Avoid spurious WARN_ON_ONCE() with ->percpu_dequeue_lim check. ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>