path: root/kernel/rcu/tree.c
Commit message · Author · Age · Files · Lines
...
| * kernel/rcu/tree.c: Fix kernel-doc warnings  (Randy Dunlap, 2020-06-29, 1 file, -1/+0)

    Fix kernel-doc warning:

        ../kernel/rcu/tree.c:959: warning: Excess function parameter 'irq' description in 'rcu_nmi_enter'

    Fixes: cf7614e13c8f ("rcu: Refactor rcu_{nmi,irq}_{enter,exit}()")
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: Byungchul Park <byungchul.park@lge.com>
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Stop shrinker loop  (Peter Enderborg, 2020-06-29, 1 file, -1/+1)

    The count and scan can be separated in time, and there is a fair chance that all work is already done when the scan starts, which might in turn result in a needless retry. This commit therefore avoids this retry by returning SHRINK_STOP.

    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
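    A minimal sketch of the pattern this commit describes, assuming a hypothetical scan callback and helper (the real callback in tree.c is the kfree_rcu shrinker's scan routine): when the scan finds nothing left to do, SHRINK_STOP tells the shrinker core not to retry.

        #include <linux/shrinker.h>

        static unsigned long do_reclaim_work(unsigned long nr); /* hypothetical helper */

        static unsigned long example_shrink_scan(struct shrinker *shrink,
                                                 struct shrink_control *sc)
        {
                unsigned long freed = do_reclaim_work(sc->nr_to_scan);

                /*
                 * The count phase saw work, but it was all done before the
                 * scan ran: report SHRINK_STOP instead of 0 so the shrinker
                 * core does not keep retrying this shrinker.
                 */
                return freed == 0 ? SHRINK_STOP : freed;
        }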
| * rcu: Mark rcu_nmi_enter() call to rcu_cleanup_after_idle() noinstr  (Paul E. McKenney, 2020-06-29, 1 file, -1/+4)

    The objtool complains about the call to rcu_cleanup_after_idle() from rcu_nmi_enter(), so this commit adds instrumentation_begin() before that call and instrumentation_end() after it.

    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
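    The shape of the fix, as a simplified sketch (the condition and surrounding logic here are placeholders, not the actual rcu_nmi_enter() body): a traceable call inside a noinstr function is bracketed so objtool knows the instrumentation is intentional.

        noinstr void example_nmi_enter(void)
        {
                /* ... non-instrumentable entry work ... */

                if (example_exiting_idle()) {           /* placeholder condition */
                        instrumentation_begin();        /* instrumentable region starts */
                        rcu_cleanup_after_idle();       /* traceable call is now allowed */
                        instrumentation_end();          /* back to noinstr rules */
                }
        }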
| * rcu: Grace-period-kthread related sleeps to idle priority  (Paul E. McKenney, 2020-06-29, 1 file, -3/+3)

    This commit converts the long-standing schedule_timeout_interruptible() and schedule_timeout_uninterruptible() calls used by RCU's grace-period kthread to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
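    The conversion is essentially a drop-in replacement; schedule_timeout_idle() sleeps in TASK_IDLE state, which the load-average accounting ignores. A before/after sketch (the delay value shown is arbitrary):

        /* Before: TASK_UNINTERRUPTIBLE sleep contributes to the load average. */
        schedule_timeout_uninterruptible(HZ / 20);

        /* After: TASK_IDLE sleeps are not counted in the load average. */
        schedule_timeout_idle(HZ / 20);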
| * rcu: Add callbacks-invoked counters  (Paul E. McKenney, 2020-06-29, 1 file, -0/+1)

    This commit adds a count of the callbacks invoked to the per-CPU rcu_data structure. This count is printed by the show_rcu_gp_kthreads() that is invoked by rcutorture and the RCU CPU stall-warning code. It is also intended for use by drgn.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Simplify the calculation of rcu_state.ncpus  (Wei Yang, 2020-06-29, 1 file, -6/+3)

    There is only 1 bit set in mask, which means that the only difference between oldmask and the new one will be at the position where the bit is set in mask. This commit therefore updates rcu_state.ncpus by checking whether the bit in mask is already set in rnp->expmaskinitnext.

    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
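    A sketch of the idea, not the exact upstream hunk (the real code also orders the rcu_state.ncpus update appropriately for its lockless readers); mask is assumed to have exactly one bit set for the incoming CPU:

        /* Only count the CPU the first time its bit appears. */
        if (!(rnp->expmaskinitnext & mask))
                WRITE_ONCE(rcu_state.ncpus, rcu_state.ncpus + 1);
        rnp->expmaskinitnext |= mask;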
* rcu: Fixup noinstr warnings  (Peter Zijlstra, 2020-06-25, 1 file, -7/+25)

    A KCSAN build revealed we have explicit annotations through atomic_*() usage; switch to arch_atomic_*() for the respective functions.

        vmlinux.o: warning: objtool: rcu_nmi_exit()+0x4d: call to __kcsan_check_access() leaves .noinstr.text section
        vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x25: call to __kcsan_check_access() leaves .noinstr.text section
        vmlinux.o: warning: objtool: rcu_nmi_enter()+0x4f: call to __kcsan_check_access() leaves .noinstr.text section
        vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0x2a: call to __kcsan_check_access() leaves .noinstr.text section
        vmlinux.o: warning: objtool: __rcu_is_watching()+0x25: call to __kcsan_check_access() leaves .noinstr.text section

    Additionally, without the NOP in instrumentation_begin(), objtool would not detect the lack of the 'else instrumentation_begin();' branch in rcu_nmi_enter().

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
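    The substitution pattern, sketched with a hypothetical counter (availability of arch_atomic_*() is architecture-specific): the plain atomic_*() helpers carry KCSAN/KASAN instrumentation, while the arch_atomic_*() variants are the raw operations and are therefore safe in .noinstr.text.

        /* Assuming a per-CPU atomic_t counter along the lines of RCU's ->dynticks. */
        static atomic_t example_dynticks;

        static noinstr unsigned long example_eqs_inc(void)
        {
                /*
                 * atomic_add_return() emits __kcsan_check_access(), which
                 * objtool flags inside noinstr code; arch_atomic_add_return()
                 * is the uninstrumented primitive.
                 */
                return arch_atomic_add_return(2, &example_dynticks);
        }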
* Merge branch 'WIP.core/rcu' into core/rcu, to pick up two x86/entry dependencies  (Ingo Molnar, 2020-06-01, 1 file, -20/+80)

    Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * rcu: Provide rcu_irq_exit_check_preempt()  (Thomas Gleixner, 2020-05-26, 1 file, -0/+18)

    Provide a debug check which can be invoked from exception return to kernel mode before an attempt is made to schedule. Warn if RCU is not ready for this.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/r/20200521202117.089709607@linutronix.de
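    The kind of sanity check being added, as a hedged sketch (the function name and exact assertions here are illustrative; the real helper inspects RCU's nesting counters under CONFIG_PROVE_RCU):

        #include <linux/lockdep.h>
        #include <linux/rcupdate.h>

        void example_irq_exit_check_preempt(void)
        {
                lockdep_assert_irqs_disabled();

                /* RCU must be watching before the kernel schedules on return. */
                RCU_LOCKDEP_WARN(!rcu_is_watching(),
                                 "RCU not watching on irq exit with preemption enabled");
        }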
| * rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()  (Paul E. McKenney, 2020-05-26, 1 file, -20/+62)

    There will likely be exception handlers that can sleep, which rules out the usual approach of invoking rcu_nmi_enter() on entry and also rcu_nmi_exit() on all exit paths. However, the alternative approach of just not calling anything can prevent RCU from coaxing quiescent states from nohz_full CPUs that are looping in the kernel: RCU must instead IPI them explicitly. It would be better to enable the scheduler tick on such CPUs to interact with RCU in a lighter-weight manner, and this enabling is one of the things that rcu_nmi_enter() currently does.

    What is needed is something that helps RCU coax quiescent states while not preventing subsequent sleeps. This commit therefore splits out the nohz_full scheduler-tick enabling from the rest of the rcu_nmi_enter() logic into a new function named rcu_irq_enter_check_tick().

    [ tglx: Renamed the function and made it a nop when context tracking is off ]
    [ mingo: Fixed a CONFIG_NO_HZ_FULL assumption, harmonized and fixed all the comment blocks and cleaned up rcu_nmi_enter()/exit() definitions. ]

    Suggested-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20200521202116.996113173@linutronix.de
* | rcu: Allow for smp_call_function() running callbacks from idle  (Peter Zijlstra, 2020-05-28, 1 file, -6/+19)

    Currently, RCU relies hard on smp_call_function() callbacks running from interrupt context. A pending optimization is going to break that: it will allow idle CPUs to run the callbacks from the idle loop. This avoids raising the IPI on the requesting CPU and avoids handling an exception on the receiving CPU.

    Change rcu_is_cpu_rrupt_from_idle() to also accept task context, provided it is the idle task.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
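    The relaxed acceptance condition, reduced to a sketch (the real rcu_is_cpu_rrupt_from_idle() works from RCU's per-CPU nesting counters; the boolean parameter here is a stand-in for "called from an interrupt handler"):

        #include <linux/sched.h>

        static bool example_accept_context(bool in_irq_context)
        {
                /*
                 * Previously only interrupt context was accepted; now task
                 * context is also fine, provided the caller is the idle task.
                 */
                return in_irq_context || is_idle_task(current);
        }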
* rcu: Provide __rcu_is_watching()  (Thomas Gleixner, 2020-05-19, 1 file, -0/+5)

    Same as rcu_is_watching() but without the preempt_disable/enable() pair inside the function. It is marked noinstr so it ends up in the non-instrumentable text section.

    This is useful for non-preemptible code, especially in the low-level entry section. Using rcu_is_watching() there results in a call to the preempt_schedule_notrace() thunk, which triggers noinstr section warnings in objtool.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20200512213810.518709291@linutronix.de
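    A sketch of why the variant helps, with a placeholder for the actual per-CPU dynticks check: the ordinary helper brackets the check with a preempt_disable()/preempt_enable() pair, and the enable side can call into the scheduler thunk, which objtool rejects inside noinstr code.

        static bool example_cpu_is_watching_this_cpu(void);     /* placeholder */

        /* Instrumentable version: safe for general use. */
        bool example_is_watching(void)
        {
                bool ret;

                preempt_disable();
                ret = example_cpu_is_watching_this_cpu();
                preempt_enable();
                return ret;
        }

        /* noinstr version: caller guarantees it cannot migrate, so no preempt pair. */
        noinstr bool __example_is_watching(void)
        {
                return example_cpu_is_watching_this_cpu();
        }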
* rcu: Provide rcu_irq_exit_preempt()  (Thomas Gleixner, 2020-05-19, 1 file, -0/+22)

    Interrupts and exceptions invoke rcu_irq_enter() on entry and need to invoke rcu_irq_exit() before they either return to the interrupted code or invoke the scheduler due to preemption. The general assumption is that RCU idle code has to have preemption disabled so that a return from interrupt cannot schedule. So the return-from-interrupt code invokes rcu_irq_exit() and preempt_schedule_irq(). If there is any imbalance in the rcu_irq/nmi* invocations, or RCU idle code had preemption enabled, then this goes unnoticed until the CPU goes idle or some other RCU check is executed.

    Provide rcu_irq_exit_preempt() which can be invoked from the interrupt/exception return code in case that preemption is enabled. It invokes rcu_irq_exit() and contains a few sanity checks in case that CONFIG_PROVE_RCU is enabled to catch such issues directly.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20200505134904.364456424@linutronix.de
* rcu: Make RCU IRQ enter/exit functions rely on in_nmi()  (Paul E. McKenney, 2020-05-19, 1 file, -32/+15)

    The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an "irq" parameter that indicates whether these functions have been invoked from an irq handler (irq==true) or an NMI handler (irq==false). However, recent changes have applied notrace to a few critical functions such that rcu_nmi_enter_common() and rcu_nmi_exit_common() may now rely on in_nmi(). Note that in_nmi() works no differently than before, but rather that tracing is now prohibited in code regions where in_nmi() would incorrectly report NMI state.

    Therefore remove the "irq" parameter and inline rcu_nmi_enter_common() and rcu_nmi_exit_common() into rcu_nmi_enter() and rcu_nmi_exit(), respectively.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Link: https://lkml.kernel.org/r/20200505134101.617130349@linutronix.de
* rcu/tree: Mark the idle relevant functions noinstr  (Thomas Gleixner, 2020-05-19, 1 file, -37/+46)

    These functions are invoked from context tracking and other places in the low-level entry code. Move them into the .noinstr.text section to exclude them from instrumentation. Mark the places which are safe to invoke traceable functions with instrumentation_begin/end() so objtool won't complain.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lkml.kernel.org/r/20200505134100.575356107@linutronix.de
*-----. Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD  (Paul E. McKenney, 2020-05-07, 1 file, -4/+134)

    fixes.2020.04.27a: Miscellaneous fixes.
    kfree_rcu.2020.04.27a: Changes related to kfree_rcu().
    rcu-tasks.2020.04.27a: Addition of new RCU-tasks flavors.
    stall.2020.04.27a: RCU CPU stall-warning updates.
    torture.2020.05.07a: Torture-test updates.
| | | | * rcu: Allow rcutorture to starve grace-period kthread  (Paul E. McKenney, 2020-05-07, 1 file, -0/+27)

    This commit provides an rcutorture.stall_gp_kthread module parameter to allow rcutorture to starve the grace-period kthread. This allows testing the code that detects such starvation.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * | rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built  (Paul E. McKenney, 2020-04-27, 1 file, -0/+24)

    Systems running CPU-bound real-time tasks do not want IPIs sent to CPUs executing nohz_full userspace tasks. Battery-powered systems don't want IPIs sent to idle CPUs in low-power mode. Unfortunately, RCU tasks trace can and will send such IPIs in some cases.

    Both of these situations occur only when the target CPU is in RCU dyntick-idle mode, in other words, when RCU is not watching the target CPU. This suggests that CPUs in dyntick-idle mode should use memory barriers in outermost invocations of rcu_read_lock_trace() and rcu_read_unlock_trace(), which would allow the RCU tasks trace grace period to directly read out the target CPU's read-side state. One challenge is that RCU tasks trace is not targeting a specific CPU, but rather a task. And that task could switch from one CPU to another at any time.

    This commit therefore uses try_invoke_on_locked_down_task() and checks for task_curr() in trc_inspect_reader_notrunning(). When this condition holds, the target task is running and cannot move. If CONFIG_TASKS_TRACE_RCU_READ_MB=y, the new rcu_dynticks_zero_in_eqs() function can be used to check if the specified integer (in this case, t->trc_reader_nesting) is zero while the target CPU remains in that same dyntick-idle sojourn. If so, the target task is in a quiescent state. If not, trc_read_check_handler() must indicate failure so that the grace-period kthread can take appropriate action or retry after an appropriate delay, as the case may be.

    With this change, given CONFIG_TASKS_TRACE_RCU_READ_MB=y, if a given CPU remains idle or a given task continues executing in nohz_full mode, the RCU tasks trace grace-period kthread will detect this without the need to send an IPI.

    Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * | rcu: Add comments marking transitions between RCU watching and not  (Paul E. McKenney, 2020-04-27, 1 file, -4/+25)

    It is not as clear as it might be just where in RCU's idle entry/exit code RCU stops and starts watching the current CPU. This commit therefore adds comments calling out the transitions.

    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * | rcu/tree: Count number of batched kfree_rcu() locklessly  (Joel Fernandes (Google), 2020-04-27, 1 file, -6/+4)

    We can relax the exactness of the count of queued objects in favor of not hurting performance by locklessly sampling the per-CPU counters. This should be OK since, under high memory pressure, it should not matter if we are off by a few objects while counting. The shrinker will still do the reclaim.

    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    [ paulmck: Remove unused "flags" variable. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
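    The counting pattern described above, sketched with hypothetical per-CPU structures: the totals are sampled without taking each CPU's lock, accepting that the sum may be slightly stale.

        #include <linux/percpu.h>
        #include <linux/atomic.h>
        #include <linux/cpumask.h>

        struct example_krc {
                atomic_t count;         /* objects queued on this CPU */
        };
        static DEFINE_PER_CPU(struct example_krc, example_krc);

        /* Lockless, approximate total across CPUs; fine for shrinker accounting. */
        static unsigned long example_count_objects(void)
        {
                unsigned long total = 0;
                int cpu;

                for_each_possible_cpu(cpu)
                        total += atomic_read(&per_cpu_ptr(&example_krc, cpu)->count);

                return total;
        }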
| * | rcu/tree: Add a shrinker to prevent OOM due to kfree_rcu() batching  (Joel Fernandes (Google), 2020-04-27, 1 file, -0/+60)

    To reduce grace periods and improve kfree() performance, we recently added batching, dramatically bringing down the number of grace periods while giving us the ability to use kfree_bulk() for efficient kfree'ing. However, this has increased the likelihood of OOM conditions under heavy kfree_rcu() flood on small-memory systems. This patch introduces a shrinker which starts grace periods right away if the system is under memory pressure due to the existence of objects that have not yet started a grace period.

    With this patch, I do not observe an OOM anymore on a system with 512MB RAM and 8 CPUs, with the following rcuperf options:

        rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_mult=2

    Otherwise it easily OOMs with the above parameters.

    NOTE:
    1. On systems with no memory pressure, the patch has no effect as intended.
    2. In the future, we can use this same mechanism to prevent grace periods from happening even more, by relying on shrinkers carefully.

    Cc: urezki@gmail.com
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
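    How such a shrinker is wired up, as a sketch under assumed names (the structure, callback names, and init function here are illustrative, not the exact tree.c code):

        #include <linux/shrinker.h>

        static unsigned long example_count(struct shrinker *s, struct shrink_control *sc);
        static unsigned long example_scan(struct shrinker *s, struct shrink_control *sc);

        static struct shrinker example_kfree_rcu_shrinker = {
                .count_objects  = example_count,  /* how much could be reclaimed */
                .scan_objects   = example_scan,   /* kick grace periods / drain batches */
                .batch          = 0,
                .seeks          = DEFAULT_SEEKS,
        };

        static int __init example_init(void)
        {
                /* Under memory pressure, the MM core will now call back into us. */
                return register_shrinker(&example_kfree_rcu_shrinker);
        }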
* | rcu: Convert ULONG_CMP_GE() to time_after() for jiffy comparison  (Paul E. McKenney, 2020-04-27, 1 file, -1/+1)

    This commit converts the ULONG_CMP_GE() in rcu_gp_fqs_loop() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
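    The idiom in question, as a sketch (variable and helper names are illustrative; ULONG_CMP_GE() is an RCU-internal macro, and the >= versus > boundary detail is glossed over here): both comparisons are wraparound-safe, but time_after() states that a jiffies timestamp is being checked.

        #include <linux/jiffies.h>

        /* Before: generic wrap-safe unsigned comparison, intent not obvious. */
        if (ULONG_CMP_GE(jiffies, fqs_deadline))
                example_do_fqs_scan();          /* placeholder */

        /* After: clearly a jiffies timestamp comparison, still wrap-safe. */
        if (time_after(jiffies, fqs_deadline))
                example_do_fqs_scan();          /* placeholder */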
* | rcu: Replace 1 by true  (Jules Irenge, 2020-04-27, 1 file, -1/+1)

    Coccinelle reports a warning at the use_softirq declaration:

        WARNING: Assignment of 0/1 to bool variable

    The root cause is that use_softirq, a variable of bool type, is initialised with the integer 1. Replacing 1 with the value true resolves the issue.

    Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* | rcu: Mark rcu_state.gp_seq to detect more concurrent writes  (Paul E. McKenney, 2020-04-27, 1 file, -1/+3)

    The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. This commit applies KCSAN-specific primitives, so cannot go upstream until KCSAN does.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* | rcu: Expedite first two FQS scans under callback-overload conditions  (Paul E. McKenney, 2020-04-27, 1 file, -4/+15)

    Even if some CPUs have excessive numbers of callbacks, RCU's grace-period kthread will still wait normally between successive force-quiescent-state scans. The first two are the most important, as they are the ones that enlist aid from the scheduler when overloaded. This commit therefore omits the wait before the first and the second force-quiescent-state scan under callback-overload conditions.

    This approach was inspired by a discussion with Jeff Roberson.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* | rcu: Mark rcu_state.ncpus to detect concurrent writes  (Paul E. McKenney, 2020-04-27, 1 file, -0/+1)

    The rcu_state structure's ncpus field is only to be modified by the CPU-hotplug CPU-online code path, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
* rcu: Add KCSAN stubs  (Paul E. McKenney, 2020-04-27, 1 file, -0/+13)

    This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
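    Such stubs look roughly like this (a hedged sketch, not necessarily the exact definitions added by the commit): when KCSAN is not available, data_race() degenerates to plain evaluation and the assertion macros become no-ops.

        #ifndef CONFIG_KCSAN
        #ifndef data_race
        /* Evaluate the expression without marking it as an intentional data race. */
        #define data_race(expr)                 ({ expr; })
        #endif
        #ifndef ASSERT_EXCLUSIVE_WRITER
        #define ASSERT_EXCLUSIVE_WRITER(var)    do { } while (0)
        #endif
        #ifndef ASSERT_EXCLUSIVE_ACCESS
        #define ASSERT_EXCLUSIVE_ACCESS(var)    do { } while (0)
        #endif
        #endif /* #ifndef CONFIG_KCSAN */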
* Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent  (Ingo Molnar, 2020-04-14, 1 file, -1/+1)

    Pull RCU fix from Paul E. McKenney.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common()  (Paul E. McKenney, 2020-04-05, 1 file, -1/+1)

    The rcu_nmi_enter_common() function can be invoked both in interrupt and NMI handlers. If it is invoked from process context (as opposed to userspace or idle context) on a nohz_full CPU, it might acquire the CPU's leaf rcu_node structure's ->lock. Because this lock is held only with interrupts disabled, this is safe from an interrupt handler, but doing so from an NMI handler can result in self-deadlock.

    This commit therefore adds "irq" to the "if" condition so as to only acquire the ->lock from irq handlers or process context, never from an NMI handler.

    Fixes: 5b14557b073c ("rcu: Avoid tick_dep_set_cpu() misordering")
    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: <stable@vger.kernel.org> # 5.5.x
* | Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2020-03-30, 1 file, -0/+1)

    Pull locking updates from Ingo Molnar:

    "The main changes in this cycle were:

     - Continued user-access cleanups in the futex code.

     - percpu-rwsem rewrite that uses its own waitqueue and atomic_t instead of an embedded rwsem. This addresses a couple of weaknesses, but the primary motivation was complications on the -rt kernel.

     - Introduce raw lock nesting detection on lockdep (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal lock differences. This too originates from -rt.

     - Reuse lockdep zapped chain_hlocks entries, to conserve RAM footprint on distro-ish kernels running into the "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep chain-entries pool.

     - Misc cleanups, smaller fixes and enhancements - see the changelog for details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
      fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
      thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
      Documentation/locking/locktypes: Minor copy editor fixes
      Documentation/locking/locktypes: Further clarifications and wordsmithing
      m68knommu: Remove mm.h include from uaccess_no.h
      x86: get rid of user_atomic_cmpxchg_inatomic()
      generic arch_futex_atomic_op_inuser() doesn't need access_ok()
      x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
      x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
      objtool: whitelist __sanitizer_cov_trace_switch()
      [parisc, s390, sparc64] no need for access_ok() in futex handling
      sh: no need of access_ok() in arch_futex_atomic_op_inuser()
      futex: arch_futex_atomic_op_inuser() calling conventions change
      completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
      lockdep: Add posixtimer context tracing bits
      lockdep: Annotate irq_work
      lockdep: Add hrtimer context tracing bits
      lockdep: Introduce wait-type checks
      completion: Use simple wait queues
      sched/swait: Prepare usage in completions
      ...
| * lockdep: Annotate irq_work  (Sebastian Andrzej Siewior, 2020-03-21, 1 file, -0/+1)

    Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be invoked in softirq context on PREEMPT_RT.

    Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq context so lockdep knows that these can safely acquire a spinlock_t.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
*---. Merge branches 'doc.2020.02.27a', 'fixes.2020.03.21a', 'kfree_rcu.2020.02.20a', 'locktorture.2020.02.20a', 'ovld.2020.02.20a', 'rcu-tasks.2020.02.20a', 'srcu.2020.02.20a' and 'torture.2020.02.20a' into HEAD  (Paul E. McKenney, 2020-03-21, 1 file, -113/+339)

    doc.2020.02.27a: Documentation updates.
    fixes.2020.03.21a: Miscellaneous fixes.
    kfree_rcu.2020.02.20a: Updates to kfree_rcu().
    locktorture.2020.02.20a: Lock torture-test updates.
    ovld.2020.02.20a: Updates to callback-overload handling.
    rcu-tasks.2020.02.20a: RCU-tasks updates.
    srcu.2020.02.20a: SRCU updates.
    torture.2020.02.20a: Torture-test updates.
| | | * rcu: Update __call_rcu() comments  (Paul E. McKenney, 2020-02-20, 1 file, -7/+2)

    The __call_rcu() function's header comment refers to a cpu argument that no longer exists, and the comment of the return path from rcu_nocb_try_bypass() ignores the non-no-CBs-CPU case. This commit therefore updates both comments.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | | * rcu: React to callback overload by aggressively seeking quiescent states  (Paul E. McKenney, 2020-02-20, 1 file, -4/+71)

    In default configurations, RCU currently waits at least 100 milliseconds before asking cond_resched() and/or resched_rcu() for help seeking quiescent states to end a grace period. But 100 milliseconds can be one good long time during an RCU callback flood, for example, as can happen when user processes repeatedly open and close files in a tight loop. These 100-millisecond gaps in successive grace periods during a callback flood can result in excessive numbers of callbacks piling up, unnecessarily increasing memory footprint.

    This commit therefore asks cond_resched() and/or resched_rcu() for help as early as the first FQS scan when at least one of the CPUs has more than 20,000 callbacks queued, a number that can be changed using the new rcutree.qovld kernel boot parameter. An auxiliary qovld_calc variable is used to avoid acquisition of locks that have not yet been initialized. Early tests indicate that this reduces the RCU-callback memory footprint during rcutorture floods by from 50% to 4x, depending on configuration.

    Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Reported-by: Tejun Heo <tj@kernel.org>
    [ paulmck: Fix bug located by Qian Cai. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Dexuan Cui <decui@microsoft.com>
    Tested-by: Qian Cai <cai@lca.pw>
| | | * rcu: Clear ->core_needs_qs at GP end or self-reported QS  (Paul E. McKenney, 2020-02-20, 1 file, -4/+9)

    The rcu_data structure's ->core_needs_qs field does not necessarily get cleared in a timely fashion after the corresponding CPU's quiescent state has been reported. From a functional viewpoint, no harm done, but this can result in excessive invocation of RCU core processing, as witnessed by the kernel test robot, which saw greatly increased softirq overhead.

    This commit therefore restores the rcu_report_qs_rdp() function's clearing of this field, but only when running on the corresponding CPU. Cases where some other CPU reports the quiescent state (for example, on behalf of an idle CPU) are handled by setting this field appropriately within the __note_gp_changes() function's end-of-grace-period checks. This handling is carried out regardless of whether the end of a grace period actually happened, thus handling the case where a CPU goes non-idle after a quiescent state is reported on its behalf, but before the grace period ends. This fix also avoids cross-CPU updates to ->core_needs_qs.

    While in the area, this commit changes the __note_gp_changes() need_gp variable's name to need_qs because it is a quiescent state that is needed from the CPU in question.

    Fixes: ed93dfc6bc00 ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU")
    Reported-by: kernel test robot <rong.a.chen@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * rcu: Add a trace event for kfree_rcu() use of kfree_bulk()  (Uladzislau Rezki (Sony), 2020-02-20, 1 file, -0/+3)

    The event is given three parameters: the name of the RCU flavour, the number of elements in the array to be freed, and the address of the array holding the pointers to be freed by the kfree_bulk() function.

    To enable the trace event, your kernel has to be built with CONFIG_RCU_TRACE=y; after that it is possible to track the events using the ftrace subsystem.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| | * rcu: Support kfree_bulk() interface in kfree_rcu()  (Uladzislau Rezki (Sony), 2020-02-20, 1 file, -35/+169)

    The kfree_rcu() logic can be improved further by using the kfree_bulk() interface along with the "basic batching support" introduced earlier. There are at least two advantages to using the "bulk" interface:

    - in case of a large number of kfree_rcu() requests, kfree_bulk() reduces the per-object overhead caused by calling kfree() per object;

    - it reduces the number of cache misses due to "pointer chasing" between objects, which can be spread far apart from each other.

    This approach defines a new kfree_rcu_bulk_data structure that stores pointers in an array of a specific size. The number of entries in that array depends on PAGE_SIZE, making the kfree_rcu_bulk_data structure exactly one page. Since it uses a "block-chain" technique, a dynamic allocation is needed whenever a new block is required. Memory is allocated with the GFP_NOWAIT | __GFP_NOWARN flags, which skips direct reclaim under low-memory conditions to prevent stalling and fails silently under high memory pressure. The "emergency path" is maintained for when the system runs out of memory: in that case, objects are linked into the regular list.

    The "rcuperf" test was run to analyze this change in terms of memory consumption and kfree_bulk() throughput.

    1) Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz, 12 CPUs, dev.2020.01.10a branch, with parameters: kfree_loops=200000 kfree_alloc_num=1000 kfree_rcu_test=1 kfree_vary_obj_size=1

       Default / CONFIG_SLAB
       53607352517 ns, loops: 200000, batches: 1885, memory footprint: 1248MB
       53529637912 ns, loops: 200000, batches: 1921, memory footprint: 1193MB
       53570175705 ns, loops: 200000, batches: 1929, memory footprint: 1250MB

       Patch / CONFIG_SLAB
       23981587315 ns, loops: 200000, batches: 810, memory footprint: 1219MB
       23879375281 ns, loops: 200000, batches: 822, memory footprint: 1190MB
       24086841707 ns, loops: 200000, batches: 794, memory footprint: 1380MB

       Default / CONFIG_SLUB
       51291025022 ns, loops: 200000, batches: 1713, memory footprint: 741MB
       51278911477 ns, loops: 200000, batches: 1671, memory footprint: 719MB
       51256183045 ns, loops: 200000, batches: 1719, memory footprint: 647MB

       Patch / CONFIG_SLUB
       50709919132 ns, loops: 200000, batches: 1618, memory footprint: 456MB
       50736297452 ns, loops: 200000, batches: 1633, memory footprint: 507MB
       50660403893 ns, loops: 200000, batches: 1628, memory footprint: 429MB

       With CONFIG_SLAB there is a twofold increase in performance and slightly higher memory usage. As for CONFIG_SLUB, the performance figures are better together with lower memory usage.

    2) HiKey-960, arm64, 8 CPUs, with parameters: CONFIG_SLAB=y kfree_loops=200000 kfree_alloc_num=1000 kfree_rcu_test=1

       102898760401 ns, loops: 200000, batches: 5822, memory footprint: 158MB
       89947009882 ns, loops: 200000, batches: 6715, memory footprint: 115MB

       rcuperf shows approximately 12% better throughput when using the "bulk" interface. The "drain logic", i.e. its RCU callback, does the work faster, which leads to better throughput.

    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
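    The data layout described above, as a sketch (field and structure names are illustrative, not the exact kfree_rcu_bulk_data definition): pointers accumulate into page-sized blocks so the drain path can hand a whole array to kfree_bulk() at once.

        #include <linux/mm.h>
        #include <linux/slab.h>

        /* One page worth of pointers pending a grace period (illustrative layout). */
        struct example_bulk_data {
                unsigned long nr_records;
                struct example_bulk_data *next;
                void *records[];        /* sized so the struct fills one page */
        };

        #define EXAMPLE_BULK_MAX \
                ((PAGE_SIZE - sizeof(struct example_bulk_data)) / sizeof(void *))

        /* Drain path, after the grace period: free the whole block in one call. */
        static void example_drain(struct example_bulk_data *b)
        {
                kfree_bulk(b->nr_records, b->records);
                free_page((unsigned long)b);
        }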
| * rcu: Make rcu_barrier() account for offline no-CBs CPUs  (Paul E. McKenney, 2020-03-21, 1 file, -12/+24)

    Currently, rcu_barrier() ignores offline CPUs. However, it is possible for an offline no-CBs CPU to have callbacks queued, and rcu_barrier() must wait for those callbacks. This commit therefore makes rcu_barrier() directly invoke the rcu_barrier_func() with interrupts disabled for such CPUs. This requires passing the CPU number into this function so that it can entrain the rcu_barrier() callback onto the correct CPU's callback list, given that the code must instead execute on the current CPU.

    While in the area, this commit fixes a bug where the first CPU's callback might have been invoked before rcu_segcblist_entrain() returned, which would also result in an early wakeup.

    Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    [ paulmck: Apply optimization feedback from Boqun Feng. ]
    Cc: <stable@vger.kernel.org> # 5.5.x
| * rcu: Mark rcu_state.gp_seq to detect concurrent writes  (Paul E. McKenney, 2020-03-21, 1 file, -14/+8)

    The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Optimize and protect atomic_cmpxchg() loop  (Paul E. McKenney, 2020-02-20, 1 file, -2/+5)

    This commit reworks the atomic_cmpxchg() loop in rcu_eqs_special_set() to do only the initial read from the current CPU's rcu_data structure's ->dynticks field explicitly. On subsequent passes, this value is instead retained from the failing atomic_cmpxchg() operation.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
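    The loop pattern being described, in generic form (a sketch, not the exact rcu_eqs_special_set() code): atomic_cmpxchg() returns the value it found, so a failed attempt already supplies the fresh "old" value for the next pass.

        #include <linux/atomic.h>

        static bool example_set_flag(atomic_t *v, int flag)
        {
                int old = atomic_read(v);       /* explicit read only on the first pass */

                for (;;) {
                        int new, seen;

                        if (old & 1)            /* placeholder bail-out condition */
                                return false;
                        new = old | flag;
                        seen = atomic_cmpxchg(v, old, new);
                        if (seen == old)
                                return true;    /* update landed */
                        old = seen;             /* reuse the value cmpxchg gave back */
                }
        }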
| * rcu: Don't flag non-starting GPs before GP kthread is running  (Paul E. McKenney, 2020-02-20, 1 file, -12/+16)

    Currently rcu_check_gp_start_stall() complains if a grace period takes too long to start, where "too long" is roughly one RCU CPU stall-warning interval. This has worked well, but there are some debugging Kconfig options (such as CONFIG_EFI_PGT_DUMP=y) that can make booting take a very long time, so much so that the stall-warning interval has expired before RCU's grace-period kthread has even been spawned.

    This commit therefore resets the rcu_state.gp_req_activity and rcu_state.gp_activity timestamps just before the grace-period kthread is spawned, and modifies the checks and adds ordering to ensure that if rcu_check_gp_start_stall() sees that the grace-period kthread has been spawned, it will also see the resets applied to the rcu_state.gp_req_activity and rcu_state.gp_activity timestamps.

    Reported-by: Qian Cai <cai@lca.pw>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    [ paulmck: Fix whitespace issues reported by Qian Cai. ]
    Tested-by: Qian Cai <cai@lca.pw>
    [ paulmck: Simplify grace-period wakeup check per Steve Rostedt feedback. ]
| * rcu: Fix rcu_barrier_callback() race condition  (Paul E. McKenney, 2020-02-20, 1 file, -3/+10)

    The rcu_barrier_callback() function does an atomic_dec_and_test(), and if it is the last CPU to check in, does the required wakeup. Either way, it does an event trace. Unfortunately, this is susceptible to the following sequence of events:

    o CPU 0 invokes rcu_barrier_callback(), but atomic_dec_and_test() says that it is not last. But at this point, CPU 0 is delayed, perhaps due to an NMI, SMI, or vCPU preemption.

    o CPU 1 invokes rcu_barrier_callback(), and atomic_dec_and_test() says that it is last. So CPU 1 traces completion and does the needed wakeup.

    o The awakened rcu_barrier() function does cleanup and releases rcu_state.barrier_mutex.

    o Another CPU now acquires rcu_state.barrier_mutex and starts another round of rcu_barrier() processing, including updating rcu_state.barrier_sequence.

    o CPU 0 gets its act back together and does its tracing. Except that rcu_state.barrier_sequence has already been updated, so its tracing is incorrect and probably quite confusing. (Wait! Why did this CPU check in twice for one rcu_barrier() invocation???)

    This commit therefore causes rcu_barrier_callback() to take a snapshot of the value of rcu_state.barrier_sequence before invoking atomic_dec_and_test(), thus guaranteeing that the event-trace output is sensible, even if the timing of the event-trace output might still be confusing. (Wait! Why did the old rcu_barrier() complete before all of its CPUs checked in???) But being that this is RCU, only so much confusion can reasonably be eliminated.

    This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to the mild consequences of the failure, namely a confusing event trace.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick  (Paul E. McKenney, 2020-02-20, 1 file, -3/+4)

    The rcu_data structure's ->rcu_forced_tick field is read locklessly, so this commit adds WRITE_ONCE() to all updates and READ_ONCE() to all lockless reads.

    This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
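    The general pattern for such fields, sketched with a placeholder action (assuming rdp points at a per-CPU rcu_data structure; ->rcu_forced_tick is just one instance): once any reader is lockless, both sides of the access get *_ONCE() so the compiler cannot tear, fuse, or re-load the access.

        /* Writer side, under whatever lock serializes updates. */
        WRITE_ONCE(rdp->rcu_forced_tick, true);

        /* Lockless reader elsewhere; pairs with the WRITE_ONCE() above. */
        if (READ_ONCE(rdp->rcu_forced_tick))
                example_clear_tick_dependency(rdp);     /* placeholder action */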
| * rcu: Add READ_ONCE() to rcu_data ->gpwrap  (Paul E. McKenney, 2020-02-20, 1 file, -1/+1)

    The rcu_data structure's ->gpwrap field is read locklessly, and so this commit adds the required READ_ONCE() to a pair of loads in order to avoid destructive compiler optimizations.

    This data race was reported by KCSAN.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Fix typos in file-header comments  (SeongJae Park, 2020-02-20, 1 file, -2/+2)

    Convert to plural and add a note that this is for Tree RCU.

    Signed-off-by: SeongJae Park <sjpark@amazon.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Add *_ONCE() for grace-period progress indicators  (Paul E. McKenney, 2020-02-20, 1 file, -7/+7)

    The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity, and ->gp_activity fields are read locklessly, so they must be updated with WRITE_ONCE() and, when read locklessly, with READ_ONCE(). This commit makes these changes.

    This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext  (Paul E. McKenney, 2020-02-20, 1 file, -2/+2)

    The rcu_node structure's ->qsmaskinitnext field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

    This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely for systems not doing incessant CPU-hotplug operations.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity  (Paul E. McKenney, 2020-02-20, 1 file, -2/+2)

    The rcu_state structure's ->gp_req_activity field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

    This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Add READ_ONCE() to rcu_node ->gp_seq  (Paul E. McKenney, 2020-02-20, 1 file, -4/+5)

    The rcu_node structure's ->gp_seq field is read locklessly, so this commit adds the READ_ONCE() to several loads in order to avoid destructive compiler optimizations.

    This data race was reported by KCSAN. Not appropriate for backporting because this affects only tracing and warnings.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
| * rcu: Add WRITE_ONCE() to rcu_node ->qsmask update  (Paul E. McKenney, 2020-02-20, 1 file, -1/+1)

    The rcu_node structure's ->qsmask field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

    This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely.

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>