From 4fea6ef0b219d66b8a901fea1744745a1ed2f79b Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 9 Jan 2019 14:48:09 -0800 Subject: doc: Remove obsolete RCU update functions from RCU documentation Now that synchronize_rcu_bh, synchronize_rcu_bh_expedited, call_rcu_bh, rcu_barrier_bh, synchronize_sched, synchronize_sched_expedited, call_rcu_sched, rcu_barrier_sched, get_state_synchronize_sched, and cond_synchronize_sched are obsolete, let's remove them from the documentation aside from a small historical section. Signed-off-by: Paul E. McKenney --- .../Design/Data-Structures/Data-Structures.html | 3 +- .../Expedited-Grace-Periods.html | 4 +- .../Memory-Ordering/Tree-RCU-Memory-Ordering.html | 5 +- Documentation/RCU/NMI-RCU.txt | 13 ++-- Documentation/RCU/UP.txt | 6 +- Documentation/RCU/checklist.txt | 76 ++++++++++------------ Documentation/RCU/rcu.txt | 8 +-- Documentation/RCU/rcubarrier.txt | 27 ++++---- 8 files changed, 66 insertions(+), 76 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 18f179807563..c30c1957c7e6 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -155,8 +155,7 @@ keeping lock contention under control at all tree levels regardless of the level of loading on the system.

RCU updaters wait for normal grace periods by registering -RCU callbacks, either directly via call_rcu() and -friends (namely call_rcu_bh() and call_rcu_sched()), +RCU callbacks, either directly via call_rcu() or indirectly via synchronize_rcu() and friends. RCU callbacks are represented by rcu_head structures, which are queued on rcu_data structures while they are diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html index 19e7a5fb6b73..57300db4b5ff 100644 --- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html +++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html @@ -56,6 +56,7 @@ sections. RCU-preempt Expedited Grace Periods

+CONFIG_PREEMPT=y kernels implement RCU-preempt. The overall flow of the handling of a given CPU by an RCU-preempt expedited grace period is shown in the following diagram: @@ -139,6 +140,7 @@ or offline, among other things. RCU-sched Expedited Grace Periods

+CONFIG_PREEMPT=n kernels implement RCU-sched. The overall flow of the handling of a given CPU by an RCU-sched expedited grace period is shown in the following diagram: @@ -146,7 +148,7 @@ expedited grace period is shown in the following diagram:

As with RCU-preempt, RCU-sched's -synchronize_sched_expedited() ignores offline and +synchronize_rcu_expedited() ignores offline and idle CPUs, again because they are in remotely detectable quiescent states. However, because the diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html index 8d21af02b1f0..c64f8d26609f 100644 --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html @@ -34,12 +34,11 @@ Similarly, any code that happens before the beginning of a given RCU grace period is guaranteed to see the effects of all accesses following the end of that grace period that are within RCU read-side critical sections. -

This guarantee is particularly pervasive for synchronize_sched(), -for which RCU-sched read-side critical sections include any region +

Note well that RCU-sched read-side critical sections include any region of code for which preemption is disabled. Given that each individual machine instruction can be thought of as an extremely small region of preemption-disabled code, one can think of -synchronize_sched() as smp_mb() on steroids. +synchronize_rcu() as smp_mb() on steroids.

RCU updaters use this guarantee by splitting their updates into two phases, one of which is executed before the grace period and diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt index 687777f83b23..881353fd5bff 100644 --- a/Documentation/RCU/NMI-RCU.txt +++ b/Documentation/RCU/NMI-RCU.txt @@ -81,18 +81,19 @@ currently executing on some other CPU. We therefore cannot free up any data structures used by the old NMI handler until execution of it completes on all other CPUs. -One way to accomplish this is via synchronize_sched(), perhaps as +One way to accomplish this is via synchronize_rcu(), perhaps as follows: unset_nmi_callback(); - synchronize_sched(); + synchronize_rcu(); kfree(my_nmi_data); -This works because synchronize_sched() blocks until all CPUs complete -any preemption-disabled segments of code that they were executing. -Since NMI handlers disable preemption, synchronize_sched() is guaranteed +This works because (as of v4.20) synchronize_rcu() blocks until all +CPUs complete any preemption-disabled segments of code that they were +executing. +Since NMI handlers disable preemption, synchronize_rcu() is guaranteed not to return until all ongoing NMI handlers exit. It is therefore safe -to free up the handler's data as soon as synchronize_sched() returns. +to free up the handler's data as soon as synchronize_rcu() returns. Important note: for this to work, the architecture in question must invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively. diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt index 90ec5341ee98..53bde717017b 100644 --- a/Documentation/RCU/UP.txt +++ b/Documentation/RCU/UP.txt @@ -86,10 +86,8 @@ even on a UP system. So do not do it! Even on a UP system, the RCU infrastructure -must- respect grace periods, and -must- invoke callbacks from a known environment in which no locks are held. -It -is- safe for synchronize_sched() and synchronize_rcu_bh() to return -immediately on an UP system. It is also safe for synchronize_rcu() -to return immediately on UP systems, except when running preemptable -RCU. +Note that it -is- safe for synchronize_rcu() to return immediately on +UP systems, including !PREEMPT SMP builds running on UP systems. Quick Quiz #3: Why can't synchronize_rcu() return immediately on UP systems running preemptable RCU? diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 6f469864d9f5..fcc59fea5cd4 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -182,16 +182,13 @@ over a rather long period of time, but improvements are always welcome! when publicizing a pointer to a structure that can be traversed by an RCU read-side critical section. -5. If call_rcu(), or a related primitive such as call_rcu_bh(), - call_rcu_sched(), or call_srcu() is used, the callback function - will be called from softirq context. In particular, it cannot - block. +5. If call_rcu() or call_srcu() is used, the callback function will + be called from softirq context. In particular, it cannot block. -6. Since synchronize_rcu() can block, it cannot be called from - any sort of irq context. The same rule applies for - synchronize_rcu_bh(), synchronize_sched(), synchronize_srcu(), - synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(), - synchronize_sched_expedite(), and synchronize_srcu_expedited(). +6. Since synchronize_rcu() can block, it cannot be called + from any sort of irq context. 
The same rule applies + for synchronize_srcu(), synchronize_rcu_expedited(), and + synchronize_srcu_expedited(). The expedited forms of these primitives have the same semantics as the non-expedited forms, but expediting is both expensive and @@ -212,20 +209,20 @@ over a rather long period of time, but improvements are always welcome! of the system, especially to real-time workloads running on the rest of the system. -7. If the updater uses call_rcu() or synchronize_rcu(), then the - corresponding readers must use rcu_read_lock() and - rcu_read_unlock(). If the updater uses call_rcu_bh() or - synchronize_rcu_bh(), then the corresponding readers must - use rcu_read_lock_bh() and rcu_read_unlock_bh(). If the - updater uses call_rcu_sched() or synchronize_sched(), then - the corresponding readers must disable preemption, possibly - by calling rcu_read_lock_sched() and rcu_read_unlock_sched(). - If the updater uses synchronize_srcu() or call_srcu(), then - the corresponding readers must use srcu_read_lock() and +7. As of v4.20, a given kernel implements only one RCU flavor, + which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y. + If the updater uses call_rcu() or synchronize_rcu(), + then the corresponding readers may use rcu_read_lock() and + rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(), + or any pair of primitives that disables and re-enables preemption, + for example, rcu_read_lock_sched() and rcu_read_unlock_sched(). + If the updater uses synchronize_srcu() or call_srcu(), + then the corresponding readers must use srcu_read_lock() and srcu_read_unlock(), and with the same srcu_struct. The rules for the expedited primitives are the same as for their non-expedited counterparts. Mixing things up will result in confusion and - broken kernels. + broken kernels, and has even resulted in an exploitable security + issue. One exception to this rule: rcu_read_lock() and rcu_read_unlock() may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() @@ -288,8 +285,7 @@ over a rather long period of time, but improvements are always welcome! d. Periodically invoke synchronize_rcu(), permitting a limited number of updates per grace period. - The same cautions apply to call_rcu_bh(), call_rcu_sched(), - call_srcu(), and kfree_rcu(). + The same cautions apply to call_srcu() and kfree_rcu(). Note that although these primitives do take action to avoid memory exhaustion when any given CPU has too many callbacks, a determined @@ -336,12 +332,12 @@ over a rather long period of time, but improvements are always welcome! to safely access and/or modify that data structure. RCU callbacks are -usually- executed on the same CPU that executed - the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(), - but are by -no- means guaranteed to be. For example, if a given - CPU goes offline while having an RCU callback pending, then that - RCU callback will execute on some surviving CPU. (If this was - not the case, a self-spawning RCU callback would prevent the - victim CPU from ever going offline.) + the corresponding call_rcu() or call_srcu(). but are by -no- + means guaranteed to be. For example, if a given CPU goes offline + while having an RCU callback pending, then that RCU callback + will execute on some surviving CPU. (If this was not the case, + a self-spawning RCU callback would prevent the victim CPU from + ever going offline.) 13.
Unlike other forms of RCU, it -is- permissible to block in an SRCU read-side critical section (demarked by srcu_read_lock() @@ -381,8 +377,7 @@ over a rather long period of time, but improvements are always welcome! SRCU's expedited primitive (synchronize_srcu_expedited()) never sends IPIs to other CPUs, so it is easier on - real-time workloads than is synchronize_rcu_expedited(), - synchronize_rcu_bh_expedited() or synchronize_sched_expedited(). + real-time workloads than is synchronize_rcu_expedited(). Note that rcu_dereference() and rcu_assign_pointer() relate to SRCU just as they do to other forms of RCU. @@ -428,22 +423,19 @@ over a rather long period of time, but improvements are always welcome! These debugging aids can help you find problems that are otherwise extremely difficult to spot. -17. If you register a callback using call_rcu(), call_rcu_bh(), - call_rcu_sched(), or call_srcu(), and pass in a function defined - within a loadable module, then it in necessary to wait for - all pending callbacks to be invoked after the last invocation - and before unloading that module. Note that it is absolutely - -not- sufficient to wait for a grace period! The current (say) - synchronize_rcu() implementation waits only for all previous - callbacks registered on the CPU that synchronize_rcu() is running - on, but it is -not- guaranteed to wait for callbacks registered - on other CPUs. +17. If you register a callback using call_rcu() or call_srcu(), + and pass in a function defined within a loadable module, + then it in necessary to wait for all pending callbacks to + be invoked after the last invocation and before unloading + that module. Note that it is absolutely -not- sufficient to + wait for a grace period! The current (say) synchronize_rcu() + implementation waits only for all previous callbacks registered + on the CPU that synchronize_rcu() is running on, but it is -not- + guaranteed to wait for callbacks registered on other CPUs. You instead need to use one of the barrier functions: o call_rcu() -> rcu_barrier() - o call_rcu_bh() -> rcu_barrier() - o call_rcu_sched() -> rcu_barrier() o call_srcu() -> srcu_barrier() However, these barrier functions are absolutely -not- guaranteed diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 721b3e426515..c818cf65c5a9 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -52,10 +52,10 @@ o If I am running on a uniprocessor kernel, which can only do one o How can I see where RCU is currently used in the Linux kernel? Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu", - "rcu_read_lock_bh", "rcu_read_unlock_bh", "call_rcu_bh", - "srcu_read_lock", "srcu_read_unlock", "synchronize_rcu", - "synchronize_net", "synchronize_srcu", and the other RCU - primitives. Or grab one of the cscope databases from: + "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock", + "srcu_read_unlock", "synchronize_rcu", "synchronize_net", + "synchronize_srcu", and the other RCU primitives. Or grab one + of the cscope databases from: http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt index 5d7759071a3e..a2782df69732 100644 --- a/Documentation/RCU/rcubarrier.txt +++ b/Documentation/RCU/rcubarrier.txt @@ -83,16 +83,15 @@ Pseudo-code using rcu_barrier() is as follows: 2. Execute rcu_barrier(). 3. Allow the module to be unloaded. 
-There are also rcu_barrier_bh(), rcu_barrier_sched(), and srcu_barrier() -functions for the other flavors of RCU, and you of course must match -the flavor of rcu_barrier() with that of call_rcu(). If your module -uses multiple flavors of call_rcu(), then it must also use multiple +There is also an srcu_barrier() function for SRCU, and you of course +must match the flavor of rcu_barrier() with that of call_rcu(). If your +module uses multiple flavors of call_rcu(), then it must also use multiple flavors of rcu_barrier() when unloading that module. For example, if -it uses call_rcu_bh(), call_srcu() on srcu_struct_1, and call_srcu() on +it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on srcu_struct_2(), then the following three lines of code will be required when unloading: - 1 rcu_barrier_bh(); + 1 rcu_barrier(); 2 srcu_barrier(&srcu_struct_1); 3 srcu_barrier(&srcu_struct_2); @@ -185,12 +184,12 @@ module invokes call_rcu() from timers, you will need to first cancel all the timers, and only then invoke rcu_barrier() to wait for any remaining RCU callbacks to complete. -Of course, if you module uses call_rcu_bh(), you will need to invoke -rcu_barrier_bh() before unloading. Similarly, if your module uses -call_rcu_sched(), you will need to invoke rcu_barrier_sched() before -unloading. If your module uses call_rcu(), call_rcu_bh(), -and- -call_rcu_sched(), then you will need to invoke each of rcu_barrier(), -rcu_barrier_bh(), and rcu_barrier_sched(). +Of course, if you module uses call_rcu(), you will need to invoke +rcu_barrier() before unloading. Similarly, if your module uses +call_srcu(), you will need to invoke srcu_barrier() before unloading, +and on the same srcu_struct structure. If your module uses call_rcu() +-and- call_srcu(), then you will need to invoke rcu_barrier() -and- +srcu_barrier(). Implementing rcu_barrier() @@ -223,8 +222,8 @@ shown below. Note that the final "1" in on_each_cpu()'s argument list ensures that all the calls to rcu_barrier_func() will have completed before on_each_cpu() returns. Line 9 then waits for the completion. -This code was rewritten in 2008 to support rcu_barrier_bh() and -rcu_barrier_sched() in addition to the original rcu_barrier(). +This code was rewritten in 2008 and several times thereafter, but this +still gives the general idea. The rcu_barrier_func() runs on each CPU, where it invokes call_rcu() to post an RCU callback, as follows: -- cgit v1.2.3 From 0fa201d1618e0b39a60b1045aa1b323f9d351721 Mon Sep 17 00:00:00 2001 From: Tycho Andersen Date: Tue, 29 Jan 2019 15:05:46 -0700 Subject: doc: Repair some whitespace damage A diagram in whatisRCU.txt has space character before tabs. This commit therefore makes this diagram consistent with elsewhere in the document: Use one leading tab, followed by spaces for any additional whitespace required. Signed-off-by: Tycho Andersen Signed-off-by: Paul E. McKenney --- Documentation/RCU/whatisRCU.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 1ace20815bb1..981651a8b65d 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -310,7 +310,7 @@ reader, updater, and reclaimer. rcu_assign_pointer() - +--------+ + +--------+ +---------------------->| reader |---------+ | +--------+ | | | | @@ -318,12 +318,12 @@ reader, updater, and reclaimer. 
| | | rcu_read_lock() | | | rcu_read_unlock() | rcu_dereference() | | - +---------+ | | - | updater |<---------------------+ | - +---------+ V + +---------+ | | + | updater |<----------------+ | + +---------+ V | +-----------+ +----------------------------------->| reclaimer | - +-----------+ + +-----------+ Defer: synchronize_rcu() & call_rcu() -- cgit v1.2.3 From d1b493bbe101d85c86970f1b6c0401d4c1c473ed Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 12 Feb 2019 07:51:24 -0800 Subject: doc: Describe choice of rcu_dereference() APIs and __rcu usage Reported-by: Andrew Morton Signed-off-by: Paul E. McKenney --- Documentation/RCU/rcu_dereference.txt | 103 ++++++++++++++++++++++++++++++++++ 1 file changed, 103 insertions(+) (limited to 'Documentation') diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt index ab96227bad42..bf699e8cfc75 100644 --- a/Documentation/RCU/rcu_dereference.txt +++ b/Documentation/RCU/rcu_dereference.txt @@ -351,3 +351,106 @@ garbage values. In short, rcu_dereference() is -not- optional when you are going to dereference the resulting pointer. + + +WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE? + +First, please avoid using rcu_dereference_raw() and also please avoid +using rcu_dereference_check() and rcu_dereference_protected() with a +second argument with a constant value of 1 (or true, for that matter). +With that caution out of the way, here is some guidance for which +member of the rcu_dereference() to use in various situations: + +1. If the access needs to be within an RCU read-side critical + section, use rcu_dereference(). With the new consolidated + RCU flavors, an RCU read-side critical section is entered + using rcu_read_lock(), anything that disables bottom halves, + anything that disables interrupts, or anything that disables + preemption. + +2. If the access might be within an RCU read-side critical section + on the one hand, or protected by (say) my_lock on the other, + use rcu_dereference_check(), for example: + + p1 = rcu_dereference_check(p->rcu_protected_pointer, + lockdep_is_held(&my_lock)); + + +3. If the access might be within an RCU read-side critical section + on the one hand, or protected by either my_lock or your_lock on + the other, again use rcu_dereference_check(), for example: + + p1 = rcu_dereference_check(p->rcu_protected_pointer, + lockdep_is_held(&my_lock) || + lockdep_is_held(&your_lock)); + +4. If the access is on the update side, so that it is always protected + by my_lock, use rcu_dereference_protected(): + + p1 = rcu_dereference_protected(p->rcu_protected_pointer, + lockdep_is_held(&my_lock)); + + This can be extended to handle multiple locks as in #3 above, + and both can be extended to check other conditions as well. + +5. If the protection is supplied by the caller, and is thus unknown + to this code, that is the rare case when rcu_dereference_raw() + is appropriate. In addition, rcu_dereference_raw() might be + appropriate when the lockdep expression would be excessively + complex, except that a better approach in that case might be to + take a long hard look at your synchronization design. Still, + there are data-locking cases where any one of a very large number + of locks or reference counters suffices to protect the pointer, + so rcu_dereference_raw() does have its place. + + However, its place is probably quite a bit smaller than one + might expect given the number of uses in the current kernel. + Ditto for its synonym, rcu_dereference_check( ... 
, 1), and + its close relative, rcu_dereference_protected(... , 1). + + +SPARSE CHECKING OF RCU-PROTECTED POINTERS + +The sparse static-analysis tool checks for direct access to RCU-protected +pointers, which can result in "interesting" bugs due to compiler +optimizations involving invented loads and perhaps also load tearing. +For example, suppose someone mistakenly does something like this: + + p = q->rcu_protected_pointer; + do_something_with(p->a); + do_something_else_with(p->b); + +If register pressure is high, the compiler might optimize "p" out +of existence, transforming the code to something like this: + + do_something_with(q->rcu_protected_pointer->a); + do_something_else_with(q->rcu_protected_pointer->b); + +This could fatally disappoint your code if q->rcu_protected_pointer +changed in the meantime. Nor is this a theoretical problem: Exactly +this sort of bug cost Paul E. McKenney (and several of his innocent +colleagues) a three-day weekend back in the early 1990s. + +Load tearing could of course result in dereferencing a mashup of a pair +of pointers, which also might fatally disappoint your code. + +These problems could have been avoided simply by making the code instead +read as follows: + + p = rcu_dereference(q->rcu_protected_pointer); + do_something_with(p->a); + do_something_else_with(p->b); + +Unfortunately, these sorts of bugs can be extremely hard to spot during +review. This is where the sparse tool comes into play, along with the +"__rcu" marker. If you mark a pointer declaration, whether in a structure +or as a formal parameter, with "__rcu", sparse will complain if +this pointer is accessed directly. It will also cause sparse to complain +if a pointer not marked with "__rcu" is accessed using rcu_dereference() +and friends. For example, ->rcu_protected_pointer might be declared as +follows: + + struct foo __rcu *rcu_protected_pointer; + +Use of "__rcu" is opt-in. If you choose not to use it, then you should +ignore the sparse warnings. -- cgit v1.2.3 From 884b429ae66758bd2e006dc5fd56757e0fde896d Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 6 Mar 2019 11:24:35 -0800 Subject: doc: Fix typos and otherwise modernize checklist.txt This commit fixes some issues with Documentation/RCU/checklist.txt. Signed-off-by: Paul E. McKenney --- Documentation/RCU/checklist.txt | 43 ++++++++++++++++++++++++----------------- 1 file changed, 25 insertions(+), 18 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index fcc59fea5cd4..e98ff261a438 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -318,7 +318,7 @@ over a rather long period of time, but improvements are always welcome! 11. Any lock acquired by an RCU callback must be acquired elsewhere with softirq disabled, e.g., via spin_lock_irqsave(), - spin_lock_bh(), etc. Failing to disable irq on a given + spin_lock_bh(), etc. Failing to disable softirq on a given acquisition of that lock will result in deadlock as soon as the RCU softirq handler happens to run your RCU callback while interrupting that acquisition's critical section. @@ -331,13 +331,16 @@ over a rather long period of time, but improvements are always welcome! must use whatever locking or other synchronization is required to safely access and/or modify that data structure. - RCU callbacks are -usually- executed on the same CPU that executed - the corresponding call_rcu() or call_srcu(). but are by -no- - means guaranteed to be.
For example, if a given CPU goes offline - while having an RCU callback pending, then that RCU callback - will execute on some surviving CPU. (If this was not the case, - a self-spawning RCU callback would prevent the victim CPU from - ever going offline.) + Do not assume that RCU callbacks will be executed on the same + CPU that executed the corresponding call_rcu() or call_srcu(). + For example, if a given CPU goes offline while having an RCU + callback pending, then that RCU callback will execute on some + surviving CPU. (If this was not the case, a self-spawning RCU + callback would prevent the victim CPU from ever going offline.) + Furthermore, CPUs designated by rcu_nocbs= might well -always- + have their RCU callbacks executed on some other CPUs, in fact, + for some real-time workloads, this is the whole point of using + the rcu_nocbs= kernel boot parameter. 13. Unlike other forms of RCU, it -is- permissible to block in an SRCU read-side critical section (demarked by srcu_read_lock() @@ -379,8 +382,9 @@ over a rather long period of time, but improvements are always welcome! never sends IPIs to other CPUs, so it is easier on real-time workloads than is synchronize_rcu_expedited(). - Note that rcu_dereference() and rcu_assign_pointer() relate to - SRCU just as they do to other forms of RCU. + Note that rcu_assign_pointer() relates to SRCU just as it does to + other forms of RCU, but instead of rcu_dereference() you should + use srcu_dereference() in order to avoid lockdep splats. 14. The whole point of call_rcu(), synchronize_rcu(), and friends is to wait until all pre-existing readers have finished before @@ -400,6 +404,9 @@ over a rather long period of time, but improvements are always welcome! read-side critical sections. It is the responsibility of the RCU update-side primitives to deal with this. + For SRCU readers, you can use smp_mb__after_srcu_read_unlock() + immediately after an srcu_read_unlock() to get a full barrier. + 16. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the __rcu sparse checks to validate your RCU code. These can help find problems as follows: @@ -423,15 +430,15 @@ over a rather long period of time, but improvements are always welcome! These debugging aids can help you find problems that are otherwise extremely difficult to spot. -17. If you register a callback using call_rcu() or call_srcu(), - and pass in a function defined within a loadable module, - then it in necessary to wait for all pending callbacks to - be invoked after the last invocation and before unloading - that module. Note that it is absolutely -not- sufficient to - wait for a grace period! The current (say) synchronize_rcu() - implementation waits only for all previous callbacks registered - on the CPU that synchronize_rcu() is running on, but it is -not- +17. If you register a callback using call_rcu() or call_srcu(), and + pass in a function defined within a loadable module, then it is + necessary to wait for all pending callbacks to be invoked after + the last invocation and before unloading that module. Note that + it is absolutely -not- sufficient to wait for a grace period! + The current (say) synchronize_rcu() implementation is -not- guaranteed to wait for callbacks registered on other CPUs. + Or even on the current CPU if that CPU recently went offline + and came back online. You instead need to use one of the barrier functions: -- cgit v1.2.3 From da8739f23fadf05809c6c37c327367b229467045 Mon Sep 17 00:00:00 2001 From: "Paul E.
McKenney" Date: Tue, 5 Mar 2019 15:28:19 -0800 Subject: rcu: Allow rcu_nocbs= to specify all CPUs Currently, the rcu_nocbs= kernel boot parameter requires that a specific list of CPUs be specified, and has no way to say "all of them". As noted by user RavFX in a comment to Phoronix topic 1002538, this is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot parameter to be given the string "all", as in "rcu_nocbs=all" to specify that all CPUs on the system are to have their RCU callbacks offloaded. Another approach would be to make cpulist_parse() check for "all", but there are uses of cpulist_parse() that do other checking, which could conflict with an "all". This commit therefore focuses on the specific use of cpulist_parse() in rcu_nocb_setup(). Just a note to other people who would like changes to Linux-kernel RCU: If you send your requests to me directly, they might get fixed somewhat faster. RavFX's comment was posted on January 22, 2018 and I first saw it on March 5, 2019. And the only reason that I found it -at- -all- was that I was looking for projects using RCU, and my search engine showed me that Phoronix comment quite by accident. Your choice, though! ;-) Signed-off-by: Paul E. McKenney --- Documentation/admin-guide/kernel-parameters.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2b8ee90bb644..d377a2166b79 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3623,7 +3623,9 @@ see CONFIG_RAS_CEC help text. rcu_nocbs= [KNL] - The argument is a cpu list, as described above. + The argument is a cpu list, as described above, + except that the string "all" can be used to + specify every CPU on the system. In kernels built with CONFIG_RCU_NOCB_CPU=y, set the specified list of CPUs to be no-callback CPUs. -- cgit v1.2.3 From 1755ecedc485e8a898ac27b90be664784ddb6b57 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 9 Jan 2019 14:50:29 -0800 Subject: doc/kprobes: Update obsolete RCU update functions The RCU flavors have been consolidated, so this commit replaces mentions of the now-obsolete synchronize_sched() function with synchronize_rcu(). Signed-off-by: Paul E. McKenney Cc: "Naveen N. Rao" Cc: Anil S Keshavamurthy Cc: "David S. Miller" Cc: Jonathan Corbet Cc: Acked-by: Masami Hiramatsu --- Documentation/kprobes.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 10f4499e677c..ee60e519438a 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -243,10 +243,10 @@ Optimization ^^^^^^^^^^^^ The Kprobe-optimizer doesn't insert the jump instruction immediately; -rather, it calls synchronize_sched() for safety first, because it's +rather, it calls synchronize_rcu() for safety first, because it's possible for a CPU to be interrupted in the middle of executing the -optimized region [3]_. As you know, synchronize_sched() can ensure -that all interruptions that were active when synchronize_sched() +optimized region [3]_. As you know, synchronize_rcu() can ensure +that all interruptions that were active when synchronize_rcu() was called are done, but only if CONFIG_PREEMPT=n. 
So, this version of kprobe optimization supports only kernels with CONFIG_PREEMPT=n [4]_. -- cgit v1.2.3
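
The update-side pattern that runs through all of the patches above is the
one shown in the NMI-RCU.txt example: unpublish the old data, wait for
pre-existing readers with synchronize_rcu(), and only then free. A rough,
self-contained sketch of that pattern under the consolidated post-v4.20
flavors might look as follows; the names my_data, cur_data, update_mutex,
and replace_data are hypothetical illustrations rather than existing
kernel symbols:

	struct my_data {			/* hypothetical RCU-protected structure */
		int value;
	};

	static struct my_data __rcu *cur_data;	/* published pointer, read under RCU */
	static DEFINE_MUTEX(update_mutex);	/* serializes updaters */

	void replace_data(struct my_data *newp)
	{
		struct my_data *old;

		mutex_lock(&update_mutex);
		old = rcu_dereference_protected(cur_data,
						lockdep_is_held(&update_mutex));
		rcu_assign_pointer(cur_data, newp);	/* publish the new version */
		mutex_unlock(&update_mutex);

		/*
		 * Wait for pre-existing readers, which with the consolidated
		 * flavors include rcu_read_lock() readers as well as regions
		 * that disable preemption, bottom halves, or interrupts.
		 */
		synchronize_rcu();

		kfree(old);	/* no reader can still reference the old version */
	}

Updaters that cannot block would instead hand the old structure to
call_rcu(), and a module doing so must invoke rcu_barrier() before
unloading, as described in rcubarrier.txt and checklist.txt above.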