summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcasesTom Zanussi2019-02-208-0/+8
| | | | | | | | | | Apparently this directory was missed in the license cleanup process - add the missing identifiers to the trigger/inter-event test cases. Link: http://lkml.kernel.org/r/6f9828c2cfb0b378ebd217a39a1b44f063fc17fb.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add alternative synthetic event trace action syntaxTom Zanussi2019-02-203-22/+76
| | | | | | | | | | | | | | | | | | | | | | | Add a 'trace(synthetic_event_name, params)' alternative to synthetic_event_name(params). Currently, the syntax used for generating synthetic events is to invoke synthetic_event_name(params) i.e. use the synthetic event name as a function call. Users requested a new form that more explicitly shows that the synthetic event is in effect being traced. In this version, a new 'trace()' keyword is used, and the synthetic event name is passed in as the first argument. In addition, for the sake of consistency with other actions, change the documention to emphasize the trace() form over the function-call form, which remains documented as equivalent. Link: http://lkml.kernel.org/r/d082773e50232a001480cf837679a1e01c1a2eb7.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add hist trigger onchange() handler DocumentationTom Zanussi2019-02-201-0/+98
| | | | | | | | | Add Documentation for the hist:onchange($var) handler. Link: http://lkml.kernel.org/r/ab54b7383b265609fda52648a8fbfbd2631a640f.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add hist trigger onchange() handlerTom Zanussi2019-02-202-9/+52
| | | | | | | | | | | Add support for a hist:onchange($var) handler, similar to the onmax() handler but triggering whenever there's any change in $var, not just a max. Link: http://lkml.kernel.org/r/dfbc7e4ada242603e9ec3f049b5ad076a07dfd03.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add hist trigger snapshot() action DocumentationTom Zanussi2019-02-201-0/+110
| | | | | | | | | Add Documentation for the hist:handlerXXX($var).snapshot() action. Link: http://lkml.kernel.org/r/445861d7822cd4b6aeaea1cecfcdbda466502148.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add hist trigger snapshot() actionTom Zanussi2019-02-202-10/+259
| | | | | | | | | | | | | | | | | | Add support for hist:handlerXXX($var).snapshot(), which will take a snapshot of the current trace buffer whenever handlerXXX is hit. As a first user, this also adds snapshot() action support for the onmax() handler i.e. hist:onmax($var).snapshot(). Also, the hist trigger key printing is moved into a separate function so the snapshot() action can print a histogram key outside the histogram display - add and use hist_trigger_print_key() for that purpose. Link: http://lkml.kernel.org/r/2f1a952c0dcd8aca8702ce81269581a692396d45.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add conditional snapshotTom Zanussi2019-02-203-6/+244
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, tracing snapshots are context-free - they capture the ring buffer contents at the time the tracing_snapshot() function was invoked, and nothing else. Additionally, they're always taken unconditionally - the calling code can decide whether or not to take a snapshot, but the data used to make that decision is kept separately from the snapshot itself. This change adds the ability to associate with each trace instance some user data, along with an 'update' function that can use that data to determine whether or not to actually take a snapshot. The update function can then update that data along with any other state (as part of the data presumably), if warranted. Because snapshots are 'global' per-instance, only one user can enable and use a conditional snapshot for any given trace instance. To enable a conditional snapshot (see details in the function and data structure comments), the user calls tracing_snapshot_cond_enable(). Similarly, to disable a conditional snapshot and free it up for other users, tracing_snapshot_cond_disable() should be called. To actually initiate a conditional snapshot, tracing_snapshot_cond() should be called. tracing_snapshot_cond() will invoke the update() callback, allowing the user to decide whether or not to actually take the snapshot and update the user-defined data associated with the snapshot. If the callback returns 'true', tracing_snapshot_cond() will then actually take the snapshot and return. This scheme allows for flexibility in snapshot implementations - for example, by implementing slightly different update() callbacks, snapshots can be taken in situations where the user is only interested in taking a snapshot when a new maximum in hit versus when a value changes in any way at all. Future patches will demonstrate both cases. Link: http://lkml.kernel.org/r/1bea07828d5fd6864a585f83b1eed47ce097eb45.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Generalize hist trigger onmax and save actionTom Zanussi2019-02-201-76/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The action refactor code allowed actions and handlers to be separated, but the existing onmax handler and save action code is still not flexible enough to handle arbitrary coupling. This change generalizes them and in the process makes additional handlers and actions easier to implement. The onmax action can be broken up and thought of as two separate components - a variable to be tracked (the parameter given to the onmax($var_to_track) function) and an invisible variable created to save the ongoing result of doing something with that variable, such as saving the max value of that variable so far seen. Separating it out like this and renaming it appropriately allows us to use the same code for similar tracking functions such as onchange($var_to_track), which would just track the last value seen rather than the max seen so far, which is useful in some situations. Additionally, because different handlers and actions may want to save and access data differently e.g. save and retrieve tracking values as local variables vs something more global, save_val() and get_val() interface functions are introduced and max-specific implementations are used instead. The same goes for the code that checks whether a maximum has been hit - a generic check_val() interface and max-checking implementation is used instead, which allows future patches to make use of he same code using their own implemetations of similar functionality. Link: http://lkml.kernel.org/r/980ea73dd8e3f36db3d646f99652f8fed42b77d4.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Split up onmatch action dataTom Zanussi2019-02-202-52/+63
| | | | | | | | | | | | | | | | Currently, the onmatch action data binds the onmatch action to data related to synthetic event generation. Since we want to allow the onmatch handler to potentially invoke a different action, and because we expect other handlers to generate synthetic events, we need to separate the data related to these two functions. Also rename the onmatch data to something more descriptive, and create and use common action data destroy function. Link: http://lkml.kernel.org/r/b9abbf9aae69fe3920cdc8ddbcaad544dd258d78.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Make hist trigger Documentation better reflect actions/handlersTom Zanussi2019-02-201-13/+43
| | | | | | | | | | The action/handler code refactoring didn't change the action/handler syntax, but did generalize it - the Documentation should reflect that. Link: http://lkml.kernel.org/r/c2fe4144678829c70cad67aaa847dca27d57cb83.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Refactor hist trigger action codeTom Zanussi2019-02-201-169/+238
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hist trigger action code currently implements two essentially hard-coded pairs of 'actions' - onmax(), which tracks a variable and saves some event fields when a max is hit, and onmatch(), which is hard-coded to generate a synthetic event. These hardcoded pairs (track max/save fields and detect match/generate synthetic event) should really be decoupled into separate components that can then be arbitrarily combined. The first component of each pair (track max/detect match) is called a 'handler' in the new code, while the second component (save fields/generate synthetic event) is called an 'action' in this scheme. This change refactors the action code to reflect this split by adding two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two actions, ACTION_SAVE and ACTION_TRACE. The new code combines them to produce the existing ONMATCH/TRACE and ONMAX/SAVE functionality, but doesn't implement the other combinations now possible. Future patches will expand these to further useful cases, such as ONMAX/TRACE, as well as add additional handlers and actions such as ONCHANGE and SNAPSHOT. Also, add abbreviated documentation for handlers and actions to README. Link: http://lkml.kernel.org/r/98bfdd48c1b4ff29fc5766442f99f5bc3c34b76b.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Do not free iter->trace in fail path of tracing_open_pipe()zhangyi (F)2019-02-201-1/+0
| | | | | | | | | | | | | | | | | Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") use the current tracer instead of the copy in tracing_open_pipe(), but it forget to remove the freeing sentence in the error path. There's an error path that can call kfree(iter->trace) after the iter->trace was assigned to tr->current_trace, which would be bad to free. Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com Cc: stable@vger.kernel.org Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* uprobes: convert uprobe.ref to refcount_tElena Reshetova2019-02-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable uprobe.ref is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the uprobe.ref it might make a difference in following places: - put_uprobe(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Link: http://lkml.kernel.org/r/1547637627-29526-1-git-send-email-elena.reshetova@intel.com Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* ftrace: Allow enabling of filters via index of available_filter_functionsSteven Rostedt (VMware)2019-02-154-0/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enabling of large number of functions by echoing in a large subset of the functions in available_filter_functions can take a very long time. The process requires testing all functions registered by the function tracer (which is in the 10s of thousands), and doing a kallsyms lookup to convert the ip address into a name, then comparing that name with the string passed in. When a function causes the function tracer to crash the system, a binary bisect of the available_filter_functions can be done to find the culprit. But this requires passing in half of the functions in available_filter_functions over and over again, which makes it basically a O(n^2) operation. With 40,000 functions, that ends up bing 1,600,000,000 opertions! And enabling this can take over 20 minutes. As a quick speed up, if a number is passed into one of the filter files, instead of doing a search, it just enables the function at the corresponding line of the available_filter_functions file. That is: # echo 50 > set_ftrace_filter # cat set_ftrace_filter x86_pmu_commit_txn # head -50 available_filter_functions | tail -1 x86_pmu_commit_txn This allows setting of half the available_filter_functions to take place in less than a second! # time seq 20000 > set_ftrace_filter real 0m0.042s user 0m0.005s sys 0m0.015s # wc -l set_ftrace_filter 20000 set_ftrace_filter Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Change the function format to display function names by perfChangbin Du2019-02-111-22/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is an example for this change. $ sudo perf record -e 'ftrace:function' --filter='ip==schedule' $ sudo perf report The output of perf before this patch: \# Samples: 100 of event 'ftrace:function' \# Event count (approx.): 100 \# \# Overhead Trace output \# ........ ...................................... \# 51.00% ffffffff81f6aaa0 <-- ffffffff81158e8d 29.00% ffffffff81f6aaa0 <-- ffffffff8116ccb2 8.00% ffffffff81f6aaa0 <-- ffffffff81f6f2ed 4.00% ffffffff81f6aaa0 <-- ffffffff811628db 4.00% ffffffff81f6aaa0 <-- ffffffff81f6ec5b 2.00% ffffffff81f6aaa0 <-- ffffffff81f6f21a 1.00% ffffffff81f6aaa0 <-- ffffffff811b04af 1.00% ffffffff81f6aaa0 <-- ffffffff8143ce17 After this patch: \# Samples: 36 of event 'ftrace:function' \# Event count (approx.): 36 \# \# Overhead Trace output \# ........ ............................................ \# 38.89% schedule <-- schedule_hrtimeout_range_clock 27.78% schedule <-- worker_thread 13.89% schedule <-- schedule_timeout 11.11% schedule <-- smpboot_thread_fn 5.56% schedule <-- rcu_gp_kthread 2.78% schedule <-- exit_to_usermode_loop Link: http://lkml.kernel.org/r/20190209161919.32350-1-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* ring-buffer: Remove unused function ring_buffer_page_len()Miroslav Benes2019-02-062-16/+0
| | | | | | | | | | | Commit 6b7e633fe9c2 ("tracing: Remove extra zeroing out of the ring buffer page") removed the only caller of ring_buffer_page_len(). The function is now unused and may be removed. Link: http://lkml.kernel.org/r/20181228133847.106177-1-mbenes@suse.cz Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Show stacktrace for wakeup tracersChangbin Du2019-02-061-0/+2
| | | | | | | | | | | This align the behavior of wakeup tracers with irqsoff latency tracer that we record stacktrace at the beginning and end of waking up. The stacktrace shows us what is happening in the kernel. Link: http://lkml.kernel.org/r/20190116160249.7554-1-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing/doc: Add latency tracer funcgraph exampleChangbin Du2019-02-061-0/+51
| | | | | | | | | This add an example about how to use funcgraph with latency tracers. Link: http://lkml.kernel.org/r/20190101154614.8887-6-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Put a margin between flags and duration for wakeup tracersChangbin Du2019-02-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't mix context flags with function duration info. Instead of this: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */ 2 us | 0) <idle>-0 | dNh5 0.000 us | (null)(); 4 us | 0) <idle>-0 | dNh4 | _raw_spin_unlock() { 4 us | 0) <idle>-0 | dNh4 0.304 us | preempt_count_sub(); 5 us | 0) <idle>-0 | dNh3 1.063 us | } 5 us | 0) <idle>-0 | dNh3 0.266 us | ttwu_stat(); 6 us | 0) <idle>-0 | dNh3 | _raw_spin_unlock_irqrestore() { 6 us | 0) <idle>-0 | dNh3 0.273 us | preempt_count_sub(); 6 us | 0) <idle>-0 | dNh2 0.818 us | } Show this: # tracer: wakeup # # wakeup latency trace v1.1.5 on 4.20.0+ # -------------------------------------------------------------------- # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */ 3 us | 0) <idle>-0 | dNs. | 0.000 us | (null)(); 67 us | 0) <idle>-0 | dNs. | 0.721 us | ttwu_stat(); 69 us | 0) <idle>-0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore(); 71 us | 0) <idle>-0 | .Ns. | 0.598 us | _raw_spin_lock_irq(); 72 us | 0) <idle>-0 | .Ns. | 0.584 us | _raw_spin_lock_irq(); 73 us | 0) <idle>-0 | dNs. | + 11.118 us | __next_timer_interrupt(); 75 us | 0) <idle>-0 | dNs. | | call_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | delayed_work_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | __queue_work() { ... Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Show more info for funcgraph wakeup tracersChangbin Du2019-02-061-1/+4
| | | | | | | | | | | | Add these info fields to funcgraph wakeup tracers: o Show CPU info since the waker could be on a different CPU. o Show function duration and overhead. o Show IRQ markers. Link: http://lkml.kernel.org/r/20190101154614.8887-3-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Add comment to predicate_parse() about "&&" or "||"Steven Rostedt (VMware)2019-02-061-0/+1
| | | | | | | | | As the predicat_parse() code is rather complex, commenting subtleties is important. The switch case statement should be commented to describe that it is only looking for two '&' or '|' together, which is why the fall through to an error is after the check. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Annotate implicit fall through in predicate_parse()Mathieu Malaterre2019-02-061-0/+1
| | | | | | | | | | | | | | There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: kernel/trace/trace_events_filter.c:494:8: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203039.16535-2-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Annotate implicit fall through in parse_probe_arg()Mathieu Malaterre2019-02-061-0/+1
| | | | | | | | | | | | | | There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: kernel/trace/trace_probe.c:302:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203039.16535-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* function_graph: Support displaying relative timestampChangbin Du2019-02-064-6/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* Linux 5.0-rc5v5.0-rc5Linus Torvalds2019-02-031-1/+1
|
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2019-02-0313-14/+35
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few updates for x86: - Fix an unintended sign extension issue in the fault handling code - Rename the new resource control config switch so it's less confusing - Avoid setting up EFI info in kexec when the EFI runtime is disabled. - Fix the microcode version check in the AMD microcode loader so it only loads higher version numbers and never downgrades - Set EFER.LME in the 32bit trampoline before returning to long mode to handle older AMD/KVM behaviour properly. - Add Darren and Andy as x86/platform reviewers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Avoid confusion over the new X86_RESCTRL config x86/kexec: Don't setup EFI info if EFI runtime is not enabled x86/microcode/amd: Don't falsely trick the late loading mechanism MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers x86/fault: Fix sign-extend unintended sign extension x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode x86/cpu: Add Atom Tremont (Jacobsville)
| * x86/resctrl: Avoid confusion over the new X86_RESCTRL configJohannes Weiner2019-02-026-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Resource Control" is a very broad term for this CPU feature, and a term that is also associated with containers, cgroups etc. This can easily cause confusion. Make the user prompt more specific. Match the config symbol name. [ bp: In the future, the corresponding ARM arch-specific code will be under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out under the CPU_RESCTRL umbrella symbol. ] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Babu Moger <Babu.Moger@amd.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: linux-doc@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
| * x86/kexec: Don't setup EFI info if EFI runtime is not enabledKairui Song2019-02-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kexec-ing a kernel with "efi=noruntime" on the first kernel's command line causes the following null pointer dereference: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] Call Trace: efi_runtime_map_copy+0x28/0x30 bzImage64_load+0x688/0x872 arch_kexec_kernel_image_load+0x6d/0x70 kimage_file_alloc_init+0x13e/0x220 __x64_sys_kexec_file_load+0x144/0x290 do_syscall_64+0x55/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Just skip the EFI info setup if EFI runtime services are not enabled. [ bp: Massage commit message. ] Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: bhe@redhat.com Cc: David Howells <dhowells@redhat.com> Cc: erik.schmauss@intel.com Cc: fanc.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kexec@lists.infradead.org Cc: lenb@kernel.org Cc: linux-acpi@vger.kernel.org Cc: Philipp Rudo <prudo@linux.vnet.ibm.com> Cc: rafael.j.wysocki@intel.com Cc: robert.moore@intel.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yannik Sembritzki <yannik@sembritzki.me> Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
| * x86/microcode/amd: Don't falsely trick the late loading mechanismThomas Lendacky2019-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The load_microcode_amd() function searches for microcode patches and attempts to apply a microcode patch if it is of different level than the currently installed level. While the processor won't actually load a level that is less than what is already installed, the logic wrongly returns UCODE_NEW thus signaling to its caller reload_store() that a late loading should be attempted. If the file-system contains an older microcode revision than what is currently running, such a late microcode reload can result in these misleading messages: x86/CPU: CPU features have changed after loading microcode, but might not take effect. x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update. These messages were issued on a system where SME/SEV are not enabled by the BIOS (MSR C001_0010[23] = 0b) because during boot, early_detect_mem_encrypt() is called and cleared the SME and SEV features in this case. However, after the wrong late load attempt, get_cpu_cap() is called and reloads the SME and SEV feature bits, resulting in the messages. Update the microcode level check to not attempt microcode loading if the current level is greater than(!) and not only equal to the current patch level. [ bp: massage commit message. ] Fixes: 2613f36ed965 ("x86/microcode: Attempt late loading only when new microcode is present") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
| * MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewersBorislav Petkov2019-01-301-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | ... so that they can get CCed on platform patches. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Shevchenko <andy@infradead.org> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190128113619.19025-1-bp@alien8.de
| * x86/fault: Fix sign-extend unintended sign extensionColin Ian King2019-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | show_ldttss() shifts desc.base2 by 24 bit, but base2 is 8 bits of a bitfield in a u16. Due to the really great idea of integer promotion in C99 base2 is promoted to an int, because that's the standard defined behaviour when all values which can be represented by base2 fit into an int. Now if bit 7 is set in desc.base2 the result of the shift left by 24 makes the resulting integer negative and the following conversion to unsigned long legitmately sign extends first causing the upper bits 32 bits to be set in the result. Fix this by casting desc.base2 to unsigned long before the shift. Detected by CoverityScan, CID#1475635 ("Unintended sign extension") [ tglx: Reworded the changelog a bit as I actually had to lookup the standard (again) to decode the original one. ] Fixes: a1a371c468f7 ("x86/fault: Decode page fault OOPSes better") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20181222191116.21831-1-colin.king@canonical.com
| * x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning ↵Wei Huang2019-01-292-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to long mode In some old AMD KVM implementation, guest's EFER.LME bit is cleared by KVM when the hypervsior detects that the guest sets CR0.PG to 0. This causes the guest OS to reboot when it tries to return from 32-bit trampoline code because the CPU is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but EFER.LME=0. As a precaution, set EFER.LME=1 as part of long mode activation procedure. This extra step won't cause any harm when Linux is booted on a bare-metal machine. Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
| * x86/cpu: Add Atom Tremont (Jacobsville)Kan Liang2019-01-291-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the Atom Tremont model number to the Intel family list. [ Tony: Also update comment at head of file to say "_X" suffix is also used for microserver parts. ] Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: Megha Dey <megha.dey@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190125195902.17109-4-tony.luck@intel.com
* | Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds2019-02-036-39/+9
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug fixes from Thomas Gleixner: "Two fixes for the cpu hotplug machinery: - Replace the overly clever 'SMT disabled by BIOS' detection logic as it breaks KVM scenarios and prevents speculation control updates when the Hyperthreads are brought online late after boot. - Remove a redundant invocation of the speculation control update function" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM x86/speculation: Remove redundant arch_smt_update() invocation
| * | cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVMJosh Poimboeuf2019-01-306-35/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the following commit: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") ... the hotplug code attempted to detect when SMT was disabled by BIOS, in which case it reported SMT as permanently disabled. However, that code broke a virt hotplug scenario, where the guest is booted with only primary CPU threads, and a sibling is brought online later. The problem is that there doesn't seem to be a way to reliably distinguish between the HW "SMT disabled by BIOS" case and the virt "sibling not yet brought online" case. So the above-mentioned commit was a bit misguided, as it permanently disabled SMT for both cases, preventing future virt sibling hotplugs. Going back and reviewing the original problems which were attempted to be solved by that commit, when SMT was disabled in BIOS: 1) /sys/devices/system/cpu/smt/control showed "on" instead of "notsupported"; and 2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning. I'd propose that we instead consider #1 above to not actually be a problem. Because, at least in the virt case, it's possible that SMT wasn't disabled by BIOS and a sibling thread could be brought online later. So it makes sense to just always default the smt control to "on" to allow for that possibility (assuming cpuid indicates that the CPU supports SMT). The real problem is #2, which has a simple fix: change vmx_vm_init() to query the actual current SMT state -- i.e., whether any siblings are currently online -- instead of looking at the SMT "control" sysfs value. So fix it by: a) reverting the original "fix" and its followup fix: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation") and b) changing vmx_vm_init() to query the actual current SMT state -- instead of the sysfs control value -- to determine whether the L1TF warning is needed. This also requires the 'sched_smt_present' variable to exported, instead of 'cpu_smt_control'. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joe Mario <jmario@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
| * | x86/speculation: Remove redundant arch_smt_update() invocationZhenzhong Duan2019-01-291-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit a74cfffb03b7 ("x86/speculation: Rework SMT state change"), arch_smt_update() is invoked from each individual CPU hotplug function. Therefore the extra arch_smt_update() call in the sysfs SMT control is redundant. Fixes: a74cfffb03b7 ("x86/speculation: Rework SMT state change") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <konrad.wilk@oracle.com> Cc: <dwmw@amazon.co.uk> Cc: <bp@suse.de> Cc: <srinivas.eeda@oracle.com> Cc: <peterz@infradead.org> Cc: <hpa@zytor.com> Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default
* | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2019-02-036-23/+35
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf updates: - Fix broken sanity check in the /proc/sys/kernel/perf_cpu_time_max_percent write handler - Cure a perf script crash which caused by an unitinialized data structure - Highlight the hottest instruction in perf top and not a random one - Cure yet another clang issue when building perf python - Handle topology entries with no CPU correctly in the tools - Handle perf data which contains both tracepoints and performance counter entries correctly. - Add a missing NULL pointer check in perf ordered_events_free()" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf script: Fix crash when processing recorded stat data perf top: Fix wrong hottest instruction highlighted perf tools: Handle TOPOLOGY headers with no CPU perf python: Remove -fstack-clash-protection when building with some clang versions perf core: Fix perf_proc_update_handler() bug perf script: Fix crash with printing mixed trace point and other events perf ordered_events: Fix crash in ordered_events__free
| * \ \ Merge tag 'perf-urgent-for-mingo-5.0-20190121' of ↵Ingo Molnar2019-01-226-23/+35
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: Kernel: Stephane Eranian: - Fix perf_proc_update_handler() bug. perf script: Andi Kleen: - Fix crash with printing mixed trace point and other events. Tony Jones: - Fix crash when processing recorded stat data. perf top: He Kuang: - Fix wrong hottest instruction highlighted. perf python: Arnaldo Carvalho de Melo: - Remove -fstack-clash-protection when building with some clang versions. perf ordered_events: Jiri Olsa: - Fix out of buffers crash in ordered_events__free(). perf cpu_map: Stephane Eranian: - Handle TOPOLOGY headers with no CPU. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | perf script: Fix crash when processing recorded stat dataTony Jones2019-01-211-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While updating perf to work with Python3 and Python2 I noticed that the stat-cpi script was dumping core. $ perf stat -e cycles,instructions record -o /tmp/perf.data /bin/false Performance counter stats for '/bin/false': 802,148 cycles 604,622 instructions 802,148 cycles 604,622 instructions 0.001445842 seconds time elapsed $ perf script -i /tmp/perf.data -s scripts/python/stat-cpi.py Segmentation fault (core dumped) ... ... rblist=rblist@entry=0xb2a200 <rt_stat>, new_entry=new_entry@entry=0x7ffcb755c310) at util/rblist.c:33 ctx=<optimized out>, type=<optimized out>, create=<optimized out>, cpu=<optimized out>, evsel=<optimized out>) at util/stat-shadow.c:118 ctx=<optimized out>, type=<optimized out>, st=<optimized out>) at util/stat-shadow.c:196 count=count@entry=727442, cpu=cpu@entry=0, st=0xb2a200 <rt_stat>) at util/stat-shadow.c:239 config=config@entry=0xafeb40 <stat_config>, counter=counter@entry=0x133c6e0) at util/stat.c:372 ... ... The issue is that since 1fcd03946b52 perf_stat__update_shadow_stats now calls update_runtime_stat passing rt_stat rather than calling update_stats but perf_stat__init_shadow_stats has never been called to initialize rt_stat in the script path processing recorded stat data. Since I can't see any reason why perf_stat__init_shadow_stats() is presently initialized like it is in builtin-script.c::perf_sample__fprint_metric() [4bd1bef8bba2f] I'm proposing it instead be initialized once in __cmd_script Committer testing: After applying the patch: # perf script -i /tmp/perf.data -s tools/perf/scripts/python/stat-cpi.py 0.001970: cpu -1, thread -1 -> cpi 1.709079 (1075684/629394) # No segfault. Signed-off-by: Tony Jones <tonyj@suse.de> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Fixes: 1fcd03946b52 ("perf stat: Update per-thread shadow stats") Link: http://lkml.kernel.org/r/20190120191414.12925-1-tonyj@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf top: Fix wrong hottest instruction highlightedHe Kuang2019-01-211-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The annotation line percentage is compared and inserted into the rbtree, but the percent field of 'struct annotation_data' is an array, the comparison result between them is the address difference. This patch compares the right slot of percent array according to opts->percent_type and makes things right. The problem can be reproduced by pressing 'H' in perf top annotation view. It should highlight the instruction line which has the highest sampling percentage. Signed-off-by: He Kuang <hekuang@huawei.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190120160523.4391-1-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf tools: Handle TOPOLOGY headers with no CPUStephane Eranian2019-01-211-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes an issue in cpumap.c when used with the TOPOLOGY header. In some configurations, some NUMA nodes may have no CPU (empty cpulist). Yet a cpumap map must be created otherwise perf abort with an error. This patch handles this case by creating a dummy map. Before: $ perf record -o - -e cycles noploop 2 | perf script -i - 0x6e8 [0x6c]: failed to process type: 80 After: $ perf record -o - -e cycles noploop 2 | perf script -i - noploop for 2 seconds Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1547885559-1657-1-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf python: Remove -fstack-clash-protection when building with some clang ↵Arnaldo Carvalho de Melo2019-01-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | versions These options are not present in some (all?) clang versions, so when we build for a distro that has a gcc new enough to have these options and that the distro python build config settings use them but clang doesn't support, b00m. This is the case with fedora rawhide (now gearing towards f30), so check if clang has the and remove the missing ones from CFLAGS. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Thiago Macieira <thiago.macieira@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-5q50q9w458yawgxf9ez54jbp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf core: Fix perf_proc_update_handler() bugStephane Eranian2019-01-181-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf_proc_update_handler() handles /proc/sys/kernel/perf_event_max_sample_rate syctl variable. When the PMU IRQ handler timing monitoring is disabled, i.e, when /proc/sys/kernel/perf_cpu_time_max_percent is equal to 0 or 100, then no modification to sysctl_perf_event_sample_rate is allowed to prevent possible hang from wrong values. The problem is that the test to prevent modification is made after the sysctl variable is modified in perf_proc_update_handler(). You get an error: $ echo 10001 >/proc/sys/kernel/perf_event_max_sample_rate echo: write error: invalid argument But the value is still modified causing all sorts of inconsistencies: $ cat /proc/sys/kernel/perf_event_max_sample_rate 10001 This patch fixes the problem by moving the parsing of the value after the test. Committer testing: # echo 100 > /proc/sys/kernel/perf_cpu_time_max_percent # echo 10001 > /proc/sys/kernel/perf_event_max_sample_rate -bash: echo: write error: Invalid argument # cat /proc/sys/kernel/perf_event_max_sample_rate 10001 # Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1547169436-6266-1-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf script: Fix crash with printing mixed trace point and other eventsAndi Kleen2019-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'perf script' crashes currently when printing mixed trace points and other events because the trace format does not handle events without trace meta data. Add a simple check to avoid that. % cat > test.c main() { printf("Hello world\n"); } ^D % gcc -g -o test test.c % sudo perf probe -x test 'test.c:3' % perf record -e '{cpu/cpu-cycles,period=10000/,probe_test:main}:S' ./test % perf script <segfault> Committer testing: Before: # perf probe -x /lib64/libc-2.28.so malloc Added new event: probe_libc:malloc (on malloc in /usr/lib64/libc-2.28.so) You can now use it in all perf tools, such as: perf record -e probe_libc:malloc -aR sleep 1 # perf probe -l probe_libc:malloc (on __libc_malloc@malloc/malloc.c in /usr/lib64/libc-2.28.so) # perf record -e '{cpu/cpu-cycles,period=10000/,probe_libc:*}:S' sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.023 MB perf.data (40 samples) ] # perf script Segmentation fault (core dumped) ^C # After: # perf script | head -6 sleep 2888 94796.944981: 16198 cpu/cpu-cycles,period=10000/: ffffffff925dc04f get_random_u32+0x1f (/lib/modules/5.0.0-rc2+/build/vmlinux) sleep 2888 [-01] 94796.944981: probe_libc:malloc: sleep 2888 94796.944983: 4713 cpu/cpu-cycles,period=10000/: ffffffff922763af change_protection+0xcf (/lib/modules/5.0.0-rc2+/build/vmlinux) sleep 2888 [-01] 94796.944983: probe_libc:malloc: sleep 2888 94796.944986: 9934 cpu/cpu-cycles,period=10000/: ffffffff922777e0 move_page_tables+0x0 (/lib/modules/5.0.0-rc2+/build/vmlinux) sleep 2888 [-01] 94796.944986: probe_libc:malloc: # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190117194834.21940-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | perf ordered_events: Fix crash in ordered_events__freeJiri Olsa2019-01-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Song Liu reported crash in 'perf record': > #0 0x0000000000500055 in ordered_events(float, long double,...)(...) () > #1 0x0000000000500196 in ordered_events.reinit () > #2 0x00000000004fe413 in perf_session.process_events () > #3 0x0000000000440431 in cmd_record () > #4 0x00000000004a439f in run_builtin () > #5 0x000000000042b3e5 in main ()" This can happen when we get out of buffers during event processing. The subsequent ordered_events__free() call assumes oe->buffer != NULL and crashes. Add a check to prevent that. Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Song Liu <liu.song.a23@gmail.com> Tested-by: Song Liu <liu.song.a23@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190117113017.12977-1-jolsa@kernel.org Fixes: d5ceb62b3654 ("perf ordered_events: Add 'struct ordered_events_buffer' layer") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | | Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds2019-02-031-2/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Thomas Gleixner: "The dump info for the efi page table debugging lacks a terminator which causes the kernel to crash when the debugfile is read" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker
| * | | | | efi/arm64: Fix debugfs crash by adding a terminator for ptdump markerQian Cai2019-02-021-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading 'efi_page_tables' debugfs triggers an out-of-bounds access here: arch/arm64/mm/dump.c: 282 if (addr >= st->marker[1].start_address) { called from: arch/arm64/mm/dump.c: 331 note_page(st, addr, 2, pud_val(pud)); because st->marker++ is is called after "UEFI runtime end" which is the last element in addr_marker[]. Therefore, add a terminator like the one for kernel_page_tables, so it can be skipped to print out non-existent markers. Here's the KASAN bug report: # cat /sys/kernel/debug/efi_page_tables ---[ UEFI runtime start ]--- 0x0000000020000000-0x0000000020010000 64K PTE RW NX SHD AF ... 0x0000000020200000-0x0000000021340000 17664K PTE RW NX SHD AF ... ... 0x0000000021920000-0x0000000021950000 192K PTE RW x SHD AF ... 0x0000000021950000-0x00000000219a0000 320K PTE RW NX SHD AF ... ---[ UEFI runtime end ]--- ---[ (null) ]--- ---[ (null) ]--- BUG: KASAN: global-out-of-bounds in note_page+0x1f0/0xac0 Read of size 8 at addr ffff2000123f2ac0 by task read_all/42464 Call trace: dump_backtrace+0x0/0x298 show_stack+0x24/0x30 dump_stack+0xb0/0xdc print_address_description+0x64/0x2b0 kasan_report+0x150/0x1a4 __asan_report_load8_noabort+0x30/0x3c note_page+0x1f0/0xac0 walk_pgd+0xb4/0x244 ptdump_walk_pgd+0xec/0x140 ptdump_show+0x40/0x50 seq_read+0x3f8/0xad0 full_proxy_read+0x9c/0xc0 __vfs_read+0xfc/0x4c8 vfs_read+0xec/0x208 ksys_read+0xd0/0x15c __arm64_sys_read+0x84/0x94 el0_svc_handler+0x258/0x304 el0_svc+0x8/0xc The buggy address belongs to the variable: __compound_literal.0+0x20/0x800 Memory state around the buggy address: ffff2000123f2980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff2000123f2a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa >ffff2000123f2a80: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 00 ^ ffff2000123f2b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff2000123f2b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 [ ardb: fix up whitespace ] [ mingo: fix up some moar ] Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 9d80448ac92b ("efi/arm64: Add debugfs node to dump UEFI runtime page tables") Link: http://lkml.kernel.org/r/20190202095017.13799-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | Merge tag 'for-5.0-rc4-tag' of ↵Linus Torvalds2019-02-034-38/+71
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - regression fix: transaction commit can run away due to delayed ref waiting heuristic, this is not necessary now because of the proper reservation mechanism introduced in 5.0 - regression fix: potential crash due to use-before-check of an ERR_PTR return value - fix for transaction abort during transaction commit that needs to properly clean up pending block groups - fix deadlock during b-tree node/leaf splitting, when this happens on some of the fundamental trees, we must prevent new tree block allocation to re-enter indirectly via the block group flushing path - potential memory leak after errors during mount * tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: On error always free subvol_name in btrfs_mount btrfs: clean up pending block groups when transaction commit aborts btrfs: fix potential oops in device_list_add btrfs: don't end the transaction for delayed refs in throttle Btrfs: fix deadlock when allocating tree block during leaf/node split
| * | | | | | btrfs: On error always free subvol_name in btrfs_mountEric W. Biederman2019-01-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The subvol_name is allocated in btrfs_parse_subvol_options and is consumed and freed in mount_subvol. Add a free to the error paths that don't call mount_subvol so that it is guaranteed that subvol_name is freed when an error happens. Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | btrfs: clean up pending block groups when transaction commit abortsDavid Sterba2019-01-301-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fstests generic/475 stresses transaction aborts and can reveal space accounting or use-after-free bugs regarding block goups. In this case the pending block groups that remain linked to the structures after transaction commit aborts in the middle. The corrupted slabs lead to failures in following tests, eg. generic/476 [ 8172.752887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 [ 8172.755799] #PF error: [normal kernel read fault] [ 8172.757571] PGD 661ae067 P4D 661ae067 PUD 3db8e067 PMD 0 [ 8172.759000] Oops: 0000 [#1] PREEMPT SMP [ 8172.760209] CPU: 0 PID: 39 Comm: kswapd0 Tainted: G W 5.0.0-rc2-default #408 [ 8172.762495] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 8172.765772] RIP: 0010:shrink_page_list+0x2f9/0xe90 [ 8172.770453] RSP: 0018:ffff967f00663b18 EFLAGS: 00010287 [ 8172.771184] RAX: 0000000000000000 RBX: ffff967f00663c20 RCX: 0000000000000000 [ 8172.772850] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c0620ab20e0 [ 8172.774629] RBP: ffff967f00663dd8 R08: 0000000000000000 R09: 0000000000000000 [ 8172.776094] R10: ffff8c0620ab22f8 R11: ffff8c063f772688 R12: ffff967f00663b78 [ 8172.777533] R13: ffff8c063f625600 R14: ffff8c063f625608 R15: dead000000000200 [ 8172.778886] FS: 0000000000000000(0000) GS:ffff8c063d400000(0000) knlGS:0000000000000000 [ 8172.780545] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8172.781787] CR2: 0000000000000058 CR3: 000000004e962000 CR4: 00000000000006f0 [ 8172.783547] Call Trace: [ 8172.784112] shrink_inactive_list+0x194/0x410 [ 8172.784747] shrink_node_memcg.constprop.85+0x3a5/0x6a0 [ 8172.785472] shrink_node+0x62/0x1e0 [ 8172.786011] balance_pgdat+0x216/0x460 [ 8172.786577] kswapd+0xe3/0x4a0 [ 8172.787085] ? finish_wait+0x80/0x80 [ 8172.787795] ? balance_pgdat+0x460/0x460 [ 8172.788799] kthread+0x116/0x130 [ 8172.789640] ? kthread_create_on_node+0x60/0x60 [ 8172.790323] ret_from_fork+0x24/0x30 [ 8172.794253] CR2: 0000000000000058 or accounting errors at umount time: [ 8159.537251] WARNING: CPU: 2 PID: 19031 at fs/btrfs/extent-tree.c:5987 btrfs_free_block_groups+0x3d5/0x410 [btrfs] [ 8159.543325] CPU: 2 PID: 19031 Comm: umount Tainted: G W 5.0.0-rc2-default #408 [ 8159.545472] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 8159.548155] RIP: 0010:btrfs_free_block_groups+0x3d5/0x410 [btrfs] [ 8159.554030] RSP: 0018:ffff967f079cbde8 EFLAGS: 00010206 [ 8159.555144] RAX: 0000000001000000 RBX: ffff8c06366cf800 RCX: 0000000000000000 [ 8159.556730] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8c06255ad800 [ 8159.558279] RBP: ffff8c0637ac0000 R08: 0000000000000001 R09: 0000000000000000 [ 8159.559797] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8c0637ac0108 [ 8159.561296] R13: ffff8c0637ac0158 R14: 0000000000000000 R15: dead000000000100 [ 8159.562852] FS: 00007f7f693b9fc0(0000) GS:ffff8c063d800000(0000) knlGS:0000000000000000 [ 8159.564839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8159.566160] CR2: 00007f7f68fab7b0 CR3: 000000000aec7000 CR4: 00000000000006e0 [ 8159.567898] Call Trace: [ 8159.568597] close_ctree+0x17f/0x350 [btrfs] [ 8159.569628] generic_shutdown_super+0x64/0x100 [ 8159.570808] kill_anon_super+0x14/0x30 [ 8159.571857] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8159.573063] deactivate_locked_super+0x29/0x60 [ 8159.574234] cleanup_mnt+0x3b/0x70 [ 8159.575176] task_work_run+0x98/0xc0 [ 8159.576177] exit_to_usermode_loop+0x83/0x90 [ 8159.577315] do_syscall_64+0x15b/0x180 [ 8159.578339] entry_SYSCALL_64_after_hwframe+0x49/0xbe This fix is based on 2 Josef's patches that used sideefects of btrfs_create_pending_block_groups, this fix introduces the helper that does what we need. CC: stable@vger.kernel.org # 4.4+ CC: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>