summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'trace-v5.19' of ↵Linus Torvalds2022-05-2921-595/+677
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The majority of the changes are for fixes and clean ups. Notable changes: - Rework trace event triggers code to be easier to interact with. - Support for embedding bootconfig with the kernel (as suppose to having it embedded in initram). This is useful for embedded boards without initram disks. - Speed up boot by parallelizing the creation of tracefs files. - Allow absolute ring buffer timestamps handle timestamps that use more than 59 bits. - Added new tracing clock "TAI" (International Atomic Time) - Have weak functions show up in available_filter_function list as: __ftrace_invalid_address___<invalid-offset> instead of using the name of the function before it" * tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits) ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function tracing: Fix comments for event_trigger_separate_filter() x86/traceponit: Fix comment about irq vector tracepoints x86,tracing: Remove unused headers ftrace: Clean up hash direct_functions on register failures tracing: Fix comments of create_filter() tracing: Disable kcov on trace_preemptirq.c tracing: Initialize integer variable to prevent garbage return value ftrace: Fix typo in comment ftrace: Remove return value of ftrace_arch_modify_*() tracing: Cleanup code by removing init "char *name" tracing: Change "char *" string form to "char []" tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ tracing/timerlat: Print stacktrace in the IRQ handler if needed tracing/timerlat: Notify IRQ new max latency only if stop tracing is set kprobes: Fix build errors with CONFIG_KRETPROBES=n tracing: Fix return value of trace_pid_write() tracing: Fix potential double free in create_var_ref() tracing: Use strim() to remove whitespace instead of doing it manually ftrace: Deal with error return code of the ftrace_process_locs() function ...
| * ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak functionSteven Rostedt (Google)2022-05-281-2/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an unused weak function was traced, it's call to fentry will still exist, which gets added into the __mcount_loc table. Ftrace will use kallsyms to retrieve the name for each location in __mcount_loc to display it in the available_filter_functions and used to enable functions via the name matching in set_ftrace_filter/notrace. Enabling these functions do nothing but enable an unused call to ftrace_caller. If a traced weak function is overridden, the symbol of the function would be used for it, which will either created duplicate names, or if the previous function was not traced, it would be incorrectly be listed in available_filter_functions as a function that can be traced. This became an issue with BPF[1] as there are tooling that enables the direct callers via ftrace but then checks to see if the functions were actually enabled. The case of one function that was marked notrace, but was followed by an unused weak function that was traced. The unused function's call to fentry was added to the __mcount_loc section, and kallsyms retrieved the untraced function's symbol as the weak function was overridden. Since the untraced function would not get traced, the BPF check would detect this and fail. The real fix would be to fix kallsyms to not show addresses of weak functions as the function before it. But that would require adding code in the build to add function size to kallsyms so that it can know when the function ends instead of just using the start of the next known symbol. In the mean time, this is a work around. Add a FTRACE_MCOUNT_MAX_OFFSET macro that if defined, ftrace will ignore any function that has its call to fentry/mcount that has an offset from the symbol that is greater than FTRACE_MCOUNT_MAX_OFFSET. If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET to zero (unless IBT is enabled), which will have ftrace ignore all locations that are not at the start of the function (or one after the ENDBR instruction). A worker thread is added at boot up to scan all the ftrace record entries, and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled. They will still appear in the available_filter_functions file as: __ftrace_invalid_address___<invalid-offset> (showing the offset that caused it to be invalid). This is required for tools that use libtracefs (like trace-cmd does) that scan the available_filter_functions and enable set_ftrace_filter and set_ftrace_notrace using indexes of the function listed in the file (this is a speedup, as enabling thousands of files via names is an O(n^2) operation and can take minutes to complete, where the indexing takes less than a second). The invalid functions cannot be removed from available_filter_functions as the names there correspond to the ftrace records in the array that manages them (and the indexing depends on this). [1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/ Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix comments for event_trigger_separate_filter()sunliming2022-05-261-4/+4
| | | | | | | | | | | | | | | | | | | | The parameter name in comments of event_trigger_separate_filter() is inconsistent with actual parameter name, fix it. Link: https://lkml.kernel.org/r/20220526072957.165655-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ftrace: Clean up hash direct_functions on register failuresSong Liu2022-05-261-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We see the following GPF when register_ftrace_direct fails: [ ] general protection fault, probably for non-canonical address \ 0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [...] [ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70 [ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...] [ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206 [ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b [ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0 [ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001 [ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0 [ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000 [ ] FS: 00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000 [ ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0 [ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ ] Call Trace: [ ] <TASK> [ ] register_ftrace_direct+0x54/0x290 [ ] ? render_sigset_t+0xa0/0xa0 [ ] bpf_trampoline_update+0x3f5/0x4a0 [ ] ? 0xffffffffa00a3000 [ ] bpf_trampoline_link_prog+0xa9/0x140 [ ] bpf_tracing_prog_attach+0x1dc/0x450 [ ] bpf_raw_tracepoint_open+0x9a/0x1e0 [ ] ? find_held_lock+0x2d/0x90 [ ] ? lock_release+0x150/0x430 [ ] __sys_bpf+0xbd6/0x2700 [ ] ? lock_is_held_type+0xd8/0x130 [ ] __x64_sys_bpf+0x1c/0x20 [ ] do_syscall_64+0x3a/0x80 [ ] entry_SYSCALL_64_after_hwframe+0x44/0xae [ ] RIP: 0033:0x7fa9421defa9 [ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...] [ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9 [ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011 [ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000 [ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001 [ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001 [ ] </TASK> [ ] Modules linked in: klp_vm(OK) [ ] ---[ end trace 0000000000000000 ]--- One way to trigger this is: 1. load a livepatch that patches kernel function xxx; 2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now); 3. repeat #2 => gpf. This is because the entry is added to direct_functions, but not removed. Fix this by remove the entry from direct_functions when register_ftrace_direct fails. Also remove the last trailing space from ftrace.c, so we don't have to worry about it anymore. Link: https://lkml.kernel.org/r/20220524170839.900849-1-song@kernel.org Cc: stable@vger.kernel.org Fixes: 763e34e74bb7 ("ftrace: Add register_ftrace_direct()") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix comments of create_filter()sunliming2022-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | The name in comments of parameter "filter_string" in function create_filter is annotated as "filter_str", just fix it. Link: https://lkml.kernel.org/r/20220524063937.52873-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Disable kcov on trace_preemptirq.cCongyu Liu2022-05-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Functions in trace_preemptirq.c could be invoked from early interrupt code that bypasses kcov trace function's in_task() check. Disable kcov on this file to reduce random code coverage. Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Initialize integer variable to prevent garbage return valueGautam Menghani2022-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initialize the integer variable to 0 to fix the clang scan warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return ret; Link: https://lkml.kernel.org/r/20220522061826.1751-1-gautammenghani201@gmail.com Cc: stable@vger.kernel.org Fixes: 8993665abcce ("tracing/boot: Support multiple handlers for per-event histogram") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ftrace: Fix typo in commentJulia Lawall2022-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lkml.kernel.org/r/20220521111145.81697-81-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ftrace: Remove return value of ftrace_arch_modify_*()Li kunyu2022-05-261-12/+4
| | | | | | | | | | | | | | | | | | | | | | All instances of the function ftrace_arch_modify_prepare() and ftrace_arch_modify_post_process() return zero. There's no point in checking their return value. Just have them be void functions. Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com Signed-off-by: Li kunyu <kunyu@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Cleanup code by removing init "char *name"liqiong2022-05-261-3/+1
| | | | | | | | | | | | | | | | | | | | The pointer is assigned to "type->name" anyway. no need to initialize with "preemption". Link: https://lkml.kernel.org/r/20220513075221.26275-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Change "char *" string form to "char []"liqiong2022-05-262-2/+2
| | | | | | | | | | | | | | | | | | | | The "char []" string form declares a single variable. It is better than "char *" which creates two variables in the final assembly. Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQDaniel Bristot de Oliveira2022-05-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to wakeup the timerlat/ thread if stop tracing is hit at the timerlat's IRQ handler. Return before waking up timerlat's thread. Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing/timerlat: Print stacktrace in the IRQ handler if neededDaniel Bristot de Oliveira2022-05-261-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If print_stack and stop_tracing_us are set, and stop_tracing_us is hit with latency higher than or equal to print_stack, print the stack at the IRQ handler as it is useful to define the root cause for the IRQ latency. Link: https://lkml.kernel.org/r/fd04530ce98ae9270e41bb124ee5bf67b05ecfed.1652175637.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing/timerlat: Notify IRQ new max latency only if stop tracing is setDaniel Bristot de Oliveira2022-05-261-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the notification of a new max latency is sent from timerlat's IRQ handler anytime a new max latency is found. While this behavior is not wrong, the send IPI overhead itself will increase the thread latency and that is not the desired effect (tracing overhead). Moreover, the thread will notify a new max latency again because the thread latency as it is always higher than the IRQ latency that woke it up. The only case in which it is helpful to notify a new max latency from IRQ is when stop tracing (for the IRQ) is set, as in this case, the thread will not be dispatched. Notify a new max latency from the IRQ handler only if stop tracing is set for the IRQ handler. Link: https://lkml.kernel.org/r/2c2d9a56c0886c8402ba320de32856cbbb10c2bb.1652175637.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Reported-by: Clark Williams <williams@redhat.com> Fixes: a955d7eac177 ("trace: Add timerlat tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * kprobes: Fix build errors with CONFIG_KRETPROBES=nMasami Hiramatsu2022-05-261-73/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Max Filippov reported: When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c compilation fails with the following messages: kernel/kprobes.c: In function ‘recycle_rp_inst’: kernel/kprobes.c:1273:32: error: implicit declaration of function ‘get_kretprobe’ kernel/kprobes.c: In function ‘kprobe_flush_task’: kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member named ‘kretprobe_instances’ This came from the commit d741bf41d7c7 ("kprobes: Remove kretprobe hash") which introduced get_kretprobe() and kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y, but did not make recycle_rp_inst() and kprobe_flush_task() depending on CONFIG_KRETPORBES. Since those functions are only used for kretprobe, move those functions into #ifdef CONFIG_KRETPROBE area. Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devnote2 Reported-by: Max Filippov <jcmvbkbc@gmail.com> Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash") Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix return value of trace_pid_write()Wonhyuk Yang2022-05-261-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting set_event_pid with trailing whitespace lead to endless write system calls like below. $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid execve("/usr/bin/echo", ["echo", "123 "], ...) = 0 ... write(1, "123 \n", 5) = 4 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 .... This is because, the result of trace_get_user's are not returned when it read at least one pid. To fix it, update read variable even if parser->idx == 0. The result of applied patch is below. $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid execve("/usr/bin/echo", ["echo", "123 "], ...) = 0 ... write(1, "123 \n", 5) = 5 close(1) = 0 Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Baik Song An <bsahn@etri.re.kr> Cc: Hong Yeon Kim <kimhy@etri.re.kr> Cc: Taeung Song <taeung@reallinux.co.kr> Cc: linuxgeek@linuxgeek.io Cc: stable@vger.kernel.org Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use") Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix potential double free in create_var_ref()Keita Suzuki2022-05-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In create_var_ref(), init_var_ref() is called to initialize the fields of variable ref_field, which is allocated in the previous function call to create_hist_field(). Function init_var_ref() allocates the corresponding fields such as ref_field->system, but frees these fields when the function encounters an error. The caller later calls destroy_hist_field() to conduct error handling, which frees the fields and the variable itself. This results in double free of the fields which are already freed in the previous function. Fix this by storing NULL to the corresponding fields when they are freed in init_var_ref(). Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp Fixes: 067fe038e70f ("tracing: Add variable reference handling to hist triggers") CC: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Use strim() to remove whitespace instead of doing it manuallyYuntao Wang2022-05-261-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tracing_set_trace_write() function just removes the trailing whitespace from the user supplied tracer name, but the leading whitespace should also be removed. In addition, if the user supplied tracer name contains only a few whitespace characters, the first one will not be removed using the current method, which results it a single whitespace character left in the buf. To fix all of these issues, we use strim() to correctly remove both the leading and trailing whitespace. Link: https://lkml.kernel.org/r/20220121095623.1826679-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ftrace: Deal with error return code of the ftrace_process_locs() functionYuntao Wang2022-05-261-4/+13
| | | | | | | | | | | | | | | | | | | | The ftrace_process_locs() function may return -ENOMEM error code, which should be handled by the callers. Link: https://lkml.kernel.org/r/20220120065949.1813231-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Use trace_create_file() to simplify creation of tracefs entriesYuntao Wang2022-05-265-48/+21
| | | | | | | | | | | | | | | | | | | | | | | | Creating tracefs entries with tracefs_create_file() followed by pr_warn() is tedious and repetitive, we can use trace_create_file() to simplify this process and make the code more readable. Link: https://lkml.kernel.org/r/20220114131052.534382-1-ytcoode@gmail.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Reset the function filter after completing trampoline/graph selftestLi Huafei2022-05-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The direct trampoline and graph coexistence test sets global_ops to trace only 'trace_selftest_dynamic_test_func', but does not reset it after the test is completed, resulting in the function filter being set already after the system starts. Although it can be reset through the tracefs interface, it is more or less confusing to the user, and we should reset it to trace all functions after the trampoline/graph test completes. Link: https://lkml.kernel.org/r/20220427034119.24668-1-lihuafei1@huawei.com Link: https://lore.kernel.org/all/20220418073958.104029-1-lihuafei1@huawei.com/ Fixes: 130c08065848 ("tracing: Add trampoline/graph selftest") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Have event format check not flag %p* on __get_dynamic_array()Steven Rostedt (Google)2022-05-251-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The print fmt check against trace events to make sure that the format does not use pointers that may be freed from the time of the trace to the time the event is read, gives a false positive on %pISpc when reading data that was saved in __get_dynamic_array() when it is perfectly fine to do so, as the data being read is on the ring buffer. Link: https://lore.kernel.org/all/20220407144524.2a592ed6@canb.auug.org.au/ Cc: stable@vger.kernel.org Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Remove check of list iterator against head past the loop bodyJakob Koschel2022-04-273-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element. While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided. In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lkml.kernel.org/r/20220427170734.819891-5-jakobkoschel@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Replace usage of found with dedicated list iterator variableJakob Koschel2022-04-272-21/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To move the list iterator variable into the list_for_each_entry_*() macro in the future it should be avoided to use the list iterator variable after the loop body. To *never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lkml.kernel.org/r/20220427170734.819891-4-jakobkoschel@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Remove usage of list iterator variable after the loopJakob Koschel2022-04-271-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20220427170734.819891-3-jakobkoschel@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Remove usage of list iterator after the loop bodyJakob Koschel2022-04-271-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to limit the scope of the list iterator variable to the traversal loop, use a dedicated pointer to point to the found element [1]. Before, the code implicitly used the head when no element was found when using &pos->list. Since the new variable is only set if an element was found, the head needs to be used explicitly if the variable is NULL. Link: https://lkml.kernel.org/r/20220427170734.819891-2-jakobkoschel@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Introduce trace clock taiKurt Kanzenbach2022-04-271-0/+1
| | | | | | | | | | | | | | | | | | | | A fast/NMI safe accessor for CLOCK_TAI has been introduced. Use it for adding the additional trace clock "tai". Link: https://lkml.kernel.org/r/20220414091805.89667-3-kurt@linutronix.de Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ring-buffer: Have 32 bit time stamps use all 64 bitsSteven Rostedt (Google)2022-04-271-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the new logic was made to handle deltas of events from interrupts that interrupted other events, it required 64 bit local atomics. Unfortunately, 64 bit local atomics are expensive on 32 bit architectures. Thus, commit 10464b4aa605e ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit") created a type of seq lock timer for 32 bits. It used two 32 bit local atomics, but required 2 bits from them each for synchronization, making it only 60 bits. Add a new "msb" field to hold the extra 4 bits that are cut off. Link: https://lore.kernel.org/all/20220426175338.3807ca4f@gandalf.local.home/ Link: https://lkml.kernel.org/r/20220427170812.53cc7139@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ring-buffer: Have absolute time stamps handle large numbersSteven Rostedt (Google)2022-04-271-5/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's an absolute timestamp event in the ring buffer, but this only saves 59 bits of the timestamp, as the 5 MSB is used for meta data (stating it is an absolute time stamp). This was never an issue as all the clocks currently in use never used those 5 MSB. But now there's a new clock (TAI) that does. To handle this case, when reading an absolute timestamp, a previous full timestamp is passed in, and the 5 MSB of that timestamp is OR'd to the absolute timestamp (if any of the 5 MSB are set), and then to test for overflow, if the new result is smaller than the passed in previous timestamp, then 1 << 59 is added to it. All the extra processing is done on the reader "slow" path, with the exception of the "too big delta" check, and the reading of timestamps for histograms. Note, libtraceevent will need to be updated to handle this case as well. But this is not a user space regression, as user space was never able to handle any timestamps that used more than 59 bits. Link: https://lore.kernel.org/all/20220426175338.3807ca4f@gandalf.local.home/ Link: https://lkml.kernel.org/r/20220427153339.16c33f75@gandalf.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: make tracer_init_tracefs initcall asynchronousMark-PK Tsai2022-04-261-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move trace_eval_init() to subsys_initcall to make it start earlier. And to avoid tracer_init_tracefs being blocked by trace_event_sem which trace_eval_init() hold [1], queue tracer_init_tracefs() to eval_map_wq to let the two works being executed sequentially. It can speed up the initialization of kernel as result of making tracer_init_tracefs asynchronous. On my arm64 platform, it reduce ~20ms of 125ms which total time do_initcalls spend. Link: https://lkml.kernel.org/r/20220426122407.17042-3-mark-pk.tsai@mediatek.com [1]: https://lore.kernel.org/r/68d7b3327052757d0cd6359a6c9015a85b437232.camel@pengutronix.de Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Avoid adding tracer option before update_tracer_optionsMark-PK Tsai2022-04-261-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prepare for support asynchronous tracer_init_tracefs initcall, avoid calling create_trace_option_files before __update_tracer_options. Otherwise, create_trace_option_files will show warning because some tracers in trace_types list are already in tr->topts. For example, hwlat_tracer call register_tracer in late_initcall, and global_trace.dir is already created in tracing_init_dentry, hwlat_tracer will be put into tr->topts. Then if the __update_tracer_options is executed after hwlat_tracer registered, create_trace_option_files find that hwlat_tracer is already in tr->topts. Link: https://lkml.kernel.org/r/20220426122407.17042-2-mark-pk.tsai@mediatek.com Link: https://lore.kernel.org/lkml/20220322133339.GA32582@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * ring-buffer: Simplify if-if to if-elseWan Jiabing2022-04-261-2/+2
| | | | | | | | | | | | | | | | | | Use if and else instead of if(A) and if (!A). Link: https://lkml.kernel.org/r/20220426070628.167565-1-wanjiabing@vivo.com Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Use WARN instead of printk and WARN_ONGuo Zhengkui2022-04-261-9/+3
| | | | | | | | | | | | | | | | | | | | | | Use `WARN(cond, ...)` instead of `if (cond)` + `printk(...)` + `WARN_ON(1)`. Link: https://lkml.kernel.org/r/20220424131932.3606-1-guozhengkui@vivo.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix sleeping function called from invalid context on RT kernelJun Miao2022-04-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When setting bootparams="trace_event=initcall:initcall_start tp_printk=1" in the cmdline, the output_printk() was called, and the spin_lock_irqsave() was called in the atomic and irq disable interrupt context suitation. On the PREEMPT_RT kernel, these locks are replaced with sleepable rt-spinlock, so the stack calltrace will be triggered. Fix it by raw_spin_lock_irqsave when PREEMPT_RT and "trace_event=initcall:initcall_start tp_printk=1" enabled. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 Preemption disabled at: [<ffffffff8992303e>] try_to_wake_up+0x7e/0xba0 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt17+ #19 34c5812404187a875f32bee7977f7367f9679ea7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x60/0x8c dump_stack+0x10/0x12 __might_resched.cold+0x11d/0x155 rt_spin_lock+0x40/0x70 trace_event_buffer_commit+0x2fa/0x4c0 ? map_vsyscall+0x93/0x93 trace_event_raw_event_initcall_start+0xbe/0x110 ? perf_trace_initcall_finish+0x210/0x210 ? probe_sched_wakeup+0x34/0x40 ? ttwu_do_wakeup+0xda/0x310 ? trace_hardirqs_on+0x35/0x170 ? map_vsyscall+0x93/0x93 do_one_initcall+0x217/0x3c0 ? trace_event_raw_event_initcall_level+0x170/0x170 ? push_cpu_stop+0x400/0x400 ? cblist_init_generic+0x241/0x290 kernel_init_freeable+0x1ac/0x347 ? _raw_spin_unlock_irq+0x65/0x80 ? rest_init+0xf0/0xf0 kernel_init+0x1e/0x150 ret_from_fork+0x22/0x30 </TASK> Link: https://lkml.kernel.org/r/20220419013910.894370-1-jun.miao@intel.com Signed-off-by: Jun Miao <jun.miao@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Change `if (strlen(glob))` to `if (glob[0])`Ammar Faizi2022-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | No need to traverse to the end of string. If the first byte is not a NUL char, it's guaranteed `if (strlen(glob))` is true. Link: https://lkml.kernel.org/r/20220417185630.199062-3-ammarfaizi2@gnuweeb.org Cc: Ingo Molnar <mingo@redhat.com> Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Return -EINVAL if WARN_ON(!glob) triggered in ↵Ammar Faizi2022-04-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | event_hist_trigger_parse() If `WARN_ON(!glob)` is ever triggered, we will still continue executing the next lines. This will trigger the more serious problem, a NULL pointer dereference bug. Just return -EINVAL if @glob is NULL. Link: https://lkml.kernel.org/r/20220417185630.199062-2-ammarfaizi2@gnuweeb.org Cc: Ingo Molnar <mingo@redhat.com> Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Make tp_printk work on syscall tracepointsJeff Xie2022-04-261-24/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the tp_printk option has no effect on syscall tracepoint. When adding the kernel option parameter tp_printk, then: echo 1 > /sys/kernel/debug/tracing/events/syscalls/enable When running any application, no trace information is printed on the terminal. Now added printk for syscall tracepoints. Link: https://lkml.kernel.org/r/20220410145025.681144-1-xiehuan09@gmail.com Signed-off-by: Jeff Xie <xiehuan09@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix tracing_map_sort_entries() kernel-doc commentYang Li2022-04-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the description of @n_sort_keys and make @sort_key -> @sort_keys in tracing_map_sort_entries() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. kernel/trace/tracing_map.c:1073: warning: Function parameter or member 'sort_keys' not described in 'tracing_map_sort_entries' kernel/trace/tracing_map.c:1073: warning: Function parameter or member 'n_sort_keys' not described in 'tracing_map_sort_entries' kernel/trace/tracing_map.c:1073: warning: Excess function parameter 'sort_key' description in 'tracing_map_sort_entries' Link: https://lkml.kernel.org/r/20220402072015.45864-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix kernel-docJiapeng Chong2022-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following W=1 kernel warnings: kernel/trace/trace.c:1181: warning: expecting prototype for tracing_snapshot_cond_data(). Prototype was for tracing_cond_snapshot_data() instead. Link: https://lkml.kernel.org/r/20220218100849.122038-1-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Fix inconsistent style of mini-HOWTOOscar Shiang2022-04-261-2/+2
| | | | | | | | | | | | | | | | | | | | Each description should start with a hyphen and a space. Insert spaces to fix it. Link: https://lkml.kernel.org/r/TYCP286MB19130AA4A9C6FC5A8793DED2A1359@TYCP286MB1913.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Oscar Shiang <oscar0225@livemail.tw> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Separate hist state updates from hist registrationTom Zanussi2022-04-261-18/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | hist_register_trigger() handles both new hist registration as well as existing hist registration through event_command.reg(). Adding a new function, existing_hist_update_only(), that checks and updates existing histograms and exits after doing so allows the confusing logic in event_hist_trigger_parse() to be simplified. Link: https://lkml.kernel.org/r/211b2cd3e3d7e00f4f8ad45ef8b33063da6a7e05.1644010576.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Have existing event_command.parse() implementations use helpersTom Zanussi2022-04-264-151/+69
| | | | | | | | | | | | | | | | | | | | Simplify the existing event_command.parse() implementations by having them make use of the helper functions previously introduced. Link: https://lkml.kernel.org/r/b353e3427a81f9d3adafd98fd7d73e78a8209f43.1644010576.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Remove redundant trigger_ops paramsTom Zanussi2022-04-264-60/+36
| | | | | | | | | | | | | | | | | | | | | | Since event_trigger_data contains the .ops trigger_ops field, there's no reason to pass the trigger_ops separately. Remove it as a param from functions whenever event_trigger_data is passed. Link: https://lkml.kernel.org/r/9856c9bc81bde57077f5b8d6f8faa47156c6354a.1644010575.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Remove logic for registering multiple event triggers at a timeTom Zanussi2022-04-263-77/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code for registering triggers assumes it's possible to register more than one trigger at a time. In fact, it's unimplemented and there doesn't seem to be a reason to do that. Remove the n_registered param from event_trigger_register() and fix up callers. Doing so simplifies the logic in event_trigger_register to the point that it just becomes a wrapper calling event_command.reg(). It also removes the problematic call to event_command.unreg() in case of failure. A new function, event_trigger_unregister() is also added for callers to call themselves. The changes to trace_events_hist.c simply allow compilation; a separate patch follows which updates the hist triggers to work correctly with the new changes. Link: https://lkml.kernel.org/r/6149fec7a139d93e84fa4535672fb5bef88006b0.1644010575.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
| * tracing: Cleanup double word in commentTom Rix2022-04-261-2/+2
| | | | | | | | | | | | | | | | | | Remove the second 'is' and 'to'. Link: https://lkml.kernel.org/r/20220207131216.2059997-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
* | Merge tag 'cxl-for-5.19' of ↵Linus Torvalds2022-05-273-3/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for this cycle. The highlight is new driver-core infrastructure and CXL subsystem changes for allowing lockdep to validate device_lock() usage. Thanks to PeterZ for setting me straight on the current capabilities of the lockdep API, and Greg acked it as well. On the CXL ACPI side this update adds support for CXL _OSC so that platform firmware knows that it is safe to still grant Linux native control of PCIe hotplug and error handling in the presence of CXL devices. A circular dependency problem was discovered between suspend and CXL memory for cases where the suspend image might be stored in CXL memory where that image also contains the PCI register state to restore to re-enable the device. Disable suspend for now until an architecture is defined to clarify that conflict. Lastly a collection of reworks, fixes, and cleanups to the CXL subsystem where support for snooping mailbox commands and properly handling the "mem_enable" flow are the highlights. Summary: - Add driver-core infrastructure for lockdep validation of device_lock(), and fixup a deadlock report that was previously hidden behind the 'lockdep no validate' policy. - Add CXL _OSC support for claiming native control of CXL hotplug and error handling. - Disable suspend in the presence of CXL memory unless and until a protocol is identified for restoring PCI device context from memory hosted on CXL PCI devices. - Add support for snooping CXL mailbox commands to protect against inopportune changes, like set-partition with the 'immediate' flag set. - Rework how the driver detects legacy CXL 1.1 configurations (CXL DVSEC / 'mem_enable') before enabling new CXL 2.0 decode configurations (CXL HDM Capability). - Miscellaneous cleanups and fixes from -next exposure" * tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (47 commits) cxl/port: Enable HDM Capability after validating DVSEC Ranges cxl/port: Reuse 'struct cxl_hdm' context for hdm init cxl/port: Move endpoint HDM Decoder Capability init to port driver cxl/pci: Drop @info argument to cxl_hdm_decode_init() cxl/mem: Merge cxl_dvsec_ranges() and cxl_hdm_decode_init() cxl/mem: Skip range enumeration if mem_enable clear cxl/mem: Consolidate CXL DVSEC Range enumeration in the core cxl/pci: Move cxl_await_media_ready() to the core cxl/mem: Validate port connectivity before dvsec ranges cxl/mem: Fix cxl_mem_probe() error exit cxl/pci: Drop wait_for_valid() from cxl_await_media_ready() cxl/pci: Consolidate wait_for_media() and wait_for_media_ready() cxl/mem: Drop mem_enabled check from wait_for_media() nvdimm: Fix firmware activation deadlock scenarios device-core: Kill the lockdep_mutex nvdimm: Drop nd_device_lock() ACPI: NFIT: Drop nfit_device_lock() nvdimm: Replace lockdep_mutex with local lock classes cxl: Drop cxl_device_lock() cxl/acpi: Add root device lockdep validation ...
| * | PM: CXL: Disable suspendDan Williams2022-04-223-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CXL specification claims S3 support at a hardware level, but at a system software level there are some missing pieces. Section 9.4 (CXL 2.0) rightly claims that "CXL mem adapters may need aux power to retain memory context across S3", but there is no enumeration mechanism for the OS to determine if a given adapter has that support. Moreover the save state and resume image for the system may inadvertantly end up in a CXL device that needs to be restored before the save state is recoverable. I.e. a circular dependency that is not resolvable without a third party save-area. Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly allows for suspend, but requires unbinding all CXL memory devices before the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind flow is intended to also tear down all CXL memory regions associated with a given cxl_memdev. It is reasonable to assume that any device participating in a System RAM range published in the EFI memory map is covered by aux power and save-area outside the device itself. So this restriction can be minimized in the future once pre-existing region enumeration support arrives, and perhaps a spec update to clarify if the EFI memory map is sufficent for determining the range of devices managed by platform-firmware for S3 support. Per Rafael, if the CXL configuration prevents suspend then it should fail early before tasks are frozen, and mem_sleep should stop showing 'mem' as an option [1]. Effectively CXL augments the platform suspend ->valid() op since, for example, the ACPI ops are not aware of the CXL / PCI dependencies. Given the split role of platform firmware vs OS provisioned CXL memory it is up to the cxl_mem driver to determine if the CXL configuration has elements that platform firmware may not be prepared to restore. Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1] Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | Merge tag 'mm-hotfixes-stable-2022-05-27' of ↵Linus Torvalds2022-05-271-34/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Six hotfixes. The page_table_check one from Miaohe Lin is considered a minor thing so it isn't marked for -stable. The remainder address pre-5.19 issues and are cc:stable" * tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/page_table_check: fix accessing unmapped ptep kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] mm/page_alloc: always attempt to allocate at least one page during bulk allocation hugetlb: fix huge_pmd_unshare address update zsmalloc: fix races between asynchronous zspage free and page migration Revert "mm/cma.c: remove redundant cma_mutex lock"
| * | | kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao2022-05-271-34/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | Merge tag 'mm-nonmm-stable-2022-05-26' of ↵Linus Torvalds2022-05-2710-20/+49
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "The non-MM patch queue for this merge window. Not a lot of material this cycle. Many singleton patches against various subsystems. Most notably some maintenance work in ocfs2 and initramfs" * tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits) kcov: update pos before writing pc in trace function ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock fs/ntfs: remove redundant variable idx fat: remove time truncations in vfat_create/vfat_mkdir fat: report creation time in statx fat: ignore ctime updates, and keep ctime identical to mtime in memory fat: split fat_truncate_time() into separate functions MAINTAINERS: add Muchun as a memcg reviewer proc/sysctl: make protected_* world readable ia64: mca: drop redundant spinlock initialization tty: fix deadlock caused by calling printk() under tty_port->lock relay: remove redundant assignment to pointer buf fs/ntfs3: validate BOOT sectors_per_clusters lib/string_helpers: fix not adding strarray to device's resource list kernel/crash_core.c: remove redundant check of ck_cmdline ELF, uapi: fixup ELF_ST_TYPE definition ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() ipc: update semtimedop() to use hrtimer ipc/sem: remove redundant assignments ...