summaryrefslogtreecommitdiffstats
path: root/kernel/trace
Commit message (Collapse)AuthorAgeFilesLines
* scsi: block: Remove REQ_OP_WRITE_SAME supportChristoph Hellwig2022-02-221-1/+0
| | | | | | | | | | | | No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure. [mkp: fold in and tweak sysfs reporting fix] Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge tag 'trace-v5.17-3' of ↵Linus Torvalds2022-01-232-3/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Fix s390 breakage from sorting mcount tables. The latest merge of the tracing tree sorts the mcount table at build time. But s390 appears to do things differently (like always) and replaces the sorted table back to the original unsorted one. As the ftrace algorithm depends on it being sorted, bad things happen when it is not, and s390 experienced those bad things. Add a new config to tell the boot if the mcount table is sorted or not, and allow s390 to opt out of it" * tag 'trace-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix assuming build time sort works for s390
| * ftrace: Fix assuming build time sort works for s390Steven Rostedt (Google)2022-01-232-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To speed up the boot process, as mcount_loc needs to be sorted for ftrace to work properly, sorting it at build time is more efficient than boot up and can save milliseconds of time. Unfortunately, this change broke s390 as it will modify the mcount_loc location after the sorting takes place and will put back the unsorted locations. Since the sorting is skipped at boot up if it is believed that it was sorted at run time, ftrace can crash as its algorithms are dependent on the list being sorted. Add a new config BUILDTIME_MCOUNT_SORT that is set when BUILDTIME_TABLE_SORT but not if S390 is set. Use this config to determine if sorting should take place at boot up. Link: https://lore.kernel.org/all/yt9dee51ctfn.fsf@linux.ibm.com/ Fixes: 72b3942a173c ("scripts: ftrace - move the sort-processing in ftrace_init") Reported-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
* | Merge tag 'trace-v5.17' of ↵Linus Torvalds2022-01-1617-247/+800
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New: - The Real Time Linux Analysis (RTLA) tool is added to the tools directory. - Can safely filter on user space pointers with: field.ustring ~ "match-string" - eprobes can now be filtered like any other event. - trace_marker(_raw) now uses stream_open() to allow multiple threads to safely write to it. Note, this could possibly break existing user space, but we will not know until we hear about it, and then can revert the change if need be. - New field in events to display when bottom halfs are disabled. - Sorting of the ftrace functions are now done at compile time instead of at bootup. Infrastructure changes to support future efforts: - Added __rel_loc type for trace events. Similar to __data_loc but the offset to the dynamic data is based off of the location of the descriptor and not the beginning of the event. Needed for user defined events. - Some simplification of event trigger code. - Make synthetic events process its callback better to not hinder other event callbacks that are registered. Needed for user defined events. And other small fixes and cleanups" * tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits) tracing: Add ustring operation to filtering string pointers rtla: Add rtla timerlat hist documentation rtla: Add rtla timerlat top documentation rtla: Add rtla timerlat documentation rtla: Add rtla osnoise hist documentation rtla: Add rtla osnoise top documentation rtla: Add rtla osnoise man page rtla: Add Documentation rtla/timerlat: Add timerlat hist mode rtla: Add timerlat tool and timelart top mode rtla/osnoise: Add the hist mode rtla/osnoise: Add osnoise top mode rtla: Add osnoise tool rtla: Helper functions for rtla rtla: Real-Time Linux Analysis tool tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails tracing: Remove duplicate warnings when calling trace_create_file() tracing/kprobes: 'nmissed' not showed correctly for kretprobe tracing: Add test for user space strings when filtering on string pointers tracing: Have syscall trace events use trace_event_buffer_lock_reserve() ...
| * tracing: Add ustring operation to filtering string pointersSteven Rostedt2022-01-141-24/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since referencing user space pointers is special, if the user wants to filter on a field that is a pointer to user space, then they need to specify it. Add a ".ustring" attribute to the field name for filters to state that the field is pointing to user space such that the kernel can take the appropriate action to read that pointer. Link: https://lore.kernel.org/all/yt9d8rvmt2jq.fsf@linux.ibm.com/ Fixes: 77360f9bbc7e ("tracing: Add test for user space strings when filtering on string pointers") Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() failsNikita Yushchenko2022-01-131-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If start_per_cpu_kthreads() called from osnoise_workload_start() returns error, event hooks are left in broken state: unhook_irq_events() called but unhook_thread_events() and unhook_softirq_events() not called, and trace_osnoise_callback_enabled flag not cleared. On the next tracer enable, hooks get not installed due to trace_osnoise_callback_enabled flag. And on the further tracer disable an attempt to remove non-installed hooks happened, hitting a WARN_ON_ONCE() in tracepoint_remove_func(). Fix the error path by adding the missing part of cleanup. While at this, introduce osnoise_unhook_events() to avoid code duplication between this error path and normal tracer disable. Link: https://lkml.kernel.org/r/20220109153459.3701773-1-nikita.yushchenko@virtuozzo.com Cc: stable@vger.kernel.org Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Remove duplicate warnings when calling trace_create_file()Yuntao Wang2022-01-131-9/+3
| | | | | | | | | | | | | | | | | | | | Since the same warning message is already printed in the trace_create_file() function, there is no need to print it again. Link: https://lkml.kernel.org/r/20220109162232.361747-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/kprobes: 'nmissed' not showed correctly for kretprobeXiangyang Zhang2022-01-131-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'nmissed' column of the 'kprobe_profile' file for kretprobe is not showed correctly, kretprobe can be skipped by two reasons, shortage of kretprobe_instance which is counted by tk->rp.nmissed, and kprobe itself is missed by some reason, so to show the sum. Link: https://lkml.kernel.org/r/20220107150242.5019-1-xyz.sun.ok@gmail.com Cc: stable@vger.kernel.org Fixes: 4a846b443b4e ("tracing/kprobes: Cleanup kprobe tracer code") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Xiangyang Zhang <xyz.sun.ok@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Add test for user space strings when filtering on string pointersSteven Rostedt2022-01-131-3/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pingfan reported that the following causes a fault: echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter echo 1 > events/syscalls/sys_enter_at/enable The reason is that trace event filter treats the user space pointer defined by "filename" as a normal pointer to compare against the "cpu" string. The following bug happened: kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60 #PF: supervisor read access in kernel mode #PF: error_code(0x0001) - permissions violation PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867 Oops: 0001 [#1] PREEMPT SMP PTI CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:strlen+0x0/0x20 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246 RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000 RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60 RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000 R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380 R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0 FS: 00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: filter_pred_pchar+0x18/0x40 filter_match_preds+0x31/0x70 ftrace_syscall_enter+0x27a/0x2c0 syscall_trace_enter.constprop.0+0x1aa/0x1d0 do_syscall_64+0x16/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fa861d88664 The above happened because the kernel tried to access user space directly and triggered a "supervisor read access in kernel mode" fault. Worse yet, the memory could not even be loaded yet, and a SEGFAULT could happen as well. This could be true for kernel space accessing as well. To be even more robust, test both kernel and user space strings. If the string fails to read, then simply have the filter fail. Note, TASK_SIZE is used to determine if the pointer is user or kernel space and the appropriate strncpy_from_kernel/user_nofault() function is used to copy the memory. For some architectures, the compare to TASK_SIZE may always pick user space or kernel space. If it gets it wrong, the only thing is that the filter will fail to match. In the future, this needs to be fixed to have the event denote which should be used. But failing a filter is much better than panicing the machine, and that can be solved later. Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/ Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Reported-by: Pingfan Liu <kernelfans@gmail.com> Tested-by: Pingfan Liu <kernelfans@gmail.com> Fixes: 87a342f5db69d ("tracing/filters: Support filtering for char * strings") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Have syscall trace events use trace_event_buffer_lock_reserve()Steven Rostedt2022-01-131-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the syscall trace events call trace_buffer_lock_reserve() directly, which means that it misses out on some of the filtering optimizations provided by the helper function trace_event_buffer_lock_reserve(). Have the syscall trace events call that instead, as it was missed when adding the update to use the temp buffer when filtering. Link: https://lkml.kernel.org/r/20220107225839.823118570@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 0fc1b09ff1ff4 ("tracing: Use temp buffer when filtering events") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Add test to make sure compiled time sorts workSteven Rostedt (VMware)2022-01-132-0/+37
| | | | | | | | | | | | | | | | | | | | | | Now that ftrace function pointers are sorted at compile time, add a test that makes sure they are sorted at run time. This test is only run if it is configured in. Link: https://lkml.kernel.org/r/20211206151858.4d21a24d@gandalf.local.home Cc: Yinan Liu <yinan@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * scripts: ftrace - move the sort-processing in ftrace_initYinan Liu2022-01-131-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the kernel starts, the initialization of ftrace takes up a portion of the time (approximately 6~8ms) to sort mcount addresses. We can save this time by moving mcount-sorting to compile time. Link: https://lkml.kernel.org/r/20211212113358.34208-2-yinan@linux.alibaba.com Signed-off-by: Yinan Liu <yinan@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/probes: check the return value of kstrndup() for pbufXiaoke Wang2022-01-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | kstrndup() is a memory allocation-related function, it returns NULL when some internal memory errors happen. It is better to check the return value of it so to catch the memory error in time. Link: https://lkml.kernel.org/r/tencent_4D6E270731456EB88712ED7F13883C334906@qq.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support") Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/uprobes: Check the return value of kstrdup() for tu->filenameXiaoke Wang2022-01-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | kstrdup() returns NULL when some internal memory errors happen, it is better to check the return value of it so to catch the memory error in time. Link: https://lkml.kernel.org/r/tencent_3C2E330722056D7891D2C83F29C802734B06@qq.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU") Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Account bottom half disabled sections.Sebastian Andrzej Siewior2022-01-132-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disabling only bottom halves via local_bh_disable() disables also preemption but this remains invisible to tracing. On a CONFIG_PREEMPT kernel one might wonder why there is no scheduling happening despite the N flag in the trace. The reason might be the a rcu_read_lock_bh() section. Add a 'b' to the tracing output if in task context with disabled bottom halves. Link: https://lkml.kernel.org/r/YbcbtdtC/bjCKo57@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Add helper functions to simplify event_command.parse() callback ↵Tom Zanussi2022-01-102-0/+366
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handling The event_command.parse() callback is responsible for parsing and registering triggers. The existing command implementions for this callback duplicate a lot of the same code, so to clean up and consolidate those implementations, introduce a handful of helper functions for implementors to use. This also makes it easier for new commands to be implemented and allows them to focus more on the customizations they provide rather than obscuring and complicating it with boilerplate code. Link: https://lkml.kernel.org/r/c1ff71f594d45177706571132bd3119491097221.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Remove ops param from event_command reg()/unreg() callbacksTom Zanussi2022-01-104-28/+20
| | | | | | | | | | | | | | | | | | | | The event_trigger_ops for an event_command are already accessible via event_trigger_data.ops so remove the redundant ops from the callback. Link: https://lkml.kernel.org/r/4c6f2a41820452f9cacddc7634ad442928aa2aa6.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Change event_trigger_ops func() to trigger()Tom Zanussi2022-01-104-26/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The name of the func() callback on event_trigger_ops is too generic and is easily confused with other callbacks with that name, so change it to something that reflects its actual purpose. In this case, the main purpose of the callback is to implement an event trigger, so call it trigger() instead. Also add some more documentation to event_trigger_ops describing the callbacks a bit better. Link: https://lkml.kernel.org/r/36ab812e3ee74ee03ae0043fda41a858ee728c00.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Change event_command func() to parse()Tom Zanussi2022-01-104-39/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The name of the func() callback on event_command is too generic and is easily confused with other callbacks with that name, so change it to something that reflects its actual purpose. In this case, the main purpose of the callback is to parse an event command, so call it parse() instead. Link: https://lkml.kernel.org/r/7784e321840752ed88aac0b349c0c685fc9247b1.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Use trace_iterator_reset() in tracing_read_pipe()Steven Rostedt (VMware)2021-12-111-2/+1
| | | | | | | | | | | | | | | | | | Currently tracing_read_pipe() open codes trace_iterator_reset(). Just have it use trace_iterator_reset() instead. Link: https://lkml.kernel.org/r/20211210202616.64d432d2@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use memset_startat helper in trace_iterator_reset()Xiu Jianfeng2021-12-111-8/+1
| | | | | | | | | | | | | | | | | | | | Make use of memset_startat helper to simplify the code, there should be no functional change as a result of this patch. Link: https://lkml.kernel.org/r/20211210012245.207489-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Do not let synth_events block other dyn_event systems during createBeau Belgrave2021-12-111-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | synth_events is returning -EINVAL if the dyn_event create command does not contain ' \t'. This prevents other systems from getting called back. synth_events needs to return -ECANCELED in these cases when the command is not targeting the synth_event system. Link: https://lore.kernel.org/linux-trace-devel/20210930223821.11025-1-beaub@linux.microsoft.com Fixes: c9e759b1e8456 ("tracing: Rework synthetic event command parsing") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Iterate trace_[ku]probe objects directlyJiri Olsa2021-12-112-24/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by Linus [1] using list_for_each_entry to iterate directly trace_[ku]probe objects so we can skip another call to container_of in these loops. [1] https://lore.kernel.org/r/CAHk-=wjakjw6-rDzDDBsuMoDCqd+9ogifR_EE1F0K-jYek1CdA@mail.gmail.com Link: https://lkml.kernel.org/r/20211125202852.406405-1-jolsa@kernel.org Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Make trace_marker{,_raw} stream-likeJohn Keeping2021-12-071-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tracing marker files are write-only streams with no meaningful concept of file position. Using stream_open() to mark them as stream-link indicates this and has the added advantage that a single file descriptor can now be used from multiple threads without contention thanks to clearing FMODE_ATOMIC_POS. Note that this has the potential to break existing userspace by since both lseek(2) and pwrite(2) will now return ESPIPE when previously lseek would have updated the stored offset and pwrite would have appended to the trace. A survey of libtracefs and several other projects found to use trace_marker(_raw) [1][2][3] suggests that everyone limits themselves to calling write(2) and close(2) on these file descriptors so there is a good chance this will go unnoticed and the benefits of reduced overhead and lock contention seem worth the risk. [1] https://github.com/google/perfetto [2] https://github.com/intel/media-driver/ [3] https://w1.fi/cgit/hostap/ Link: https://lkml.kernel.org/r/20211207142558.347029-1-john@metanate.com Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Switch to kvfree_rcu() APIUladzislau Rezki (Sony)2021-12-062-4/+2
| | | | | | | | | | | | | | | | | | | | | | Instead of invoking a synchronize_rcu() to free a pointer after a grace period we can directly make use of new API that does the same but in more efficient way. Link: https://lkml.kernel.org/r/20211124110308.2053-10-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Fix synth_event_add_val() kernel-doc commentQiujun Huang2021-12-061-1/+1
| | | | | | | | | | | | | | | | | | It's named field here. Link: https://lkml.kernel.org/r/20210516022410.64271-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/uprobes: Use trace_event_buffer_reserve() helperSteven Rostedt (VMware)2021-12-061-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | To be consistent with kprobes and eprobes, use trace_event_buffer_reserver() and trace_event_buffer_commit(). This will ensure that any updates to trace events will also be implemented on uprobe events. Link: https://lkml.kernel.org/r/20211206162440.69fbf96c@gandalf.local.home Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/kprobes: Do not open code event reserve logicSteven Rostedt (VMware)2021-12-061-18/+7
| | | | | | | | | | | | | | | | | | | | | | | | As kprobe events use trace_event_buffer_commit() to commit the event to the ftrace ring buffer, for consistency, it should use trace_event_buffer_reserve() to allocate it, as the two functions are related. Link: https://lkml.kernel.org/r/20211130024319.257430762@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Have eprobes use filtering logic of trace eventsSteven Rostedt (VMware)2021-12-061-11/+5
| | | | | | | | | | | | | | | | | | | | | | The eprobes open code the reserving of the event on the ring buffer for ftrace instead of using the ftrace event wrappers, which means that it doesn't get affected by the filters, breaking the filtering logic on user space. Link: https://lkml.kernel.org/r/20211130024319.068451680@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Disable preemption when using the filter bufferSteven Rostedt (VMware)2021-12-062-27/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case trace_event_buffer_lock_reserve() is called with preemption enabled, the algorithm that defines the usage of the per cpu filter buffer may fail if the task schedules to another CPU after determining which buffer it will use. Disable preemption when using the filter buffer. And because that same buffer must be used throughout the call, keep preemption disabled until the filter buffer is released. This will also keep the semantics between the use case of when the filter buffer is used, and when the ring buffer itself is used, as that case also disables preemption until the ring buffer is released. Link: https://lkml.kernel.org/r/20211130024318.880190623@goodmis.org [ Fixed warning of assignment in if statement Reported-by: kernel test robot <lkp@intel.com> ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use __this_cpu_read() in trace_event_buffer_lock_reserver()Steven Rostedt (VMware)2021-12-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The value read by this_cpu_read() is used later and its use is expected to stay on the same CPU as being read. But this_cpu_read() does not warn if it is called without preemption disabled, where as __this_cpu_read() will check if preemption is disabled on CONFIG_DEBUG_PREEMPT Currently all callers have preemption disabled, but there may be new callers in the future that may not. Link: https://lkml.kernel.org/r/20211130024318.698165354@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add '__rel_loc' using trace event macrosMasami Hiramatsu2021-12-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add '__rel_loc' using trace event macros. These macros are usually not used in the kernel, except for testing purpose. This also add "rel_" variant of macros for dynamic_array string, and bitmask. Link: https://lkml.kernel.org/r/163757342119.510314.816029622439099016.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Support __rel_loc relative dynamic data location attributeMasami Hiramatsu2021-12-064-6/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add '__rel_loc' new dynamic data location attribute which encodes the data location from the next to the field itself. The '__data_loc' is used for encoding the dynamic data location on the trace event record. But '__data_loc' is not useful if the writer doesn't know the event header (e.g. user event), because it records the dynamic data offset from the entry of the record, not the field itself. This new '__rel_loc' attribute encodes the data location relatively from the next of the field. For example, when there is a record like below (the number in the parentheses is the size of fields) |header(N)|common(M)|fields(K)|__data_loc(4)|fields(L)|data(G)| In this case, '__data_loc' field will be __data_loc = (G << 16) | (N+M+K+4+L) If '__rel_loc' is used, this will be |header(N)|common(M)|fields(K)|__rel_loc(4)|fields(L)|data(G)| where __rel_loc = (G << 16) | (L) This case shows L bytes after the '__rel_loc' attribute field, if there is no fields after the __rel_loc field, L must be 0. This is relatively easy (and no need to consider the kernel header change) when the event data fields are composed by user who doesn't know header and common fields. Link: https://lkml.kernel.org/r/163757341258.510314.4214431827833229956.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Fix spelling mistake "aritmethic" -> "arithmetic"Colin Ian King2021-12-061-1/+1
| | | | | | | | | | | | | | | | | | There is a spelling mistake in the tracing mini-HOWTO text. Fix it. Link: https://lkml.kernel.org/r/20211108201513.42876-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2022-01-153-12/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...
| * | trace/hwlat: make use of the helper function kthread_run_on_cpu()Cai Huoqing2022-01-151-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/20211022025711.3673-7-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Doug Ledford <dledford@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | trace/osnoise: make use of the helper function kthread_run_on_cpu()Cai Huoqing2022-01-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/20211022025711.3673-6-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Doug Ledford <dledford@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | ring-buffer: make use of the helper function kthread_run_on_cpu()Cai Huoqing2022-01-151-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace kthread_create/kthread_bind/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/20211022025711.3673-4-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Doug Ledford <dledford@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-blockLinus Torvalds2022-01-121-10/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...
| * | | blktrace: switch trace spinlock to a raw spinlockWander Lairson Costa2021-12-201-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The running_trace_lock protects running_trace_list and is acquired within the tracepoint which implies disabled preemption. The spinlock_t typed lock can not be acquired with disabled preemption on PREEMPT_RT because it becomes a sleeping lock. The runtime of the tracepoint depends on the number of entries in running_trace_list and has no limit. The blk-tracer is considered debug code and higher latencies here are okay. Make running_trace_lock a raw_spinlock_t. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20211220192827.38297-1-wander@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: remove the ->rq_disk field in struct requestChristoph Hellwig2021-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag '5.17-net-next' of ↵Linus Torvalds2022-01-103-21/+74
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core ---- - Defer freeing TCP skbs to the BH handler, whenever possible, or at least perform the freeing outside of the socket lock section to decrease cross-CPU allocator work and improve latency. - Add netdevice refcount tracking to locate sources of netdevice and net namespace refcount leaks. - Make Tx watchdog less intrusive - avoid pausing Tx and restarting all queues from a single CPU removing latency spikes. - Various small optimizations throughout the stack from Eric Dumazet. - Make netdev->dev_addr[] constant, force modifications to go via appropriate helpers to allow us to keep addresses in ordered data structures. - Replace unix_table_lock with per-hash locks, improving performance of bind() calls. - Extend skb drop tracepoint with a drop reason. - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW. BPF --- - New helpers: - bpf_find_vma(), find and inspect VMAs for profiling use cases - bpf_loop(), runtime-bounded loop helper trading some execution time for much faster (if at all converging) verification - bpf_strncmp(), improve performance, avoid compiler flakiness - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt() for tracing programs, all inlined by the verifier - Support BPF relocations (CO-RE) in the kernel loader. - Further the support for BTF_TYPE_TAG annotations. - Allow access to local storage in sleepable helpers. - Convert verifier argument types to a composable form with different attributes which can be shared across types (ro, maybe-null). - Prepare libbpf for upcoming v1.0 release by cleaning up APIs, creating new, extensible ones where missing and deprecating those to be removed. Protocols --------- - WiFi (mac80211/cfg80211): - notify user space about long "come back in N" AP responses, allow it to react to such temporary rejections - allow non-standard VHT MCS 10/11 rates - use coarse time in airtime fairness code to save CPU cycles - Bluetooth: - rework of HCI command execution serialization to use a common queue and work struct, and improve handling errors reported in the middle of a batch of commands - rework HCI event handling to use skb_pull_data, avoiding packet parsing pitfalls - support AOSP Bluetooth Quality Report - SMC: - support net namespaces, following the RDMA model - improve connection establishment latency by pre-clearing buffers - introduce TCP ULP for automatic redirection to SMC - Multi-Path TCP: - support ioctls: SIOCINQ, OUTQ, and OUTQNSD - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT, IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY - support cmsgs: TCP_INQ - improvements in the data scheduler (assigning data to subflows) - support fastclose option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP) - MCTP (Management Component Transport) over serial, as defined by DMTF spec DSP0253 - "MCTP Serial Transport Binding". Driver API ---------- - Support timestamping on bond interfaces in active/passive mode. - Introduce generic phylink link mode validation for drivers which don't have any quirks and where MAC capability bits fully express what's supported. Allow PCS layer to participate in the validation. Convert a number of drivers. - Add support to set/get size of buffers on the Rx rings and size of the tx copybreak buffer via ethtool. - Support offloading TC actions as first-class citizens rather than only as attributes of filters, improve sharing and device resource utilization. - WiFi (mac80211/cfg80211): - support forwarding offload (ndo_fill_forward_path) - support for background radar detection hardware - SA Query Procedures offload on the AP side New hardware / drivers ---------------------- - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with real-time requirements for isochronous communication with protocols like OPC UA Pub/Sub. - Qualcomm BAM-DMUX WWAN - driver for data channels of modems integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974 (qcom_bam_dmux). - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver with support for bridging, VLANs and multicast forwarding (lan966x). - iwlmei driver for co-operating between Intel's WiFi driver and Intel's Active Management Technology (AMT) devices. - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips - Bluetooth: - MediaTek MT7921 SDIO devices - Foxconn MT7922A - Realtek RTL8852AE Drivers ------- - Significantly improve performance in the datapaths of: lan78xx, ax88179_178a, lantiq_xrx200, bnxt. - Intel Ethernet NICs: - igb: support PTP/time PEROUT and EXTTS SDP functions on 82580/i354/i350 adapters - ixgbevf: new PF -> VF mailbox API which avoids the risk of mailbox corruption with ESXi - iavf: support configuration of VLAN features of finer granularity, stacked tags and filtering - ice: PTP support for new E822 devices with sub-ns precision - ice: support firmware activation without reboot - Mellanox Ethernet NICs (mlx5): - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - support TC forwarding when tunnel encap and decap happen between two ports of the same NIC - dynamically size and allow disabling various features to save resources for running in embedded / SmartNIC scenarios - Broadcom Ethernet NICs (bnxt): - use page frag allocator to improve Rx performance - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - Other Ethernet NICs: - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support - Microsoft cloud/virtual NIC (mana): - add XDP support (PASS, DROP, TX) - Mellanox Ethernet switches (mlxsw): - initial support for Spectrum-4 ASICs - VxLAN with IPv6 underlay - Marvell Ethernet switches (prestera): - support flower flow templates - add basic IP forwarding support - NXP embedded Ethernet switches (ocelot & felix): - support Per-Stream Filtering and Policing (PSFP) - enable cut-through forwarding between ports by default - support FDMA to improve packet Rx/Tx to CPU - Other embedded switches: - hellcreek: improve trapping management (STP and PTP) packets - qca8k: support link aggregation and port mirroring - Qualcomm 802.11ax WiFi (ath11k): - qca6390, wcn6855: enable 802.11 power save mode in station mode - BSS color change support - WCN6855 hw2.1 support - 11d scan offload support - scan MAC address randomization support - full monitor mode, only supported on QCN9074 - qca6390/wcn6855: report signal and tx bitrate - qca6390: rfkill support - qca6390/wcn6855: regdb.bin support - Intel WiFi (iwlwifi): - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS) in cooperation with the BIOS - support for Optimized Connectivity Experience (OCE) scan - support firmware API version 68 - lots of preparatory work for the upcoming Bz device family - MediaTek WiFi (mt76): - Specific Absorption Rate (SAR) support - mt7921: 160 MHz channel support - RealTek WiFi (rtw88): - Specific Absorption Rate (SAR) support - scan offload - Other WiFi NICs - ath10k: support fetching (pre-)calibration data from nvmem - brcmfmac: configure keep-alive packet on suspend - wcn36xx: beacon filter support" * tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits) tcp: tcp_send_challenge_ack delete useless param `skb` net/qla3xxx: Remove useless DMA-32 fallback configuration rocker: Remove useless DMA-32 fallback configuration hinic: Remove useless DMA-32 fallback configuration lan743x: Remove useless DMA-32 fallback configuration net: enetc: Remove useless DMA-32 fallback configuration cxgb4vf: Remove useless DMA-32 fallback configuration cxgb4: Remove useless DMA-32 fallback configuration cxgb3: Remove useless DMA-32 fallback configuration bnx2x: Remove useless DMA-32 fallback configuration et131x: Remove useless DMA-32 fallback configuration be2net: Remove useless DMA-32 fallback configuration vmxnet3: Remove useless DMA-32 fallback configuration bna: Simplify DMA setting net: alteon: Simplify DMA setting myri10ge: Simplify DMA setting qlcnic: Simplify DMA setting net: allwinner: Fix print format page_pool: remove spinlock in page_pool_refill_alloc_cache() amt: fix wrong return type of amt_send_membership_update() ...
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2021-12-313-14/+69
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.Hao Luo2021-12-181-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
| | * | | add missing bpf-cgroup.h includesJakub Kicinski2021-12-162-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency, make sure those who actually need more than the definition of struct cgroup_bpf include bpf-cgroup.h explicitly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org
| | * | | bpf: Add get_func_[arg|ret|arg_cnt] helpersJiri Olsa2021-12-131-1/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding following helpers for tracing programs: Get n-th argument of the traced function: long bpf_get_func_arg(void *ctx, u32 n, u64 *value) Get return value of the traced function: long bpf_get_func_ret(void *ctx, u64 *value) Get arguments count of the traced function: long bpf_get_func_arg_cnt(void *ctx) The trampoline now stores number of arguments on ctx-8 address, so it's easy to verify argument index and find return value argument's position. Moving function ip address on the trampoline stack behind the number of functions arguments, so it's now stored on ctx-16 address if it's needed. All helpers above are inlined by verifier. Also bit unrelated small change - using newly added function bpf_prog_has_trampoline in check_get_func_ip. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-12-162-6/+13
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2021-12-101-5/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrii Nakryiko says: ==================== bpf-next 2021-12-10 v2 We've added 115 non-merge commits during the last 26 day(s) which contain a total of 182 files changed, 5747 insertions(+), 2564 deletions(-). The main changes are: 1) Various samples fixes, from Alexander Lobakin. 2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov. 3) A batch of new unified APIs for libbpf, logging improvements, version querying, etc. Also a batch of old deprecations for old APIs and various bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko. 4) BPF documentation reorganization and improvements, from Christoph Hellwig and Dave Tucker. 5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in libbpf, from Hengqi Chen. 6) Verifier log fixes, from Hou Tao. 7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong. 8) Extend branch record capturing to all platforms that support it, from Kajol Jain. 9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi. 10) bpftool doc-generating script improvements, from Quentin Monnet. 11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet. 12) Deprecation warning fix for perf/bpf_counter, from Song Liu. 13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf, from Tiezhu Yang. 14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song. 15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun, and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits) libbpf: Add "bool skipped" to struct bpf_map libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition bpftool: Switch bpf_object__load_xattr() to bpf_object__load() selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() selftests/bpf: Add test for libbpf's custom log_buf behavior selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() libbpf: Deprecate bpf_object__load_xattr() libbpf: Add per-program log buffer setter and getter libbpf: Preserve kernel error code and remove kprobe prog type guessing libbpf: Improve logging around BPF program loading libbpf: Allow passing user log setting through bpf_object_open_opts libbpf: Allow passing preallocated log_buf when loading BTF into kernel libbpf: Add OPTS-based bpf_btf_load() API libbpf: Fix bpf_prog_load() log_buf logic for log_level 0 samples/bpf: Remove unneeded variable bpf: Remove redundant assignment to pointer t selftests/bpf: Fix a compilation warning perf/bpf_counter: Use bpf_map_create instead of bpf_create_map samples: bpf: Fix 'unknown warning group' build warning on Clang samples: bpf: Fix xdp_sample_user.o linking with Clang ... ==================== Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | bpf: Remove config check to enable bpf support for branch recordsKajol Jain2021-12-061-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Branch data available to BPF programs can be very useful to get stack traces out of userspace application. Commit fff7b64355ea ("bpf: Add bpf_read_branch_records() helper") added BPF support to capture branch records in x86. Enable this feature also for other architectures as well by removing checks specific to x86. If an architecture doesn't support branch records, bpf_read_branch_records() still has appropriate checks and it will return an -EINVAL in that scenario. Based on UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures should return -ENOENT in such case. Hence, update the appropriate check to return -ENOENT instead. Selftest 'perf_branches' result on power9 machine which has the branch stacks support: - Before this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:FAIL #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:FAIL Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED - After this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:OK #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Selftest 'perf_branches' result on power9 machine which doesn't have branch stack report: - After this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:SKIP #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:OK Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED Fixes: fff7b64355eac ("bpf: Add bpf_read_branch_records() helper") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com
| * | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-12-024-7/+34
| |\ \ \ \ \ | | | |_|_|/ | | |/| | | | | | | | | Signed-off-by: Jakub Kicinski <kuba@kernel.org>