summaryrefslogtreecommitdiffstats
path: root/kernel/trace
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2018-04-052-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kfifo: fix inaccurate comment tools/thermal: tmon: fix for segfault net: Spelling s/stucture/structure/ edd: don't spam log if no EDD information is present Documentation: Fix early-microcode.txt references after file rename tracing: Block comments should align the * on each line treewide: Fix typos in printk GenWQE: Fix a typo in two comments treewide: Align function definition open/close braces
| * tracing: Block comments should align the * on each lineRohit Visavalia2018-03-271-1/+1
| | | | | | | | | | | | | | | | Resolved Block comments use * on subsequent lines checkpatch warning. Issue found by checkpatch. Signed-off-by: Rohit Visavalia <rohit.visavalia@softnautics.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * treewide: Align function definition open/close bracesJoe Perches2018-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some functions definitions have either the initial open brace and/or the closing brace outside of column 1. Move those braces to column 1. This allows various function analyzers like gnu complexity to work properly for these modified functions. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2018-04-041-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time(r) updates from Thomas Gleixner: "A small set of updates for timers and timekeeping: - The most interesting change is the consolidation of clock MONOTONIC and clock BOOTTIME. Clock MONOTONIC behaves now exactly like clock BOOTTIME and does not longer ignore the time spent in suspend. A new clock MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in kernels before this change. This allows applications to programmatically check for the clock MONOTONIC behaviour. As discussed in the review thread, this has the potential of breaking user space and we might have to revert this. Knock on wood that we can avoid that exercise. - Updates to the NTP mechanism to improve accuracy - A new kernel internal data structure to aid the ongoing Y2038 work. - Cleanups and simplifications of the clocksource code. - Make the alarmtimer code play nicely with debugobjects" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: alarmtimer: Init nanosleep alarm timer on stack y2038: Introduce struct __kernel_old_timeval tracing: Unify the "boot" and "mono" tracing clocks hrtimer: Unify MONOTONIC and BOOTTIME clock behavior posix-timers: Unify MONOTONIC and BOOTTIME clock behavior timekeeping: Remove boot time specific code Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock timekeeping/ntp: Determine the multiplier directly from NTP tick length timekeeping/ntp: Don't align NTP frequency adjustments to ticks clocksource: Use ATTRIBUTE_GROUPS clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes clocksource: Don't walk the clocksource list for empty override
| * | tracing: Unify the "boot" and "mono" tracing clocksThomas Gleixner2018-03-131-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify the "boot" and "mono" tracing clocks and document the new behaviour. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165150.489635255@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-04-031-11/+215
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari. 2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased. 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff. 4) Support zerocopy on RDS sockets, from Sowmini Varadhan. 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann. 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier. 7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel. 8) Add packet forwarding tests to selftests, from Ido Schimmel. 9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet. 10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern. 11) Add path MTU tests to selftests, from Stefano Brivio. 12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed. 13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree. 14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations. 15) In-kernel receive TLS support, from Dave Watson. 16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen. 17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al. 18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren. 19) Support XDP redirect in i40e driver, from Björn Töpel. 20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov. 21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan. 22) Use rhashtable for inet frag reassembly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...
| * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-04-013-8/+6
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bpf: Check attach type at prog load timeAndrey Ignatov2018-03-311-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | == The problem == There are use-cases when a program of some type can be attached to multiple attach points and those attach points must have different permissions to access context or to call helpers. E.g. context structure may have fields for both IPv4 and IPv6 but it doesn't make sense to read from / write to IPv6 field when attach point is somewhere in IPv4 stack. Same applies to BPF-helpers: it may make sense to call some helper from some attach point, but not from other for same prog type. == The solution == Introduce `expected_attach_type` field in in `struct bpf_attr` for `BPF_PROG_LOAD` command. If scenario described in "The problem" section is the case for some prog type, the field will be checked twice: 1) At load time prog type is checked to see if attach type for it must be known to validate program permissions correctly. Prog will be rejected with EINVAL if it's the case and `expected_attach_type` is not specified or has invalid value. 2) At attach time `attach_type` is compared with `expected_attach_type`, if prog type requires to have one, and, if they differ, attach will be rejected with EINVAL. The `expected_attach_type` is now available as part of `struct bpf_prog` in both `bpf_verifier_ops->is_valid_access()` and `bpf_verifier_ops->get_func_proto()` () and can be used to check context accesses and calls to helpers correspondingly. Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and Daniel Borkmann <daniel@iogearbox.net> here: https://marc.info/?l=linux-netdev&m=152107378717201&w=2 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | bpf: introduce BPF_RAW_TRACEPOINTAlexei Starovoitov2018-03-281-0/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access kernel internal arguments of the tracepoints in their raw form. >From bpf program point of view the access to the arguments look like: struct bpf_raw_tracepoint_args { __u64 args[0]; }; int bpf_prog(struct bpf_raw_tracepoint_args *ctx) { // program can read args[N] where N depends on tracepoint // and statically verified at program load+attach time } kprobe+bpf infrastructure allows programs access function arguments. This feature allows programs access raw tracepoint arguments. Similar to proposed 'dynamic ftrace events' there are no abi guarantees to what the tracepoints arguments are and what their meaning is. The program needs to type cast args properly and use bpf_probe_read() helper to access struct fields when argument is a pointer. For every tracepoint __bpf_trace_##call function is prepared. In assembler it looks like: (gdb) disassemble __bpf_trace_xdp_exception Dump of assembler code for function __bpf_trace_xdp_exception: 0xffffffff81132080 <+0>: mov %ecx,%ecx 0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3> where TRACE_EVENT(xdp_exception, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, u32 act), The above assembler snippet is casting 32-bit 'act' field into 'u64' to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is. All of ~500 of __bpf_trace_*() functions are only 5-10 byte long and in total this approach adds 7k bytes to .text. This approach gives the lowest possible overhead while calling trace_xdp_exception() from kernel C code and transitioning into bpf land. Since tracepoint+bpf are used at speeds of 1M+ events per second this is valuable optimization. The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced that returns anon_inode FD of 'bpf-raw-tracepoint' object. The user space looks like: // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type prog_fd = bpf_prog_load(...); // receive anon_inode fd for given bpf_raw_tracepoint with prog attached raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd); Ctrl-C of tracing daemon or cmdline tool that uses this feature will automatically detach bpf program, unload it and unregister tracepoint probe. On the kernel side the __bpf_raw_tp_map section of pointers to tracepoint definition and to __bpf_trace_*() probe function is used to find a tracepoint with "xdp_exception" name and corresponding __bpf_trace_xdp_exception() probe function which are passed to tracepoint_probe_register() to connect probe with tracepoint. Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf tracepoint mechanisms. perf_event_open() can be used in parallel on the same tracepoint. Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted. Each with its own bpf program. The kernel will execute all tracepoint probes and all attached bpf programs. In the future bpf_raw_tracepoints can be extended with query/introspection logic. __bpf_raw_tp_map section logic was contributed by Steven Rostedt Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-03-231-28/+40
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | bpf: add support to read sample address in bpf programTeng Qin2018-03-081-4/+16
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds new field "addr" to bpf_perf_event_data which could be read and used by bpf programs attached to perf events. The value of the field is copied from bpf_perf_event_data_kern.addr and contains the address value recorded by specifying sample_type with PERF_SAMPLE_ADDR when calling perf_event_open. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | | Merge branch 'perf/urgent' into perf/coreIngo Molnar2018-03-294-36/+46
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | Conflicts: kernel/events/hw_breakpoint.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge tag 'trace-v4.16-rc4' of ↵Linus Torvalds2018-03-233-8/+6
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull kprobe fixes from Steven Rostedt: "The documentation for kprobe events says that symbol offets can take both a + and - sign to get to befor and after the symbol address. But in actuality, the code does not support the minus. This fixes that issue, and adds a few more selftests to kprobe events" * tag 'trace-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add a testcase for probepoint selftests: ftrace: Add a testcase for string type with kprobe_event selftests: ftrace: Add probe event argument syntax testcase tracing: probeevent: Fix to support minus offset from symbol
| | * | tracing: probeevent: Fix to support minus offset from symbolMasami Hiramatsu2018-03-233-8/+6
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In Documentation/trace/kprobetrace.txt, it says @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) However, the parser doesn't parse minus offset correctly, since commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") drops minus ("-") offset support for kprobe probe address usage. This fixes the traceprobe_split_symbol_offset() to parse minus offset again with checking the offset range, and add a minus offset check in kprobe probe address usage. Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * / trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programsYonghong Song2018-03-201-28/+40
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") added helper bpf_perf_prog_read_value so that perf_event type program can read event counter and enabled/running time. This commit, however, introduced a bug which allows this helper for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value needs to access perf_event through its bpf_perf_event_data_kern type context, which is not available for tracepoint type program. This patch fixed the issue by separating bpf_func_proto between tracepoint and perf_event type programs and removed bpf_perf_prog_read_value from tracepoint func prototype. Fixes: 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | Merge tag 'v4.16-rc4' into perf/core, to pick up fixesIngo Molnar2018-03-061-0/+2
|\| | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * bpf: fix bpf_prog_array_copy_to_user warning from perf event prog queryDaniel Borkmann2018-02-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller tried to perform a prog query in perf_event_query_prog_array() where struct perf_event_query_bpf had an ids_len of 1,073,741,353 and thus causing a warning due to failed kcalloc() allocation out of the bpf_prog_array_copy_to_user() helper. Given we cannot attach more than 64 programs to a perf event, there's no point in allowing huge ids_len. Therefore, allow a buffer that would fix the maximum number of ids and also add a __GFP_NOWARN to the temporary ids buffer. Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") Fixes: 0911287ce32b ("bpf: fix bpf_prog_array_copy_to_user() issues") Reported-by: syzbot+cab5816b0edbabf598b3@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar2018-02-1711-37/+155
|\| | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-112-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge tag 'trace-v4.16-rc1' of ↵Linus Torvalds2018-02-092-6/+4
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Al Viro discovered some breakage with the parsing of the set_ftrace_filter as well as the removing of function probes. This fixes the code with Al's suggestions. I also added a few selftests to test the broken cases such that they wont happen again" * tag 'trace-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests/ftrace: Add more tests for removing of function probes selftests/ftrace: Add some missing glob checks selftests/ftrace: Have reset_ftrace_filter handle multiple instances selftests/ftrace: Have reset_ftrace_filter handle modules tracing: Fix parsing of globs with a wildcard at the beginning ftrace: Remove incorrect setting of glob search field
| | * tracing: Fix parsing of globs with a wildcard at the beginningSteven Rostedt (VMware)2018-02-081-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Al Viro reported: For substring - sure, but what about something like "*a*b" and "a*b"? AFAICS, filter_parse_regex() ends up with identical results in both cases - MATCH_GLOB and *search = "a*b". And no way for the caller to tell one from another. Testing this with the following: # cd /sys/kernel/tracing # echo '*raw*lock' > set_ftrace_filter bash: echo: write error: Invalid argument With this patch: # echo '*raw*lock' > set_ftrace_filter # cat set_ftrace_filter _raw_read_trylock _raw_write_trylock _raw_read_unlock _raw_spin_unlock _raw_write_unlock _raw_spin_trylock _raw_spin_lock _raw_write_lock _raw_read_lock Al recommended not setting the search buffer to skip the first '*' unless we know we are not using MATCH_GLOB. This implements his suggested logic. Link: http://lkml.kernel.org/r/20180127170748.GF13338@ZenIV.linux.org.uk Cc: stable@vger.kernel.org Fixes: 60f1d5e3bac44 ("ftrace: Support full glob matching") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Suggsted-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | * ftrace: Remove incorrect setting of glob search fieldSteven Rostedt (VMware)2018-02-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __unregister_ftrace_function_probe() will incorrectly parse the glob filter because it resets the search variable that was setup by filter_parse_regex(). Al Viro reported this: After that call of filter_parse_regex() we could have func_g.search not equal to glob only if glob started with '!' or '*'. In the former case we would've buggered off with -EINVAL (not = 1). In the latter we would've set func_g.search equal to glob + 1, calculated the length of that thing in func_g.len and proceeded to reset func_g.search back to glob. Suppose the glob is e.g. *foo*. We end up with func_g.type = MATCH_MIDDLE_ONLY; func_g.len = 3; func_g.search = "*foo"; Feeding that to ftrace_match_record() will not do anything sane - we will be looking for names containing "*foo" (->len is ignored for that one). Link: http://lkml.kernel.org/r/20180127031706.GE13338@ZenIV.linux.org.uk Cc: stable@vger.kernel.org Fixes: 3ba009297149f ("ftrace: Introduce ftrace_glob structure") Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | Merge tag 'trace-v4.16' of ↵Linus Torvalds2018-02-015-14/+11
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There's not much changes for the tracing system this release. Mostly small clean ups and fixes. The biggest change is to how bprintf works. bprintf is used by trace_printk() to just save the format and args of a printf call, and the formatting is done when the trace buffer is read. This is done to keep the formatting out of the fast path (this was recommended by you). The issue is when arguments are de-referenced. If a pointer is saved, and the format has something like "%*pbl", when the buffer is read, it will de-reference the argument then. The problem is if the data no longer exists. This can cause the kernel to oops. The fix for this was to make these de-reference pointes do the formatting at the time it is called (the fast path), as this guarantees that the data exists (and doesn't change later)" * tag 'trace-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: vsprintf: Do not have bprintf dereference pointers ftrace: Mark function tracer test functions noinline/noclone trace_uprobe: Display correct offset in uprobe_events tracing: Make sure the parsed string always terminates with '\0' tracing: Clear parser->idx if only spaces are read tracing: Detect the string nul character when parsing user input string
| | * ftrace: Mark function tracer test functions noinline/nocloneAndi Kleen2018-01-231-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ftrace function tracer self tests calls some functions to verify the get traced. This relies on them not being inlined. Previously this was ensured by putting them into another file, but with LTO the compiler can inline across files, which makes the tests fail. Mark these functions as noinline and noclone. Link: http://lkml.kernel.org/r/20171221233732.31896-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | * trace_uprobe: Display correct offset in uprobe_eventsRavi Bangoria2018-01-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently, how the pointers being printed with %p has been changed by commit ad67b74d2469 ("printk: hash addresses printed with %p"). This is causing a regression while showing offset in the uprobe_events file. Instead of %p, use %px to display offset. Before patch: # perf probe -vv -x /tmp/a.out main Opening /sys/kernel/debug/tracing//uprobe_events write=1 Writing event: p:probe_a/main /tmp/a.out:0x58c # cat /sys/kernel/debug/tracing/uprobe_events p:probe_a/main /tmp/a.out:0x0000000049a0f352 After patch: # cat /sys/kernel/debug/tracing/uprobe_events p:probe_a/main /tmp/a.out:0x000000000000058c Link: http://lkml.kernel.org/r/20180106054246.15375-1-ravi.bangoria@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | * tracing: Make sure the parsed string always terminates with '\0'Changbin Du2018-01-233-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always mark the parsed string with a terminated nul '\0' character. This removes the need for the users to have to append the '\0' before using the parsed string. Link: http://lkml.kernel.org/r/1516093350-12045-4-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | * tracing: Clear parser->idx if only spaces are readChangbin Du2018-01-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If only spaces were read while parsing the next string, then parser->idx should be cleared in order to make trace_parser_loaded() return false. Link: http://lkml.kernel.org/r/1516093350-12045-3-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | * tracing: Detect the string nul character when parsing user input stringChangbin Du2018-01-231-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User space can pass in a C nul character '\0' along with its input. The function trace_get_user() will try to process it as a normal character, and that will fail to parse. open("/sys/kernel/debug/tracing//set_ftrace_pid", O_WRONLY|O_TRUNC) = 3 write(3, " \0", 2) = -1 EINVAL (Invalid argument) while parse can handle spaces, so below works. $ echo "" > set_ftrace_pid $ echo " " > set_ftrace_pid $ echo -n " " > set_ftrace_pid Have the parser stop on '\0' and cease any further parsing. Only process the characters up to the nul '\0' character and do not process it. Link: http://lkml.kernel.org/r/1516093350-12045-2-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-01-314-9/+132
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
| | * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-01-244-41/+84
| | |\| | | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2018-01-201-1/+1
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-01-19 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) bpf array map HW offload, from Jakub. 2) support for bpf_get_next_key() for LPM map, from Yonghong. 3) test_verifier now runs loaded programs, from Alexei. 4) xdp cpumap monitoring, from Jesper. 5) variety of tests, cleanups and small x64 JIT optimization, from Daniel. 6) user space can now retrieve HW JITed program, from Jiong. Note there is a minor conflict between Russell's arm32 JIT fixes and removal of bpf_jit_enable variable by Daniel which should be resolved by keeping Russell's comment and removing that variable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | bpf: change fake_ip for bpf_trace_printk helperYonghong Song2018-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, for bpf_trace_printk helper, fake ip address 0x1 is used with comments saying that fake ip will not be printed. This is indeed true for 4.12 and earlier version, but for 4.13 and later version, the ip address will be printed if it cannot be resolved with kallsym. Running samples/bpf/tracex5 program and you will have the following in the debugfs trace_pipe output: ... <...>-1819 [003] .... 443.497877: 0x00000001: mmap <...>-1819 [003] .... 443.498289: 0x00000001: syscall=102 (one of get/set uid/pid/gid) ... The kernel commit changed this behavior is: commit feaf1283d11794b9d518fcfd54b6bf8bee1f0b4b Author: Steven Rostedt (VMware) <rostedt@goodmis.org> Date: Thu Jun 22 17:04:55 2017 -0400 tracing: Show address when function names are not found ... This patch changed the comment and also altered the fake ip address to 0x0 as users may think 0x1 has some special meaning while it doesn't. The new output: ... <...>-1799 [002] .... 25.953576: 0: mmap <...>-1799 [002] .... 25.953865: 0: read(fd=0, buf=00000000053936b5, size=512) ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-01-192-3/+16
| | |\ \ \ | | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The BPF verifier conflict was some minor contextual issue. The TUN conflict was less trivial. Cong Wang fixed a memory leak of tfile->tx_array in 'net'. This is an skb_array. But meanwhile in net-next tun changed tfile->tx_arry into tfile->tx_ring which is a ptr_ring. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-01-172-18/+46
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | error-injection: Separate error-injection from kprobeMasami Hiramatsu2018-01-123-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since error-injection framework is not limited to be used by kprobes, nor bpf. Other kernel subsystems can use it freely for checking safeness of error-injection, e.g. livepatch, ftrace etc. So this separate error-injection framework from kprobes. Some differences has been made: - "kprobe" word is removed from any APIs/structures. - BPF_ALLOW_ERROR_INJECTION() is renamed to ALLOW_ERROR_INJECTION() since it is not limited for BPF too. - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this feature. It is automatically enabled if the arch supports error injection feature for kprobe or ftrace etc. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | tracing/kprobe: bpf: Compare instruction pointer with original oneMasami Hiramatsu2018-01-122-15/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compare instruction pointer with original one on the stack instead using per-cpu bpf_kprobe_override flag. This patch also consolidates reset_current_kprobe() and preempt_enable_no_resched() blocks. Those can be done in one place. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | tracing/kprobe: bpf: Check error injectable event is on function entryMasami Hiramatsu2018-01-124-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check whether error injectable event is on function entry or not. Currently it checks the event is ftrace-based kprobes or not, but that is wrong. It should check if the event is on the entry of target function. Since error injection will override a function to just return with modified return value, that operation must be done before the target function starts making stackframe. As a side effect, bpf error injection is no need to depend on function-tracer. It can work with sw-breakpoint based kprobe events too. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-12-292-10/+15
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net/ipv6/ip6_gre.c is a case of parallel adds. include/trace/events/tcp.h is a little bit more tricky. The removal of in-trace-macro ifdefs in 'net' paralleled with moving show_tcp_state_name and friends over to include/trace/events/sock.h in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * \ \ \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2017-12-184-8/+137
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf-next 2017-12-18 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Allow arbitrary function calls from one BPF function to another BPF function. As of today when writing BPF programs, __always_inline had to be used in the BPF C programs for all functions, unnecessarily causing LLVM to inflate code size. Handle this more naturally with support for BPF to BPF calls such that this __always_inline restriction can be overcome. As a result, it allows for better optimized code and finally enables to introduce core BPF libraries in the future that can be reused out of different projects. x86 and arm64 JIT support was added as well, from Alexei. 2) Add infrastructure for tagging functions as error injectable and allow for BPF to return arbitrary error values when BPF is attached via kprobes on those. This way of injecting errors generically eases testing and debugging without having to recompile or restart the kernel. Tags for opting-in for this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef. 3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper call for XDP programs. First part of this work adds handling of BPF capabilities included in the firmware, and the later patches add support to the nfp verifier part and JIT as well as some small optimizations, from Jakub. 4) The bpftool now also gets support for basic cgroup BPF operations such as attaching, detaching and listing current BPF programs. As a requirement for the attach part, bpftool can now also load object files through 'bpftool prog load'. This reuses libbpf which we have in the kernel tree as well. bpftool-cgroup man page is added along with it, from Roman. 5) Back then commit e87c6bc3852b ("bpf: permit multiple bpf attachments for a single perf event") added support for attaching multiple BPF programs to a single perf event. Given they are configured through perf's ioctl() interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF command in this work in order to return an array of one or multiple BPF prog ids that are currently attached, from Yonghong. 6) Various minor fixes and cleanups to the bpftool's Makefile as well as a new 'uninstall' and 'doc-uninstall' target for removing bpftool itself or prior installed documentation related to it, from Quentin. 7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is required for the test_dev_cgroup test case to run, from Naresh. 8) Fix reporting of XDP prog_flags for nfp driver, from Jakub. 9) Fix libbpf's exit code from the Makefile when libelf was not found in the system, also from Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | trace: reenable preemption if we modify the ipJosef Bacik2017-12-171-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Things got moved around between the original bpf_override_return patches and the final version, and now the ftrace kprobe dispatcher assumes if you modified the ip that you also enabled preemption. Make a comment of this and enable preemption, this fixes the lockdep splat that happened when using this feature. Fixes: 9802d86585db ("bpf: add a bpf_override_function helper") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | * | | | | bpf/tracing: fix kernel/events/core.c compilation errorYonghong Song2017-12-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") introduced a perf ioctl command to query prog array attached to the same perf tracepoint. The commit introduced a compilation error under certain config conditions, e.g., (1). CONFIG_BPF_SYSCALL is not defined, or (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS nor CONFIG_KPROBE_EVENTS is defined. Error message: kernel/events/core.o: In function `perf_ioctl': core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array' This patch fixed this error by guarding the real definition under CONFIG_BPF_EVENTS and provided static inline dummy function if CONFIG_BPF_EVENTS was not defined. It renamed the function from bpf_event_query_prog_array to perf_event_query_prog_array and moved the definition from linux/bpf.h to linux/trace_events.h so the definition is in proximity to other prog_array related functions. Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | * | | | | bpf: add a bpf_override_function helperJosef Bacik2017-12-124-7/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Error injection is sloppy and very ad-hoc. BPF could fill this niche perfectly with it's kprobe functionality. We could make sure errors are only triggered in specific call chains that we care about with very specific situations. Accomplish this with the bpf_override_funciton helper. This will modify the probe'd callers return value to the specified value and set the PC to an override function that simply returns, bypassing the originally probed function. This gives us a nice clean way to implement systematic error injection for all of our code paths. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * | | | | bpf/tracing: allow user space to query prog array on the same tpYonghong Song2017-12-121-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e87c6bc3852b ("bpf: permit multiple bpf attachments for a single perf event") added support to attach multiple bpf programs to a single perf event. Although this provides flexibility, users may want to know what other bpf programs attached to the same tp interface. Besides getting visibility for the underlying bpf system, such information may also help consolidate multiple bpf programs, understand potential performance issues due to a large array, and debug (e.g., one bpf program which overwrites return code may impact subsequent program results). Commit 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes") utilized the existing perf ioctl interface and added the command PERF_EVENT_IOC_SET_BPF to attach a bpf program to a tracepoint. This patch adds a new ioctl command, given a perf event fd, to query the bpf program array attached to the same perf tracepoint event. The new uapi ioctl command: PERF_EVENT_IOC_QUERY_BPF The new uapi/linux/perf_event.h structure: struct perf_event_query_bpf { __u32 ids_len; __u32 prog_cnt; __u32 ids[0]; }; User space provides buffer "ids" for kernel to copy to. When returning from the kernel, the number of available programs in the array is set in "prog_cnt". The usage: struct perf_event_query_bpf *query = malloc(sizeof(*query) + sizeof(u32) * ids_len); query.ids_len = ids_len; err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query); if (err == 0) { /* query.prog_cnt is the number of available progs, * number of progs in ids: (ids_len == 0) ? 0 : query.prog_cnt */ } else if (errno == ENOSPC) { /* query.ids_len number of progs copied, * query.prog_cnt is the number of available progs */ } else { /* other errors */ } Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | | Merge branch 'misc.poll' of ↵Linus Torvalds2018-01-302-4/+4
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
| | * | | | | | | ring_buffer_poll_wait() return value used as return value of ->poll()Al Viro2017-11-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | ipc, kernel, mm: annotate ->poll() instancesAl Viro2017-11-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | | perf/core: Implement the 'perf_uprobe' PMUSong Liu2018-02-063-8/+135
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds perf_uprobe support with similar pattern as previous patch (for kprobe). Two functions, create_local_trace_uprobe() and destroy_local_trace_uprobe(), are created so a uprobe can be created and attached to the file descriptor created by perf_event_open(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yonghong Song <yhs@fb.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Cc: <daniel@iogearbox.net> Cc: <davem@davemloft.net> Cc: <kernel-team@fb.com> Cc: <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171206224518.3598254-7-songliubraving@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | | perf/core: Implement the 'perf_kprobe' PMUSong Liu2018-02-063-8/+139
|/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new PMU type, perf_kprobe is added. Based on attr from perf_event_open(), perf_kprobe creates a kprobe (or kretprobe) for the perf_event. This kprobe is private to this perf_event, and thus not added to global lists, and not available in tracefs. Two functions, create_local_trace_kprobe() and destroy_local_trace_kprobe() are added to created and destroy these local trace_kprobe. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yonghong Song <yhs@fb.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Cc: <daniel@iogearbox.net> Cc: <davem@davemloft.net> Cc: <kernel-team@fb.com> Cc: <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171206224518.3598254-6-songliubraving@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2018-01-302-12/+1
|\ \ \ \ \ \ \ \ | |_|_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle were: - Updates to use cond_resched() instead of cond_resched_rcu_qs() where feasible (currently everywhere except in kernel/rcu and in kernel/torture.c). Also a couple of fixes to avoid sending IPIs to offline CPUs. - Updates to simplify RCU's dyntick-idle handling. - Updates to remove almost all uses of smp_read_barrier_depends() and read_barrier_depends(). - Torture-test updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) torture: Save a line in stutter_wait(): while -> for torture: Eliminate torture_runnable and perf_runnable torture: Make stutter less vulnerable to compilers and races locking/locktorture: Fix num reader/writer corner cases locking/locktorture: Fix rwsem reader_delay torture: Place all torture-test modules in one MAINTAINERS group rcutorture/kvm-build.sh: Skip build directory check rcutorture: Simplify functions.sh include path rcutorture: Simplify logging rcutorture/kvm-recheck-*: Improve result directory readability check rcutorture/kvm.sh: Support execution from any directory rcutorture/kvm.sh: Use consistent help text for --qemu-args rcutorture/kvm.sh: Remove unused variable, `alldone` rcutorture: Remove unused script, config2frag.sh rcutorture/configinit: Fix build directory error message rcutorture: Preempt RCU-preempt readers more vigorously torture: Reduce #ifdefs for preempt_schedule() rcu: Remove have_rcu_nocb_mask from tree_plugin.h rcu: Add comment giving debug strategy for double call_rcu() tracing, rcu: Hide trace event rcu_nocb_wake when not used ...
| * | | | | | | Merge branch 'for-mingo' of ↵Ingo Molnar2018-01-032-12/+1
| |\ \ \ \ \ \ \ | | |_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Updates to use cond_resched() instead of cond_resched_rcu_qs() where feasible (currently everywhere except in kernel/rcu and in kernel/torture.c). Also a couple of fixes to avoid sending IPIs to offline CPUs. - Updates to simplify RCU's dyntick-idle handling. - Updates to remove almost all uses of smp_read_barrier_depends() and read_barrier_depends(). - Miscellaneous fixes. - Torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>