summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* perf bench: Add event synthesis benchmarkIan Rogers2020-04-165-2/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Event synthesis may occur at the start or end (tail) of a perf command. In system-wide mode it can scan every process in /proc, which may add seconds of latency before event recording. Add a new benchmark that times how long event synthesis takes with and without data synthesis. An example execution looks like: $ perf bench internals synthesize # Running 'internals/synthesize' benchmark: Average synthesis took: 168.253800 usec Average data synthesis took: 208.104700 usec Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf script: Simplify auxiliary event printing functionsAdrian Hunter2020-04-161-238/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies the print functions for the following perf script options: --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events Example: # perf record --switch-events -a -e cycles -c 10000 sleep 1 Before: # perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-before.txt After: # perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-after.txt # diff -s out-before.txt out-after.txt Files out-before.txt and out-after.tx are identical Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200402141548.21283-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* doc/admin-guide: update kernel.rst with CAP_PERFMON informationAlexey Budankov2020-04-161-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Update the kernel.rst documentation file with the information related to usage of CAP_PERFMON capability to secure performance monitoring and observability operations in system. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: James Morris <jmorris@namei.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/84c32383-14a2-fa35-16b6-f9e59bd37240@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* doc/admin-guide: Update perf-security.rst with CAP_PERFMON informationAlexey Budankov2020-04-161-25/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update perf-security.rst documentation file with the information related to usage of CAP_PERFMON capability to secure performance monitoring and observability operations in system. Committer notes: While testing 'perf top' under cap_perfmon I noticed that it needs some more capability and Alexey pointed out cap_ipc_lock, as needed by this kernel chunk: kernel/events/core.c: 6101 if ((locked > lock_limit) && perf_is_paranoid() && !capable(CAP_IPC_LOCK)) { ret = -EPERM; goto unlock; } So I added it to the documentation, and also mentioned that if the libcap version doesn't yet supports 'cap_perfmon', its numeric value can be used instead, i.e. if: # setcap "cap_perfmon,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf Fails, try: # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf I also added a paragraph stating that using an unpatched libcap will fail the check for CAP_PERFMON, as it checks the cap number against a maximum to see if it is valid, which makes it use as the default the 'cycles:u' event, even tho a cap_perfmon capable perf binary can get kernel samples, to workaround that just use, e.g.: # perf top -e cycles # perf record -e cycles And it will sample kernel and user modes. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: James Morris <jmorris@namei.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/17278551-9399-9ebe-d665-8827016a217d@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* drivers/oprofile: Open access for CAP_PERFMON privileged processAlexey Budankov2020-04-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/691f1096-b15f-9b12-50a0-c2b93918149e@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* drivers/perf: Open access for CAP_PERFMON privileged processAlexey Budankov2020-04-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/4ec1d6f7-548c-8d1c-f84a-cebeb9674e4e@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* parisc/perf: open access for CAP_PERFMON privileged processAlexey Budankov2020-04-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/8cc98809-d35b-de0f-de02-4cf554f3cf62@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* powerpc/perf: open access for CAP_PERFMON privileged processAlexey Budankov2020-04-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* trace/bpf_trace: Open access for CAP_PERFMON privileged processAlexey Budankov2020-04-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to bpf_trace monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to bpf_trace monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/c0a0ae47-8b6e-ff3e-416b-3cd1faaf71c0@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* drm/i915/perf: Open access for CAP_PERFMON privileged processAlexey Budankov2020-04-161-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to i915_perf monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to i915_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure i915_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Support CAP_PERFMON capabilityAlexey Budankov2020-04-165-8/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend error messages to mention CAP_PERFMON capability as an option to substitute CAP_SYS_ADMIN capability for secure system performance monitoring and observability operations. Make perf_event_paranoid_check() and __cmd_ftrace() to be aware of CAP_PERFMON capability. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Committer testing: Using a libcap with this patch: diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h index 78b2fd4c8a95..89b5b0279b60 100644 --- a/libcap/include/uapi/linux/capability.h +++ b/libcap/include/uapi/linux/capability.h @@ -366,8 +366,9 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_READ 37 +#define CAP_PERFMON 38 -#define CAP_LAST_CAP CAP_AUDIT_READ +#define CAP_LAST_CAP CAP_PERFMON #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) Note that using '38' in place of 'cap_perfmon' works to some degree with an old libcap, its only when cap_get_flag() is called that libcap performs an error check based on the maximum value known for capabilities that it will fail. This makes determining the default of perf_event_attr.exclude_kernel to fail, as it can't determine if CAP_PERFMON is in place. Using 'perf top -e cycles' avoids the default check and sets perf_event_attr.exclude_kernel to 1. As root, with a libcap supporting CAP_PERFMON: # groupadd perf_users # adduser perf -g perf_users # mkdir ~perf/bin # cp ~acme/bin/perf ~perf/bin/ # chgrp perf_users ~perf/bin/perf # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf # getcap ~perf/bin/perf /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep # ls -la ~perf/bin/perf -rwxr-xr-x. 1 root perf_users 16968552 Apr 9 13:10 /home/perf/bin/perf As the 'perf' user in the 'perf_users' group: $ perf top -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ Either add the cap_ipc_lock capability to the perf binary or reduce the ring buffer size to some smaller value: $ perf top -m10 -a --stdio rounding mmap pages size to 64K (16 pages) Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m4 -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m2 -a --stdio PerfTop: 762 irqs/sec kernel:49.7% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs) ------------------------------------------------------------------------------------------------------ 9.83% perf [.] __symbols__insert 8.58% perf [.] rb_next 5.91% [kernel] [k] module_get_kallsym 5.66% [kernel] [k] kallsyms_expand_symbol.constprop.0 3.98% libc-2.29.so [.] __GI_____strtoull_l_internal 3.66% perf [.] rb_insert_color 2.34% [kernel] [k] vsnprintf 2.30% [kernel] [k] string_nocheck 2.16% libc-2.29.so [.] _IO_getdelim 2.15% [kernel] [k] number 2.13% [kernel] [k] format_decode 1.58% libc-2.29.so [.] _IO_feof 1.52% libc-2.29.so [.] __strcmp_avx2 1.50% perf [.] rb_set_parent_color 1.47% libc-2.29.so [.] __libc_calloc 1.24% [kernel] [k] do_syscall_64 1.17% [kernel] [k] __x86_indirect_thunk_rax $ perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ] $ perf evlist cycles $ perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 $ perf report | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 74 of event 'cycles' # Event count (approx.): 15694834 # # Overhead Command Shared Object Symbol # ........ ............... .......................... ...................................... # 19.62% perf [kernel.vmlinux] [k] strnlen_user 13.88% swapper [kernel.vmlinux] [k] intel_idle 13.83% ksoftirqd/0 [kernel.vmlinux] [k] pfifo_fast_dequeue 13.51% swapper [kernel.vmlinux] [k] kmem_cache_free 6.31% gnome-shell [kernel.vmlinux] [k] kmem_cache_free 5.66% kworker/u8:3+ix [kernel.vmlinux] [k] delay_tsc 4.42% perf [kernel.vmlinux] [k] __set_cpus_allowed_ptr 3.45% kworker/2:1-eve [kernel.vmlinux] [k] shmem_truncate_range 2.29% gnome-shell libgobject-2.0.so.0.6000.7 [.] g_closure_ref $ Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf/core: open access to probes for CAP_PERFMON privileged processAlexey Budankov2020-04-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to monitoring via kprobes and uprobes and eBPF tracing for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses ftrace to define new kprobe events, and those events are treated as tracepoint events. eBPF defines new probes via perf_event_open interface and then the probes are used in eBPF tracing. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Cc: linux-man@vger.kernel.org Link: http://lore.kernel.org/lkml/3c129d9a-ba8a-3483-ecc5-ad6c8e7c203f@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf/core: Open access to the core for CAP_PERFMON privileged processAlexey Budankov2020-04-162-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open access to monitoring of kernel code, CPUs, tracepoints and namespaces data for a CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons the access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-man@vger.kernel.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/471acaef-bb8a-5ce2-923f-90606b78eef9@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* capabilities: Introduce CAP_PERFMON to kernel and user spaceAlexey Budankov2020-04-163-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce the CAP_PERFMON capability designed to secure system performance monitoring and observability operations so that CAP_PERFMON can assist CAP_SYS_ADMIN capability in its governing role for performance monitoring and observability subsystems. CAP_PERFMON hardens system security and integrity during performance monitoring and observability operations by decreasing attack surface that is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access to system performance monitoring and observability operations under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operation more secure. Thus, CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) CAP_PERFMON meets the demand to secure system performance monitoring and observability operations for adoption in security sensitive, restricted, multiuser production environments (e.g. HPC clusters, cloud and virtual compute environments), where root or CAP_SYS_ADMIN credentials are not available to mass users of a system, and securely unblocks applicability and scalability of system performance monitoring and observability operations beyond root and CAP_SYS_ADMIN use cases. CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance monitoring and observability operations and balances amount of CAP_SYS_ADMIN credentials following the recommendations in the capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel developers, below." For backward compatibility reasons access to system performance monitoring and observability subsystems of the kernel remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability usage for secure system performance monitoring and observability operations is discouraged with respect to the designed CAP_PERFMON capability. Although the software running under CAP_PERFMON can not ensure avoidance of related hardware issues, the software can still mitigate these issues following the official hardware issues mitigation procedure [2]. The bugs in the software itself can be fixed following the standard kernel development process [3] to maintain and harden security of system performance monitoring and observability operations. [1] http://man7.org/linux/man-pages/man7/capabilities.7.html [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Add basic support for bpf_imageJiri Olsa2020-04-165-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF images that carry trampoline or dispatcher. Upcoming patches will add support to read the image data, store it within the BPF feature in perf.data and display it for annotation purposes. Currently we only display following message: # ./perf annotate bpf_trampoline_24456 --stdio Percent | Source code & Disassembly of . for cycles (504 ... --------------------------------------------------------------- ... : to be implemented Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf machine: Set ksymbol dso as loaded on arrivalJiri Olsa2020-04-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | There's no special load action for ksymbol data on map__load/dso__load action, where the kernel is getting loaded. It only gets confused with kernel kallsyms/vmlinux load for bpf object, which fails and could mess up with the map. Disabling any further load of the map for ksymbol related dso/map. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Synthesize bpf_trampoline/dispatcher ksymbol eventJiri Olsa2020-04-161-0/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol events from /proc/kallsyms. Having this perf can recognize samples from those images and perf report and top shows them correctly. The rest of the ksymbol handling is already in place from for the bpf programs monitoring, so only the initial state was needed. perf report output: # Overhead Command Shared Object Symbol 12.37% test_progs [kernel.vmlinux] [k] entry_SYSCALL_64 11.80% test_progs [kernel.vmlinux] [k] syscall_return_via_sysret 9.63% test_progs bpf_prog_bcf7977d3b93787c_prog2 [k] bpf_prog_bcf7977d3b93787c_prog2 6.90% test_progs bpf_trampoline_24456 [k] bpf_trampoline_24456 6.36% test_progs [kernel.vmlinux] [k] memcpy_erms Committer notes: Use scnprintf() instead of strncpy() to overcome this on fedora:32, rawhide and OpenMandriva Cooker: CC /tmp/build/perf/util/bpf-event.o In file included from /usr/include/string.h:495, from /git/linux/tools/lib/bpf/libbpf_common.h:12, from /git/linux/tools/lib/bpf/bpf.h:31, from util/bpf-event.c:4: In function 'strncpy', inlined from 'process_bpf_image' at util/bpf-event.c:323:2, inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9: /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf stat: Honour --timeout for forked workloadsArnaldo Carvalho de Melo2020-04-161-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When --timeout is used and a workload is specified to be started by 'perf stat', i.e. $ perf stat --timeout 1000 sleep 1h The --timeout wasn't being honoured, i.e. the workload, 'sleep 1h' in the above example, should be terminated after 1000ms, but it wasn't, 'perf stat' was waiting for it to finish. Fix it by sending a SIGTERM when the timeout expires. Now it works: # perf stat -e cycles --timeout 1234 sleep 1h sleep: Terminated Performance counter stats for 'sleep 1h': 1,066,692 cycles 1.234314838 seconds time elapsed 0.000750000 seconds user 0.000000000 seconds sys # Fixes: f1f8ad52f8bf ("perf stat: Add support to print counts after a period of time") Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207243 Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru> Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: https://lore.kernel.org/lkml/20200415153803.GB20324@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge tag 'perf-urgent-for-mingo-5.7-20200414' of ↵Ingo Molnar2020-04-1622-387/+646
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf stat: Jin Yao: - Fix no metric header if --per-socket and --metric-only set build system: - Fix python building when built with clang, that was failing if the clang version doesn't support -fno-semantic-interposition. tools UAPI headers: Arnaldo Carvalho de Melo: - Update various copies of kernel headers, some ended up automatically updating build-time generated tables to enable tools such as 'perf trace' to decode syscalls and tracepoints arguments. Now the tools/perf build is free of UAPI drift warnings. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * tools headers: Synchronize linux/bits.h with the kernel sourcesArnaldo Carvalho de Melo2020-04-145-6/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick up the changes in these csets: 295bcca84916 ("linux/bits.h: add compile time sanity check of GENMASK inputs") 3945ff37d2f4 ("linux/bits.h: Extract common header for vDSO") To address this tools/perf build warning: Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h' diff -u tools/include/linux/bits.h include/linux/bits.h This clashes with usage of userspace's static_assert(), that, at least on glibc, is guarded by a ifnded/endif pair, do the same to our copy of build_bug.h and avoid that diff in check_headers.sh so that we continue checking for drifts with the kernel sources master copy. This will all be tested with the set of build containers that includes uCLibc, musl libc, lots of glibc versions in lots of distros and cross build environments. The tools/objtool, tools/bpf, etc were tested as well. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers: Adopt verbatim copy of compiletime_assert() from kernel sourcesArnaldo Carvalho de Melo2020-04-141-0/+26
| | | | | | | | | | | | | | | | | | Will be needed when syncing the linux/bits.h header, in the next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers: Update x86's syscall_64.tbl with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-370/+370
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick the changes from: d3b1b776eefc ("x86/entry/64: Remove ptregs qualifier from syscall table") cab56d3484d4 ("x86/entry: Remove ABI prefixes from functions in syscall tables") 27dd84fafcd5 ("x86/entry/64: Use syscall wrappers for x32_rt_sigreturn") Addressing this tools/perf build warning: Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl That didn't result in any tooling changes, as what is extracted are just the first two columns, and these patches touched only the third. $ cp /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp $ cp arch/x86/entry/syscalls/syscall_64.tbl tools/perf/arch/x86/entry/syscalls/syscall_64.tbl $ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j12' parallel build DESCEND plugins CC /tmp/build/perf/util/syscalltbl.o INSTALL trace_plugins LD /tmp/build/perf/util/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf $ diff -u /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp/syscalls_64.c $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick the change in: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction") That don't result in any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers UAPI: Update tools's copy of drm.h headersArnaldo Carvalho de Melo2020-04-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Picking the changes from: 455e00f1412f ("drm: Add getfb2 ioctl") Silencing these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h' diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated tables will be able to translate this new ioctl code into a string: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after --- before 2020-04-14 09:28:45.461821077 -0300 +++ after 2020-04-14 09:28:53.594782685 -0300 @@ -107,6 +107,7 @@ [0xCB] = "SYNCOBJ_QUERY", [0xCC] = "SYNCOBJ_TRANSFER", [0xCD] = "SYNCOBJ_TIMELINE_SIGNAL", + [0xCE] = "MODE_GETFB2", [DRM_COMMAND_BASE + 0x00] = "I915_INIT", [DRM_COMMAND_BASE + 0x01] = "I915_FLUSH", [DRM_COMMAND_BASE + 0x02] = "I915_FLIP", $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Stone <daniels@collabora.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers kvm: Sync linux/kvm.h with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-2/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick up the changes from: 9a5788c615f5 ("KVM: PPC: Book3S HV: Add a capability for enabling secure guests") 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in small chunks") 13da9ae1cdbf ("KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED") e0d2773d487c ("KVM: s390: protvirt: UV calls in support of diag308 0, 1") 19e122776886 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer") 29b40f105ec8 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling") So far we're ignoring those arch specific ioctls, we need to revisit this at some time to have arch specific tables, etc: $ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \ $ This addresses these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h' diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jay Zhou <jianjay.zhou@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers UAPI: Sync linux/fscrypt.h with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick the changes from: e98ad464750c ("fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl") That don't trigger any changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h In time we should come up with something like: $ tools/perf/trace/beauty/fsconfig.sh static const char *fsconfig_cmds[] = { [0] = "SET_FLAG", [1] = "SET_STRING", [2] = "SET_BINARY", [3] = "SET_PATH", [4] = "SET_PATH_EMPTY", [5] = "SET_FD", [6] = "CMD_CREATE", [7] = "CMD_RECONFIGURE", }; $ And: $ tools/perf/trace/beauty/drm_ioctl.sh | head #ifndef DRM_COMMAND_BASE #define DRM_COMMAND_BASE 0x40 #endif static const char *drm_ioctl_cmds[] = { [0x00] = "VERSION", [0x01] = "GET_UNIQUE", [0x02] = "GET_MAGIC", [0x03] = "IRQ_BUSID", [0x04] = "GET_MAP", [0x05] = "GET_CLIENT", $ For fscrypt's ioctls. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools include UAPI: Sync linux/vhost.h with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To get the changes in: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h This automatically picks these new ioctls, making tools such as 'perf trace' aware of them and possibly allowing to use the strings in filters, etc: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2020-04-14 09:12:28.559748968 -0300 +++ after 2020-04-14 09:12:38.781696242 -0300 @@ -24,9 +24,16 @@ [0x44] = "SCSI_GET_EVENTS_MISSED", [0x60] = "VSOCK_SET_GUEST_CID", [0x61] = "VSOCK_SET_RUNNING", + [0x72] = "VDPA_SET_STATUS", + [0x74] = "VDPA_SET_CONFIG", + [0x75] = "VDPA_SET_VRING_ENABLE", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", [0x12] = "GET_VRING_BASE", [0x26] = "GET_BACKEND_FEATURES", + [0x70] = "VDPA_GET_DEVICE_ID", + [0x71] = "VDPA_GET_STATUS", + [0x73] = "VDPA_GET_CONFIG", + [0x76] = "VDPA_GET_VRING_NUM", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools arch x86: Sync asm/cpufeatures.h with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick up the changes from: 077168e241ec ("x86/mce/amd: Add PPIN support for AMD MCE") 753039ef8b2f ("x86/cpu/amd: Call init_amd_zn() om Family 19h processors too") 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") These don't cause any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Wei Huang <wei.huang2@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers UAPI: Sync linux/mman.h with the kernelArnaldo Carvalho de Melo2020-04-142-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To get the changes in: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Add that to 'perf trace's mremap 'flags' decoder. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h' diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers UAPI: Sync sched.h with the kernelArnaldo Carvalho de Melo2020-04-142-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To get the changes in: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Add that to 'perf trace's clone 'flags' decoder. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/sched.h' differs from latest version at 'include/uapi/linux/sched.h' diff -u tools/include/uapi/linux/sched.h include/uapi/linux/sched.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools headers: Update linux/vdso.h and grab a copy of vdso/const.hArnaldo Carvalho de Melo2020-04-143-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To get in line with: 8165b57bca21 ("linux/const.h: Extract common header for vDSO") And silence this tools/perf/ build warning: Warning: Kernel ABI header at 'tools/include/linux/const.h' differs from latest version at 'include/linux/const.h' diff -u tools/include/linux/const.h include/linux/const.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf stat: Fix no metric header if --per-socket and --metric-only setJin Yao2020-04-141-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We received a report that was no metric header displayed if --per-socket and --metric-only were both set. It's hard for script to parse the perf-stat output. This patch fixes this issue. Before: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': S0 8 2.6 2.215270071 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus 1.000411692 S0 8 2.2 2.001547952 S0 8 3.4 3.002446511 S0 8 3.4 4.003346157 S0 8 4.0 5.004245736 S0 8 0.3 After: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': CPI S0 8 2.1 1.813579830 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus CPI 1.000415122 S0 8 3.2 2.001630051 S0 8 2.9 3.002612278 S0 8 4.3 4.003523594 S0 8 3.0 5.004504256 S0 8 3.7 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf python: Check if clang supports -fno-semantic-interpositionArnaldo Carvalho de Melo2020-04-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The set of C compiler options used by distros to build python bindings may include options that are unknown to clang, we check for a variety of such options, add -fno-semantic-interposition to that mix: This fixes the build on, among others, Manjaro Linux: GEN /tmp/build/perf/python/perf.so clang-9: error: unknown argument: '-fno-semantic-interposition' error: command 'clang' failed with exit status 1 make: Leaving directory '/git/perf/tools/perf' [perfbuilder@602aed1c266d ~]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc Thread model: posix gcc version 9.3.0 (Arch Linux 9.3.0-1) [perfbuilder@602aed1c266d ~]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo2020-04-141-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick up the changes in: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Which causes these changes in tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-04-01 12:11:14.789344795 -0300 +++ after 2020-04-01 12:11:56.907798879 -0300 @@ -10,6 +10,7 @@ [0x00000029] = "KNC_EVNTSEL1", [0x0000002a] = "IA32_EBL_CR_POWERON", [0x0000002c] = "EBC_FREQUENCY_ID", + [0x00000033] = "TEST_CTRL", [0x00000034] = "SMI_COUNT", [0x0000003a] = "IA32_FEAT_CTL", [0x0000003b] = "IA32_TSC_ADJUST", @@ -27,6 +28,7 @@ [0x000000c2] = "IA32_PERFCTR1", [0x000000cd] = "FSB_FREQ", [0x000000ce] = "PLATFORM_INFO", + [0x000000cf] = "IA32_CORE_CAPS", [0x000000e2] = "PKG_CST_CONFIG_CONTROL", [0x000000e7] = "IA32_MPERF", [0x000000e8] = "IA32_APERF", $ $ make -C tools/perf O=/tmp/build/perf install-bin <SNIP> CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf <SNIP> Now one can do: perf trace -e msr:* --filter=msr==IA32_CORE_CAPS or: perf trace -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL' And see only those MSRs being accessed via: # perf trace -v -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL' New filter for msr:read_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) New filter for msr:write_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) New filter for msr:rdpmc: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20200401153325.GC12534@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | Merge tag 'efi-urgent-2020-04-15' of ↵Linus Torvalds2020-04-158-30/+61
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Misc EFI fixes, including the boot failure regression caused by the BSS section not being cleared by the loaders" * tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Revert struct layout change to fix kexec boot regression efi/x86: Don't remap text<->rodata gap read-only for mixed mode efi/x86: Fix the deletion of variables in mixed mode efi/libstub/file: Merge file name buffers to reduce stack usage Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements efi/arm: Deal with ADR going out of range in efi_enter_kernel() efi/x86: Always relocate the kernel for EFI handover entry efi/x86: Move efi stub globals from .bss to .data efi/libstub/x86: Remove redundant assignment to pointer hdr efi/cper: Use scnprintf() for avoiding potential buffer overflow
| * | efi/x86: Revert struct layout change to fix kexec boot regressionArd Biesheuvel2020-04-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0a67361dcdaa29 ("efi/x86: Remove runtime table address from kexec EFI setup data") removed the code that retrieves the non-remapped UEFI runtime services pointer from the data structure provided by kexec, as it was never really needed on the kexec boot path: mapping the runtime services table at its non-remapped address is only needed when calling SetVirtualAddressMap(), which never happens during a kexec boot in the first place. However, dropping the 'runtime' member from struct efi_setup_data was a mistake. That struct is shared ABI between the kernel and the kexec tooling for x86, and so we cannot simply change its layout. So let's put back the removed field, but call it 'unused' to reflect the fact that we never look at its contents. While at it, add a comment to remind our future selves that the layout is external ABI. Fixes: 0a67361dcdaa29 ("efi/x86: Remove runtime table address from kexec EFI setup data") Reported-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | efi/x86: Don't remap text<->rodata gap read-only for mixed modeArd Biesheuvel2020-04-141-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d9e3d2c4f10320 ("efi/x86: Don't map the entire kernel text RW for mixed mode") updated the code that creates the 1:1 memory mapping to use read-only attributes for the 1:1 alias of the kernel's text and rodata sections, to protect it from inadvertent modification. However, it failed to take into account that the unused gap between text and rodata is given to the page allocator for general use. If the vmap'ed stack happens to be allocated from this region, any by-ref output arguments passed to EFI runtime services that are allocated on the stack (such as the 'datasize' argument taken by GetVariable() when invoked from efivar_entry_size()) will be referenced via a read-only mapping, resulting in a page fault if the EFI code tries to write to it: BUG: unable to handle page fault for address: 00000000386aae88 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD fd61063 P4D fd61063 PUD fd62063 PMD 386000e1 Oops: 0003 [#1] SMP PTI CPU: 2 PID: 255 Comm: systemd-sysv-ge Not tainted 5.6.0-rc4-default+ #22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0008:0x3eaeed95 Code: ... <89> 03 be 05 00 00 80 a1 74 63 b1 3e 83 c0 48 e8 44 d2 ff ff eb 05 RSP: 0018:000000000fd73fa0 EFLAGS: 00010002 RAX: 0000000000000001 RBX: 00000000386aae88 RCX: 000000003e9f1120 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001 RBP: 000000000fd73fd8 R08: 00000000386aae88 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc0f040220000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f21160ac940(0000) GS:ffff9cf23d500000(0000) knlGS:0000000000000000 CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033 CR2: 00000000386aae88 CR3: 000000000fd6c004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: Modules linked in: CR2: 00000000386aae88 ---[ end trace a8bfbd202e712834 ]--- Let's fix this by remapping text and rodata individually, and leave the gaps mapped read-write. Fixes: d9e3d2c4f10320 ("efi/x86: Don't map the entire kernel text RW for mixed mode") Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-10-ardb@kernel.org
| * | efi/x86: Fix the deletion of variables in mixed modeGary Lin2020-04-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | efi_thunk_set_variable() treated the NULL "data" pointer as an invalid parameter, and this broke the deletion of variables in mixed mode. This commit fixes the check of data so that the userspace program can delete a variable in mixed mode. Fixes: 8319e9d5ad98ffcc ("efi/x86: Handle by-ref arguments covering multiple pages in mixed mode") Signed-off-by: Gary Lin <glin@suse.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200408081606.1504-1-glin@suse.com Link: https://lore.kernel.org/r/20200409130434.6736-9-ardb@kernel.org
| * | efi/libstub/file: Merge file name buffers to reduce stack usageArd Biesheuvel2020-04-141-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnd reports that commit 9302c1bb8e47 ("efi/libstub: Rewrite file I/O routine") reworks the file I/O routines in a way that triggers the following warning: drivers/firmware/efi/libstub/file.c:240:1: warning: the frame size of 1200 bytes is larger than 1024 bytes [-Wframe-larger-than=] We can work around this issue dropping an instance of efi_char16_t[256] from the stack frame, and reusing the 'filename' field of the file info struct that we use to obtain file information from EFI (which contains the file name even though we already know it since we used it to open the file in the first place) Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-8-ardb@kernel.org
| * | Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirementsArd Biesheuvel2020-04-141-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EFI handover protocol was introduced on x86 to permit the boot loader to pass a populated boot_params structure as an additional function argument to the entry point. This allows the bootloader to pass the base and size of a initrd image, which is more flexible than relying on the EFI stub's file I/O routines, which can only access the file system from which the kernel image itself was loaded from firmware. This approach requires a fair amount of internal knowledge regarding the layout of the boot_params structure on the part of the boot loader, as well as knowledge regarding the allowed placement of the initrd in memory, and so it has been deprecated in favour of a new initrd loading method that is based on existing UEFI protocols and best practices. So update the x86 boot protocol documentation to clarify that the EFI handover protocol has been deprecated, and while at it, add a note that invoking the EFI handover protocol still requires the PE/COFF image to be loaded properly (as opposed to simply being copied into memory). Also, drop the code32_start header field from the list of values that need to be provided, as this is no longer required. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-7-ardb@kernel.org
| * | efi/arm: Deal with ADR going out of range in efi_enter_kernel()Ard Biesheuvel2020-04-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0698fac4ac2a ("efi/arm: Clean EFI stub exit code from cache instead of avoiding it") introduced a PC-relative reference to 'call_cache_fn' into efi_enter_kernel(), which lives way at the end of head.S. In some cases, the ARM version of the ADR instruction does not have sufficient range, resulting in a build error: arch/arm/boot/compressed/head.S:1453: Error: invalid constant (fffffffffffffbe4) after fixup ARM defines an alternative with a wider range, called ADRL, but this does not exist for Thumb-2. At the same time, the ADR instruction in Thumb-2 has a wider range, and so it does not suffer from the same issue. So let's switch to ADRL for ARM builds, and keep the ADR for Thumb-2 builds. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200409130434.6736-6-ardb@kernel.org
| * | efi/x86: Always relocate the kernel for EFI handover entryArvind Sankar2020-04-141-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary") tries to avoid relocating the kernel in the EFI stub as far as possible. However, when systemd-boot is used to boot a unified kernel image [1], the image is constructed by embedding the bzImage as a .linux section in a PE executable that contains a small stub loader from systemd that will call the EFI stub handover entry, together with additional sections and potentially an initrd. When this image is constructed, by for example dracut, the initrd is placed after the bzImage without ensuring that at least init_size bytes are available for the bzImage. If the kernel is not relocated by the EFI stub, this could result in the compressed kernel's startup code in head_{32,64}.S overwriting the initrd. To prevent this, unconditionally relocate the kernel if the EFI stub was entered via the handover entry point. [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images Fixes: d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary") Reported-by: Sergey Shatunov <me@prok.pw> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200406180614.429454-2-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200409130434.6736-5-ardb@kernel.org
| * | efi/x86: Move efi stub globals from .bss to .dataArvind Sankar2020-04-142-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage") removed the .bss section from the bzImage. However, while a PE loader is required to zero-initialize the .bss section before calling the PE entry point, the EFI handover protocol does not currently document any requirement that .bss be initialized by the bootloader prior to calling the handover entry. When systemd-boot is used to boot a unified kernel image [1], the image is constructed by embedding the bzImage as a .linux section in a PE executable that contains a small stub loader from systemd together with additional sections and potentially an initrd. As the .bss section within the bzImage is no longer explicitly present as part of the file, it is not initialized before calling the EFI handover entry. Furthermore, as the size of the embedded .linux section is only the size of the bzImage file itself, the .bss section's memory may not even have been allocated. In particular, this can result in efi_disable_pci_dma being true even when it was not specified via the command line or configuration option, which in turn causes crashes while booting on some systems. To avoid issues, place all EFI stub global variables into the .data section instead of .bss. As of this writing, only boolean flags for a few command line arguments and the sys_table pointer were in .bss and will now move into the .data section. [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage") Reported-by: Sergey Shatunov <me@prok.pw> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200406180614.429454-1-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200409130434.6736-4-ardb@kernel.org
| * | efi/libstub/x86: Remove redundant assignment to pointer hdrColin Ian King2020-04-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pointer hdr is being assigned a value that is never read and it is being updated later with a new value. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200402102537.503103-1-colin.king@canonical.com Link: https://lore.kernel.org/r/20200409130434.6736-3-ardb@kernel.org
| * | efi/cper: Use scnprintf() for avoiding potential buffer overflowTakashi Iwai2020-04-141-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200311072145.5001-1-tiwai@suse.de Link: https://lore.kernel.org/r/20200409130434.6736-2-ardb@kernel.org
* | Merge tag 'hyperv-fixes-signed' of ↵Linus Torvalds2020-04-147-24/+67
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - a series from Tianyu Lan to fix crash reporting on Hyper-V - three miscellaneous cleanup patches * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/Hyper-V: Report crash data in die() when panic_on_oops is set x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set x86/Hyper-V: Report crash register data or kmsg before running crash kernel x86/Hyper-V: Trigger crash enlightenment only once during system crash. x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump x86/Hyper-V: Unload vmbus channel in hv panic callback x86: hyperv: report value of misc_features hv_debugfs: Make hv_debug_root static hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
| * | x86/Hyper-V: Report crash data in die() when panic_on_oops is setTianyu Lan2020-04-113-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When oops happens with panic_on_oops unset, the oops thread is killed by die() and system continues to run. In such case, guest should not report crash register data to host since system still runs. Check panic_on_oops and return directly in hyperv_report_panic() when the function is called in the die() and panic_on_oops is unset. Fix it. Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
| * | x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not setTianyu Lan2020-04-111-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sysctl_record_panic_msg is not set, the panic will not be reported to Hyper-V via hyperv_report_panic_msg(). So the crash should be reported via hyperv_report_panic(). Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-6-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
| * | x86/Hyper-V: Report crash register data or kmsg before running crash kernelTianyu Lan2020-04-111-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to notify Hyper-V when a Linux guest VM crash occurs, so there is a record of the crash even when kdump is enabled. But crash_kexec_post_notifiers defaults to "false", so the kdump kernel runs before the notifiers and Hyper-V never gets notified. Fix this by always setting crash_kexec_post_notifiers to be true for Hyper-V VMs. Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-5-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
| * | x86/Hyper-V: Trigger crash enlightenment only once during system crash.Tianyu Lan2020-04-111-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a guest VM panics, Hyper-V should be notified only once via the crash synthetic MSRs. Current Linux code might write these crash MSRs twice during a system panic: 1) hyperv_panic/die_event() calling hyperv_report_panic() 2) hv_kmsg_dump() calling hyperv_report_panic_msg() Fix this by not calling hyperv_report_panic() if a kmsg dump has been successfully registered. The notification will happen later via hyperv_report_panic_msg(). Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-4-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>