summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tools headers: Synchronize perf_event.h headerArnaldo Carvalho de Melo2017-11-281-0/+1
| | | | | | | | | | | | | | | To get the changes in the 085b30625e39 ("perf/core: Add PERF_AUX_FLAG_COLLISION to report colliding samples") commit, that will be eventually used by perf to handle the ARM SPE architecture. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/n/tip-178ohv0oy0csq3kzfdk8ky4n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools headers: Synchronize kernel ABI headers wrt SPDX tagsArnaldo Carvalho de Melo2017-11-282-0/+2
| | | | | | | | | | | | | | | | Two more, that were just in perf/core and thus weren't covered by Ingo's latest headers synch, kcmp.h and prctl.h, silencing this: Warning: Kernel ABI header at 'tools/include/uapi/linux/kcmp.h' differs from latest version at 'include/uapi/linux/kcmp.h' Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2a0r7iybyqpkftllyy5t9hfk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools/headers: Synchronize kernel x86 UAPI headersIngo Molnar2017-11-282-264/+281
| | | | | | | | | | | | | | | | | | | | | | Two x86 headers got modified in this merge window: arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/disabled-features.h To support x86 UMIP feature, to add new AVX instructions, plus cleanups. None of those changes have an effect on tooling, so do a plain copy. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* perf intel-pt: Bring instruction decoder files into line with the kernelAdrian Hunter2017-11-281-0/+10
| | | | | | | | There are just a few new defines which do not affect perf tools. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1511253326-22308-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf test: Fix test 21 for s390xThomas Richter2017-11-281-0/+4
| | | | | | | | | | | | | | | | | | Test case 21 (Number of exit events of a simple workload) fails on s390x. The reason is the invalid sample frequency supplied for this test. On s390x the minimum sample frequency is much higher (see output of /proc/service_levels). Supply a save sample frequency value for s390x to fix this. The value will be adjusted by the s390x CPUMF frequency convertion function to a value well below the sysctl kernel.perf_event_max_sample_rate value. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> LPU-Reference: 20171123114611.93397-1-tmricht@linux.vnet.ibm.com Link: https://lkml.kernel.org/n/tip-1ynblyhi1n81idpido59nt1y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fixup discontiguous/sparse numa nodesSatheesh Rajendran2017-11-281-5/+51
| | | | | | | | | | | | | | Certain systems are designed to have sparse/discontiguous nodes. On such systems, 'perf bench numa' hangs, shows wrong number of nodes and shows values for non-existent nodes. Handle this by only taking nodes that are exposed by kernel to userspace. Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathnaga@linux.vnet.ibm.com Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf top: Use signal interface for SIGWINCH handlerJiri Olsa2017-11-281-12/+3
| | | | | | | | | | | | | | | | | There's no need for SA_SIGINFO data in SIGWINCH handler, switching it to register the handler via signal interface as we do for the rest of the signals in perf top. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-elxp1vdnaog1scaj13cx7cu0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf top: Fix window dimensions change handlingJiri Olsa2017-11-281-3/+12
| | | | | | | | | | | | | | | | | | | | | | The stdio perf top crashes when we change the terminal window size. The reason is that we assumed we get the perf_top pointer as a signal handler argument which is not the case. Changing the SIGWINCH handler logic to change global resize variable, which is checked in the main thread loop. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ysuzwz77oev1ftgvdscn9bpu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: Fix header.size for namespace eventsJiri Olsa2017-11-281-1/+4
| | | | | | | | | | | Reset header size for namespace events, otherwise it only gets bigger in ctx iterations. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Fixes: e422267322cd ("perf: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/n/tip-nlo4gonz9d4guyb8153ukzt0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf top: Ignore kptr_restrict when not sampling the kernelArnaldo Carvalho de Melo2017-11-281-3/+5
| | | | | | | | | | | | | | If all events have attr.exclude_kernel set, no need to look at kptr_restrict. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-yegpzg5bf2im69g0tfizqaqz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Ignore kptr_restrict when not sampling the kernelArnaldo Carvalho de Melo2017-11-281-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we're not sampling the kernel, we shouldn't care about kptr_restrict neither synthesize anything for assisting in resolving kernel samples, like the reference relocation symbol or kernel modules information. Before: $ cat /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid 2 2 $ perf record sleep 1 WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ] $ perf evlist -v cycles:uppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 $ After: $ perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (10 samples) ] $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-t025e9zftbx2b8cq2w01g5e5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Ignore kptr_restrict when not sampling the kernelArnaldo Carvalho de Melo2017-11-281-0/+3
| | | | | | | | | | | | | | | If none of the evsels has attr.exclude_kernel set to zero, no kernel samples, so no point in warning the user about problems in processing kernel samples, as there will be none. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7dn926v3at8txxkky92aesz2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Add helper to check if attr.exclude_kernel is set in all evselsArnaldo Carvalho de Melo2017-11-282-0/+14
| | | | | | | | | | | | | | The warning about kptr_restrict needs to be emitted only when it is set and we ask for kernel space samples, so add a helper to help with that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-fh7drty6yljei9gxxzer6eup@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf test shell: Fix test case probe libc's inet_pton on s390xThomas Richter2017-11-281-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'perf test' case "probe libc's inet_pton & backtrace it with ping" fails on s390x. The reason is the 'realpath /lib64/ld*.so.* | uniq' line which returns 2 libraries: root@s35lp76 shell]# realpath /lib64/ld*.so.* | uniq /usr/lib64/ld-2.26.so /usr/lib64/ld_pre_smc.so.1.0.1 [root@s35lp76 shell] This output makes the "perf probe" command lines invalid. Use ldd tool to find out the libraries required by "bash" and check if symbol "inet_pton" is part of the "libc" library. Some distros do not have a /lib64 directory. I have also added a check for the existence of an IPv6 network interface before it is being used. Committer changes: We can't really use ldd for libc, as in some systems, such as x86_64, it has hardlinks and then ldd sees one and the kernel the other, so grep for libc in /proc/self/maps to get the one we'll receive from PERF_RECORD_MMAP. Thomas checked this change and acked it. Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Hendrik Brückner <brueckner@linux.vnet.ibm.com> Reviewed-by: Hendrik Brückner <brueckner@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20171114133409.GN8836@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf test shell: Fix check open filename arg using 'perf trace' on s390xThomas Richter2017-11-281-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This 'perf test' case fails on s390x. The 'touch' command on s390x uses the 'openat' system call to open the file named on the command line: [root@s35lp76 perf]# perf probe -l probe:vfs_getname (on getname_flags:72@fs/namei.c with pathname) [root@s35lp76 perf]# perf trace -e open touch /tmp/abc 0.400 ( 0.015 ms): touch/27542 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 [root@s35lp76 perf]# There is no 'open' system call for file '/tmp/abc'. Instead the 'openat' system call is used: [root@s35lp76 perf]# strace touch /tmp/abc execve("/usr/bin/touch", ["touch", "/tmp/abc"], 0x3ffd547ec98 /* 30 vars */) = 0 [...] openat(AT_FDCWD, "/tmp/abc", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 [...] On s390x the 'egrep' command does not find a matching pattern and returns an error. Fix this for s390x create a platform dependent command line to enable the 'perf probe' call to listen to the 'openat' system call and get the expected output. Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> LPU-Reference: 20171114071847.2381-1-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-3qf38jk0prz54rhmhyu871my@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Do not truncate instruction names at 6 charsRavi Bangoria2017-11-281-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are many instructions, esp on PowerPC, whose mnemonics are longer than 6 characters. Using precision limit causes truncation of such mnemonics. Fix this by removing precision limit. Note that, 'width' is still 6, so alignment won't get affected for length <= 6. Before: li r11,-1 xscvdp vs1,vs1 add. r10,r10,r11 After: li r11,-1 xscvdpsxds vs1,vs1 add. r10,r10,r11 Reported-by: Donald Stence <dstence@us.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/20171114032540.4564-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf help: Fix a bug during strstart() conversionNamhyung Kim2017-11-281-2/+2
| | | | | | | | | | | | | | | | | | | | | The commit 8e99b6d4533c changed prefixcmp() to strstart() but missed to change the return value in some place. It makes perf help print annoying output even for sane config items like below: $ perf help '.root': unsupported man viewer sub key. ... Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Sihyeon Jang <uneedsihyeon@gmail.com> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20171114001542.GA16464@sejong Fixes: 8e99b6d4533c ("tools include: Adopt strstarts() from the kernel") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf machine: Guard against NULL in machine__exit()Arnaldo Carvalho de Melo2017-11-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent fix for 'perf trace' introduced a bug where machine__exit(trace->host) could be called while trace->host was still NULL, so make this more robust by guarding against NULL, just like free() does. The problem happens, for instance, when !root users try to run 'perf trace': [acme@jouet linux]$ trace Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' perf: Segmentation fault Obtained 7 stack frames. [0x4f1b2e] /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f] [0x4f3fec] [0x47468b] [0x42a2db] /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509] [0x42a6c9] Segmentation fault (core dumped) [acme@jouet linux]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 33974a414ce2 ("perf trace: Call machine__exit() at exit") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf script: Fix --per-event-dump for auxtrace synth evselsArnaldo Carvalho de Melo2017-11-281-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | When processing PERF_RECORD_AUXTRACE_INFO several perf_evsel entries will be synthesized and inserted into session->evlist, eventually ending in perf_script.tool.sample(), which ends up calling builtin-script.c's process_event(), that expects evsel->priv to be a perf_evsel_script object with a valid FILE pointer in fp. So we need to intercept the processing of PERF_RECORD_AUXTRACE_INFO and then setup evsel->priv for these newly created perf_evsel instances, do it to fix the segfault in process_event() trying to use a NULL for that FILE pointer. Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Fixes: a14390fde64e ("perf script: Allow creating per-event dump files") Link: http://lkml.kernel.org/n/tip-bthnur8r8de01gxvn2qayx6e@git.kernel.org [ Merge fix by Ravi Bangoria before pushing upstream to preserv bisectability ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Fix up leftover perf_evsel_stat usage via evsel->privArnaldo Carvalho de Melo2017-11-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | I forgot one conversion, which got noticed by Thomas when running: $ perf stat -e '{cpu-clock,instructions}' kill kill: not enough arguments Segmentation fault (core dumped) $ Fix it, those stats are in evsel->stats, not anymore in evsel->priv. Reported-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: e669e833da8d ("perf evsel: Restore evsel->priv as a tool private area") Link: http://lkml.kernel.org/r/20171109150046.GN4333@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf trace: Fix an exit code of trace__symbols_initAndrei Vagin2017-11-281-2/+4
| | | | | | | | | | | | | | Currently if trace_event__register_resolver() fails, we return -errno, but we can't be sure that errno isn't zero in this case. Signed-off-by: Andrei Vagin <avagin@openvz.org> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vasily Averin <vvs@virtuozzo.com> Link: http://lkml.kernel.org/r/20171108002246.8924-2-avagin@openvz.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Fix -c/-F options for cpu event aliasesAndi Kleen2017-11-285-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Intel PMU event aliases have a implicit period= specifier to set the default period. Unfortunately this breaks overriding these periods with -c or -F, because the alias terms look like they are user specified to the internal parser, and user specified event qualifiers override the command line options. Track that they are coming from aliases by adding a "weak" state to the term. Any weak terms don't override command line options. I only did it for -c/-F for now, I think that's the only case that's broken currently. Before: $ perf record -c 1000 -vv -e uops_issued.any ... { sample_period, sample_freq } 2000003 After: $ perf record -c 1000 -vv -e uops_issued.any ... { sample_period, sample_freq } 1000 Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Generate PERF_RECORD_{MMAP,COMM,EXEC} with --delayArnaldo Carvalho de Melo2017-11-281-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we use an initial delay, e.g.: 'perf record --delay 1000', we do not enable the events until that delay has passed after we started the workload, including the tracking event, i.e. the one for which we have attr.mmap, etc, enabled to ask the kernel to generate the PERF_RECORD_{MMAP,COMM,EXEC} metadata events that will then allow us to resolve addresses in samples to the map, dso and symbol. There will be a shadow that even synthesizing samples won't cover, i.e. the workload that we start and other processes forking while we wait for the initial delay to expire. So use a dummy event to be the tracking one and make it be enabled on exec. Before: # perf record --delay 1000 stress --cpu 1 --timeout 5 stress: info: [9029] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd stress: info: [9029] successful run completed in 5s [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.624 MB perf.data (15908 samples) ] # perf script | head :9031 9031 32001.826888: 1 cycles:ppp: ffffffff831aa30d event_function (/lib/modules/4.14.0-rc6+/build/vmlinux) :9031 9031 32001.826893: 1 cycles:ppp: ffffffff8300d1a0 intel_bts_enable_local (/lib/modules/4.14.0-rc6+/build/vmlinux) :9031 9031 32001.826895: 7 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux) :9031 9031 32001.826897: 103 cycles:ppp: ffffffff8300c331 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux) :9031 9031 32001.826899: 1615 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux) :9031 9031 32001.826902: 26724 cycles:ppp: ffffffff8384c6a7 native_irq_return_iret (/lib/modules/4.14.0-rc6+/build/vmlinux) :9031 9031 32001.826913: 329739 cycles:ppp: 7fb2a5410932 [unknown] ([unknown]) :9031 9031 32001.827033: 1225451 cycles:ppp: 7fb2a5410930 [unknown] ([unknown]) :9031 9031 32001.827474: 1391725 cycles:ppp: 7fb2a5410930 [unknown] ([unknown]) :9031 9031 32001.827978: 1233697 cycles:ppp: 7fb2a5410928 [unknown] ([unknown]) # After: # perf record --delay 1000 stress --cpu 1 --timeout 5 stress: info: [9741] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd stress: info: [9741] successful run completed in 5s [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.751 MB perf.data (15976 samples) ] # perf script | head stress 9742 32110.959106: 1 cycles:ppp: ffffffff831b26f6 __perf_event_task_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux) stress 9742 32110.959110: 1 cycles:ppp: ffffffff8300c2e9 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux) stress 9742 32110.959112: 7 cycles:ppp: ffffffff830231e0 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux) stress 9742 32110.959115: 101 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux) stress 9742 32110.959117: 1533 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux) stress 9742 32110.959119: 23992 cycles:ppp: ffffffff831b0900 ctx_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux) stress 9742 32110.959129: 329406 cycles:ppp: 7f4b1b661930 __random_r (/usr/lib64/libc-2.25.so) stress 9742 32110.959249: 1288322 cycles:ppp: 5566e1e7cbc9 hogcpu (/usr/bin/stress) stress 9742 32110.959712: 1464046 cycles:ppp: 7f4b1b66179e __random (/usr/lib64/libc-2.25.so) stress 9742 32110.960241: 1266918 cycles:ppp: 7f4b1b66195b __random_r (/usr/lib64/libc-2.25.so) # Reported-by: Bram Stolk <b.stolk@gmail.com> Tested-by: Bram Stolk <b.stolk@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 6619a53ef757 ("perf record: Add --initial-delay option") Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Set the correct idx when adding dummy eventsArnaldo Carvalho de Melo2017-11-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The evsel->idx field is used mainly to access the right bucket in per-event arrays such as the annotation ones, but also to set evsel->tracking, that in turn will decide what of the events will ask for PERF_RECORD_{MMAP,COMM,EXEC} to be generated, i.e. which perf_event_attr will have its mmap, etc fields set. When we were adding the "dummy" event using perf_evlist__add_dummy() we were not setting it correctly, which could result in multiple tracking events. Now that I'll try using a dummy event to be the tracking one when using 'perf record --delay', i.e. when we process the --delay setting we may already have the evlist set up, like with: perf record -e cycles,instructions --delay 1000 ./workload We will need to add a "dummy" event, then reset evsel->tracking for the first event, "cycles", and set it instead to the dummy one, and also setting its attr.enable_on_exec, so that we get the PERF_RECORD_MMAP, etc metadata events while waiting to enable the explicitely requested events, so lets get this straight and set the right evsel->idx. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bram Stolk <b.stolk@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()Rafael J. Wysocki2017-11-131-4/+7
| | | | | | | | | | | | | | | | | | | | Even though aperfmperf_snapshot_khz() caches the samples.khz value to return if called again in a sufficiently short time, its caller, arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it which may allow user space to trigger an IPI storm by reading from the scaling_cur_freq cpufreq sysfs file in a tight loop. To avoid that, move the decision on whether or not to return the cached samples.khz value to arch_freq_get_on_cpu(). This change was part of commit 941f5f0f6ef5 ("x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo"), but it was not the reason for the revert and it remains applicable. Fixes: 4815d3c56d1e (cpufreq: x86: Make scaling_cur_freq behave more as expected) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: WANG Chao <chao.wang@ucloud.cn> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds2017-11-137-51/+148
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Thomas Gleixner: "These updates are related to TSC handling: - Support platforms which have synchronized TSCs but the boot CPU has a non zero TSC_ADJUST value, which is considered a firmware bug on normal systems. This applies to HPE/SGI UV platforms where the platform firmware uses TSC_ADJUST to ensure TSC synchronization across a huge number of sockets, but due to power on timings the boot CPU cannot be guaranteed to have a zero TSC_ADJUST register value. - Fix the ordering of udelay calibration and kvmclock_init() - Cleanup the udelay and calibration code" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Mark cyc2ns_init() and detect_art() __init x86/platform/UV: Mark tsc_check_sync as an init function x86/tsc: Make CONFIG_X86_TSC=n build work again x86/platform/UV: Add check of TSC state set by UV BIOS x86/tsc: Provide a means to disable TSC ART x86/tsc: Drastically reduce the number of firmware bug warnings x86/tsc: Skip TSC test and error messages if already unstable x86/tsc: Add option that TSC on Socket 0 being non-zero is valid x86/timers: Move simple_udelay_calibration() past kvmclock_init() x86/timers: Make recalibrate_cpu_khz() void x86/timers: Move the simple udelay calibration to tsc.h
| * x86/tsc: Mark cyc2ns_init() and detect_art() __initDou Liyang2017-11-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | These two functions are only called by tsc_init(), which is an __init function during boot time, so mark them __init as well. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1510135792-17429-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform/UV: Mark tsc_check_sync as an init functionmike.travis@hpe.com2017-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix build problem: >> WARNING: vmlinux.o(.text+0x4223a): Section mismatch in reference from the function uv_tsc_check_sync() to the function .init.text:uv_early_read_mmr() The function uv_tsc_check_sync() references the function __init uv_early_read_mmr(). This is often because uv_tsc_check_sync lacks a __init Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171023191841.985614692@stormcage.americas.sgi.com
| * x86/tsc: Make CONFIG_X86_TSC=n build work againThomas Gleixner2017-10-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tsc_async_resets is only available when CONFIG_X86_TSC=y. So a build with CONFIG_X86_TSC=n breaks: arch/x86/kernel/tsc.o: In function `tsc_init': (.init.text+0x87b): undefined reference to `tsc_async_resets' Add a stub define for the TSC=n case. Side note: This config switch should simply be removed. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 341102c3ef29 ("x86/tsc: Add option that TSC on Socket 0 being non-zero is valid") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Travis <mike.travis@hpe.com>
| * x86/platform/UV: Add check of TSC state set by UV BIOSmike.travis@hpe.com2017-10-162-5/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Insert a check early in UV system startup that checks whether BIOS was able to obtain satisfactory TSC Sync stability. If not, it usually is caused by an error in the external TSC clock generation source. In this case the best fallback is to use the builtin hardware RTC as the kernel will not be able to set an accurate TSC sync either. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Andrew Banman <andrew.abanman@hpe.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.406294490@stormcage.americas.sgi.com
| * x86/tsc: Provide a means to disable TSC ARTmike.travis@hpe.com2017-10-161-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On systems where multiple chassis are reset asynchronously, and thus the TSC counters are started asynchronously, the offset needed to convert to TSC to ART would be different. Disable ART in that case and rely on the TSC counters to supply the accurate time. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.289397994@stormcage.americas.sgi.com
| * x86/tsc: Drastically reduce the number of firmware bug warningsmike.travis@hpe.com2017-10-161-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to the TSC ADJUST MSR being available, the method to set TSC's in sync with each other naturally caused a small skew between cpu threads. This was NOT a firmware bug at the time so introducing a whole avalanche of alarming warning messages might cause unnecessary concern and customer complaints. (Example: >3000 msgs in a 32 socket Skylake system.) Simply report the warning condition, if possible do the necessary fixes, and move on. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.175062400@stormcage.americas.sgi.com
| * x86/tsc: Skip TSC test and error messages if already unstablemike.travis@hpe.com2017-10-161-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the TSC has already been determined to be unstable, then checking TSC ADJUST values is a waste of time and generates unnecessary error messages. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.060777495@stormcage.americas.sgi.com
| * x86/tsc: Add option that TSC on Socket 0 being non-zero is validmike.travis@hpe.com2017-10-162-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a flag to indicate and process that TSC counters are on chassis that reset at different times during system startup. Therefore which TSC ADJUST values should be zero is not predictable. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Andrew Banman <andrew.abanman@hpe.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163201.944370012@stormcage.americas.sgi.com
| * x86/timers: Move simple_udelay_calibration() past kvmclock_init()Boris Ostrovsky2017-09-251-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | simple_udelay_calibration() relies on x86_platform's calibration ops. For KVM these ops are set late in setup_arch() and so simple_udelay_calibration() ends up using native version. Besides being possibly incorrect, this significantly increases kernel boot time. For example, on my laptop executing start_kernel() by a guest takes ~10 times more than when KVM's ops are used. Since early_xdbc_setup_hardware() relies on calibration having been performed move it too. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: baolu.lu@linux.intel.com Link: https://lkml.kernel.org/r/20170911185111.20636-1-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86/timers: Make recalibrate_cpu_khz() voidDou Liyang2017-09-252-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | recalibrate_cpu_khz() is called from powernow K7 and Pentium 4/Xeon CPU freq driver. It recalibrates cpu frequency in case of SMP = n and doesn't need to return anything. Mark it void, also remove the #else branch. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1500003247-17368-2-git-send-email-douly.fnst@cn.fujitsu.com
| * x86/timers: Move the simple udelay calibration to tsc.hDou Liyang2017-09-253-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit dd759d93f4dd ("x86/timers: Add simple udelay calibration") adds an static function in x86 boot-time initializations. But, this function is actually related to TSC, so it should be maintained in tsc.c, not in setup.c. Move simple_udelay_calibration() from setup.c to tsc.c and rename it to tsc_early_delay_calibrate for more readability. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1500003247-17368-1-git-send-email-douly.fnst@cn.fujitsu.com
* | Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds2017-11-136-32/+170
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource updates from Thomas Gleixner: "This update provides updates to RDT: - A diagnostic framework for the Resource Director Technology (RDT) user interface (sysfs). The failure modes of the user interface are hard to diagnose from the error codes. An extra last command status file provides now sensible textual information about the failure so its simpler to use. - A few minor cleanups and updates in the RDT code" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix a silent failure when writing zero value schemata x86/intel_rdt: Fix potential deadlock during resctrl mount x86/intel_rdt: Fix potential deadlock during resctrl unmount x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabled x86/intel_rdt: Remove redundant assignment x86/intel_rdt/cqm: Make integer rmid_limbo_count static x86/intel_rdt: Add documentation for "info/last_cmd_status" x86/intel_rdt: Add diagnostics when making directories x86/intel_rdt: Add diagnostics when writing the cpus file x86/intel_rdt: Add diagnostics when writing the tasks file x86/intel_rdt: Add diagnostics when writing the schemata file x86/intel_rdt: Add framework for better RDT UI diagnostics
| * | x86/intel_rdt: Fix a silent failure when writing zero value schemataXiaochen Shen2017-11-121-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writing an invalid schemata with no domain values (e.g., "(L3|MB):"), results in a silent failure, i.e. the last_cmd_status returns OK, Check for an empty value and set the result string with a proper error message and return -EINVAL. Before the fix: # mkdir /sys/fs/resctrl/p1 # echo "L3:" > /sys/fs/resctrl/p1/schemata (silent failure) # cat /sys/fs/resctrl/info/last_cmd_status ok # echo "MB:" > /sys/fs/resctrl/p1/schemata (silent failure) # cat /sys/fs/resctrl/info/last_cmd_status ok After the fix: # mkdir /sys/fs/resctrl/p1 # echo "L3:" > /sys/fs/resctrl/p1/schemata -bash: echo: write error: Invalid argument # cat /sys/fs/resctrl/info/last_cmd_status Missing 'L3' value # echo "MB:" > /sys/fs/resctrl/p1/schemata -bash: echo: write error: Invalid argument # cat /sys/fs/resctrl/info/last_cmd_status Missing 'MB' value [ Tony: This is an unintended side effect of the patch earlier to allow the user to just write the value they want to change. While allowing user to specify less than all of the values, it also allows an empty value. ] Fixes: c4026b7b95a4 ("x86/intel_rdt: Implement "update" mode when writing schemata file") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
| * | x86/intel_rdt: Fix potential deadlock during resctrl mountReinette Chatre2017-10-211-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sai reported a warning during some MBA tests: [ 236.755559] ====================================================== [ 236.762443] WARNING: possible circular locking dependency detected [ 236.769328] 4.14.0-rc4-yocto-standard #8 Not tainted [ 236.774857] ------------------------------------------------------ [ 236.781738] mount/10091 is trying to acquire lock: [ 236.787071] (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8117f892>] static_key_enable+0x12/0x30 [ 236.797058] but task is already holding lock: [ 236.803552] (&type->s_umount_key#37/1){+.+.}, at: [<ffffffff81208b2f>] sget_userns+0x32f/0x520 [ 236.813247] which lock already depends on the new lock. [ 236.822353] the existing dependency chain (in reverse order) is: [ 236.830686] -> #4 (&type->s_umount_key#37/1){+.+.}: [ 236.837756] __lock_acquire+0x1100/0x11a0 [ 236.842799] lock_acquire+0xdf/0x1d0 [ 236.847363] down_write_nested+0x46/0x80 [ 236.852310] sget_userns+0x32f/0x520 [ 236.856873] kernfs_mount_ns+0x7e/0x1f0 [ 236.861728] rdt_mount+0x30c/0x440 [ 236.866096] mount_fs+0x38/0x150 [ 236.870262] vfs_kern_mount+0x67/0x150 [ 236.875015] do_mount+0x1df/0xd50 [ 236.879286] SyS_mount+0x95/0xe0 [ 236.883464] entry_SYSCALL_64_fastpath+0x18/0xad [ 236.889183] -> #3 (rdtgroup_mutex){+.+.}: [ 236.895292] __lock_acquire+0x1100/0x11a0 [ 236.900337] lock_acquire+0xdf/0x1d0 [ 236.904899] __mutex_lock+0x80/0x8f0 [ 236.909459] mutex_lock_nested+0x1b/0x20 [ 236.914407] intel_rdt_online_cpu+0x3b/0x4a0 [ 236.919745] cpuhp_invoke_callback+0xce/0xb80 [ 236.925177] cpuhp_thread_fun+0x1c5/0x230 [ 236.930222] smpboot_thread_fn+0x11a/0x1e0 [ 236.935362] kthread+0x152/0x190 [ 236.939536] ret_from_fork+0x27/0x40 [ 236.944097] -> #2 (cpuhp_state-up){+.+.}: [ 236.950199] __lock_acquire+0x1100/0x11a0 [ 236.955241] lock_acquire+0xdf/0x1d0 [ 236.959800] cpuhp_issue_call+0x12e/0x1c0 [ 236.964845] __cpuhp_setup_state_cpuslocked+0x13b/0x2f0 [ 236.971242] __cpuhp_setup_state+0xa7/0x120 [ 236.976483] page_writeback_init+0x43/0x67 [ 236.981623] pagecache_init+0x38/0x3b [ 236.986281] start_kernel+0x3c6/0x41a [ 236.990931] x86_64_start_reservations+0x2a/0x2c [ 236.996650] x86_64_start_kernel+0x72/0x75 [ 237.001793] verify_cpu+0x0/0xfb [ 237.005966] -> #1 (cpuhp_state_mutex){+.+.}: [ 237.012364] __lock_acquire+0x1100/0x11a0 [ 237.017408] lock_acquire+0xdf/0x1d0 [ 237.021969] __mutex_lock+0x80/0x8f0 [ 237.026527] mutex_lock_nested+0x1b/0x20 [ 237.031475] __cpuhp_setup_state_cpuslocked+0x54/0x2f0 [ 237.037777] __cpuhp_setup_state+0xa7/0x120 [ 237.043013] page_alloc_init+0x28/0x30 [ 237.047769] start_kernel+0x148/0x41a [ 237.052425] x86_64_start_reservations+0x2a/0x2c [ 237.058145] x86_64_start_kernel+0x72/0x75 [ 237.063284] verify_cpu+0x0/0xfb [ 237.067456] -> #0 (cpu_hotplug_lock.rw_sem){++++}: [ 237.074436] check_prev_add+0x401/0x800 [ 237.079286] __lock_acquire+0x1100/0x11a0 [ 237.084330] lock_acquire+0xdf/0x1d0 [ 237.088890] cpus_read_lock+0x42/0x90 [ 237.093548] static_key_enable+0x12/0x30 [ 237.098496] rdt_mount+0x406/0x440 [ 237.102862] mount_fs+0x38/0x150 [ 237.107035] vfs_kern_mount+0x67/0x150 [ 237.111787] do_mount+0x1df/0xd50 [ 237.116058] SyS_mount+0x95/0xe0 [ 237.120233] entry_SYSCALL_64_fastpath+0x18/0xad [ 237.125952] other info that might help us debug this: [ 237.134867] Chain exists of: cpu_hotplug_lock.rw_sem --> rdtgroup_mutex --> &type->s_umount_key#37/1 [ 237.148425] Possible unsafe locking scenario: [ 237.155015] CPU0 CPU1 [ 237.160057] ---- ---- [ 237.165100] lock(&type->s_umount_key#37/1); [ 237.169952] lock(rdtgroup_mutex); [ 237.176641] lock(&type->s_umount_key#37/1); [ 237.184287] lock(cpu_hotplug_lock.rw_sem); [ 237.189041] *** DEADLOCK *** When the resctrl filesystem is mounted the locks must be acquired in the same order as was done when the cpus came online: cpu_hotplug_lock before rdtgroup_mutex. This also requires to switch the static_branch_enable() calls to the _cpulocked variant because now cpu hotplug lock is held already. [ tglx: Switched to cpus_read_[un]lock ] Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: https://lkml.kernel.org/r/9c41b91bc2f47d9e95b62b213ecdb45623c47a9f.1508490116.git.reinette.chatre@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/intel_rdt: Fix potential deadlock during resctrl unmountReinette Chatre2017-10-211-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep warns about a potential deadlock: [ 66.782842] ====================================================== [ 66.782888] WARNING: possible circular locking dependency detected [ 66.782937] 4.14.0-rc2-test-test+ #48 Not tainted [ 66.782983] ------------------------------------------------------ [ 66.783052] umount/336 is trying to acquire lock: [ 66.783117] (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff81032395>] rdt_kill_sb+0x215/0x390 [ 66.783193] but task is already holding lock: [ 66.783244] (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390 [ 66.783305] which lock already depends on the new lock. [ 66.783364] the existing dependency chain (in reverse order) is: [ 66.783419] -> #3 (rdtgroup_mutex){+.+.}: [ 66.783467] __lock_acquire+0x1293/0x13f0 [ 66.783509] lock_acquire+0xaf/0x220 [ 66.783543] __mutex_lock+0x71/0x9b0 [ 66.783575] mutex_lock_nested+0x1b/0x20 [ 66.783610] intel_rdt_online_cpu+0x3b/0x430 [ 66.783649] cpuhp_invoke_callback+0xab/0x8e0 [ 66.783687] cpuhp_thread_fun+0x7a/0x150 [ 66.783722] smpboot_thread_fn+0x1cc/0x270 [ 66.783764] kthread+0x16e/0x190 [ 66.783794] ret_from_fork+0x27/0x40 [ 66.783825] -> #2 (cpuhp_state){+.+.}: [ 66.783870] __lock_acquire+0x1293/0x13f0 [ 66.783906] lock_acquire+0xaf/0x220 [ 66.783938] cpuhp_issue_call+0x102/0x170 [ 66.783974] __cpuhp_setup_state_cpuslocked+0x154/0x2a0 [ 66.784023] __cpuhp_setup_state+0xc7/0x170 [ 66.784061] page_writeback_init+0x43/0x67 [ 66.784097] pagecache_init+0x43/0x4a [ 66.784131] start_kernel+0x3ad/0x3f7 [ 66.784165] x86_64_start_reservations+0x2a/0x2c [ 66.784204] x86_64_start_kernel+0x72/0x75 [ 66.784241] verify_cpu+0x0/0xfb [ 66.784270] -> #1 (cpuhp_state_mutex){+.+.}: [ 66.784319] __lock_acquire+0x1293/0x13f0 [ 66.784355] lock_acquire+0xaf/0x220 [ 66.784387] __mutex_lock+0x71/0x9b0 [ 66.784419] mutex_lock_nested+0x1b/0x20 [ 66.784454] __cpuhp_setup_state_cpuslocked+0x52/0x2a0 [ 66.784497] __cpuhp_setup_state+0xc7/0x170 [ 66.784535] page_alloc_init+0x28/0x30 [ 66.784569] start_kernel+0x148/0x3f7 [ 66.784602] x86_64_start_reservations+0x2a/0x2c [ 66.784642] x86_64_start_kernel+0x72/0x75 [ 66.784678] verify_cpu+0x0/0xfb [ 66.784707] -> #0 (cpu_hotplug_lock.rw_sem){++++}: [ 66.784759] check_prev_add+0x32f/0x6e0 [ 66.784794] __lock_acquire+0x1293/0x13f0 [ 66.784830] lock_acquire+0xaf/0x220 [ 66.784863] cpus_read_lock+0x3d/0xb0 [ 66.784896] rdt_kill_sb+0x215/0x390 [ 66.784930] deactivate_locked_super+0x3e/0x70 [ 66.784968] deactivate_super+0x40/0x60 [ 66.785003] cleanup_mnt+0x3f/0x80 [ 66.785034] __cleanup_mnt+0x12/0x20 [ 66.785070] task_work_run+0x8b/0xc0 [ 66.785103] exit_to_usermode_loop+0x94/0xa0 [ 66.786804] syscall_return_slowpath+0xe8/0x150 [ 66.788502] entry_SYSCALL_64_fastpath+0xab/0xad [ 66.790194] other info that might help us debug this: [ 66.795139] Chain exists of: cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex [ 66.800035] Possible unsafe locking scenario: [ 66.803267] CPU0 CPU1 [ 66.804867] ---- ---- [ 66.806443] lock(rdtgroup_mutex); [ 66.808002] lock(cpuhp_state); [ 66.809565] lock(rdtgroup_mutex); [ 66.811110] lock(cpu_hotplug_lock.rw_sem); [ 66.812608] *** DEADLOCK *** [ 66.816983] 2 locks held by umount/336: [ 66.818418] #0: (&type->s_umount_key#35){+.+.}, at: [<ffffffff81229738>] deactivate_super+0x38/0x60 [ 66.819922] #1: (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390 When the resctrl filesystem is unmounted the locks should be obtain in the locks in the same order as was done when the cpus came online: cpu_hotplug_lock before rdtgroup_mutex. This also requires to switch the static_branch_disable() calls to the _cpulocked variant because now cpu hotplug lock is held already. [ tglx: Switched to cpus_read_[un]lock ] Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.com
| * | x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabledReinette Chatre2017-10-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The platform informs via CPUID.(EAX=0x10, ECX=res#):EBX[31:0] (valid res# are only 1 for L3 and 2 for L2) which unit of the allocation may be used by other entities in the platform. This information is valid whether CDP (Code and Data Prioritization) is enabled or not. Ensure that the bitmask of shareable resource is initialized when CDP is enabled. Fixes: 0dd2d7494cd8 ("x86/intel_rdt: Show bitmask of shareable resource with other executing units" Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/815747bddc820ca221a8924edaf4d1a7324547e4.1508490116.git.reinette.chatre@intel.com
| * | x86/intel_rdt: Remove redundant assignmentJithu Joseph2017-10-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assignment to the 'files' variable is immediately overwritten in the following line. Remove the older assignment, which was meant specifially for creating control groups files. Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Reported-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: tony.luck@intel.com Cc: vikas.shivappa@intel.com Link: https://lkml.kernel.org/r/1507157337-18118-1-git-send-email-jithu.joseph@intel.com
| * | x86/intel_rdt/cqm: Make integer rmid_limbo_count staticColin Ian King2017-10-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rmid_limbo_count is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'rmid_limbo_count' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20171002145931.27479-1-colin.king@canonical.com
| * | x86/intel_rdt: Add documentation for "info/last_cmd_status"Tony Luck2017-09-271-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New file in the "info" directory helps diagnose what went wrong when using the /sys/fs/resctrl file system Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/387e78e444582403c2454479e576caf5721a363f.1506382469.git.tony.luck@intel.com
| * | x86/intel_rdt: Add diagnostics when making directoriesTony Luck2017-09-271-6/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mostly this is about running out of RMIDs or CLOSIDs. Other errors are various internal errors. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/027cf1ffb3a3695f2d54525813a1d644887353cf.1506382469.git.tony.luck@intel.com
| * | x86/intel_rdt: Add diagnostics when writing the cpus fileTony Luck2017-09-271-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Can't add a cpu to a monitor group unless it belongs to parent group. Can't delete cpus from the default group. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/757a869a25e9fc1b7a2e9bc43e1159455c1964a0.1506382469.git.tony.luck@intel.com
| * | x86/intel_rdt: Add diagnostics when writing the tasks fileTony Luck2017-09-272-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | About the only tricky case is trying to move a task into a monitor group that is a subdirectory of a different control group. But cover the simple cases too. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/f1841cce6a242aed37cb926dee8942727331bf78.1506382469.git.tony.luck@intel.com
| * | x86/intel_rdt: Add diagnostics when writing the schemata fileTony Luck2017-09-271-11/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Save helpful descriptions of what went wrong when writing a schemata file. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/9d6cef757dc88639c8ab47f1e7bc1b081a84bb88.1506382469.git.tony.luck@intel.com
| * | x86/intel_rdt: Add framework for better RDT UI diagnosticsTony Luck2017-09-272-0/+63
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commands are given to the resctrl file system by making/removing directories, or by writing to files. When something goes wrong the user is generally left wondering why they got: bash: echo: write error: Invalid argument Add a new file "last_cmd_status" to the "info" directory that will give the user some better clues on what went wrong. Provide functions to clear and update last_cmd_status which check that we hold the rdtgroup_mutex. [ tglx: Made last_cmd_status static and folded back the hunk from patch 3 which replaces the open coded access to last_cmd_status with the accessor function ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/edc4e0e9741eee89bba569f0021b1b2662fd9508.1506382469.git.tony.luck@intel.com