summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/perf
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'powerpc-4.4-1' of ↵Linus Torvalds2015-11-051-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Kconfig: remove BE-only platforms from LE kernel build from Boqun Feng - Refresh ps3_defconfig from Geoff Levand - Emit GNU & SysV hashes for the vdso from Michael Ellerman - Define an enum for the bolted SLB indexes from Anshuman Khandual - Use a local to avoid multiple calls to get_slb_shadow() from Michael Ellerman - Add gettimeofday() benchmark from Michael Neuling - Avoid link stack corruption in __get_datapage() from Michael Neuling - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar K.V - Add ppc64le_defconfig from Michael Ellerman - pseries: extract of_helpers module from Andy Shevchenko - Correct string length in pseries_of_derive_parent() from Nathan Fontenot - Free the MSI bitmap if it was slab allocated from Denis Kirjanov - Shorten irq_chip name for the SIU from Christophe Leroy - Wait 1s for secondaries to enter OPAL during kexec from Samuel Mendoza-Jonas - Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from Colin Ian King - Disable hugepd for 64K page size, from Aneesh Kumar K.V - Differentiate between hugetlb and THP during page walk from Aneesh Kumar K.V - Make PCI non-optional for pseries from Michael Ellerman - Individual System V IPC system calls from Sam bobroff - Add selftest of unmuxed IPC calls from Michael Ellerman - discard .exit.data at runtime from Stephen Rothwell - Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul Gortmaker - Use of_get_next_parent to simplify code from Christophe Jaillet - Paginate some xmon output from Sam bobroff - Add some more elements to the xmon PACA dump from Michael Ellerman - Allow the tm-syscall selftest to build with old headers from Michael Ellerman - Run EBB selftests only on POWER8 from Denis Kirjanov - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael Ellerman - Avoid reference to potentially freed memory in prom.c from Christophe Jaillet - Quieten boot wrapper output with run_cmd from Geoff Levand - EEH fixes and cleanups from Gavin Shan - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael Ellerman - Fix section mismatch warning in msi_bitmap_alloc() from Denis Kirjanov - Fix ps3-lpm white space from Rudhresh Kumar J - Fix ps3-vuart null dereference from Colin King - nvram: Add missing kfree in error path from Christophe Jaillet - nvram: Fix function name in some errors messages, from Christophe Jaillet - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro Koskinen - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov - cxl: Free virtual PHB when removing from Andrew Donnellan - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from Michael Ellerman - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O= from Michael Ellerman - Freescale updates from Scott: Highlights include 64-bit book3e kexec/kdump support, a rework of the qoriq clock driver, device tree changes including qoriq fman nodes, support for a new 85xx board, and some fixes. - MPC5xxx updates from Anatolij: Highlights include a driver for MPC512x LocalPlus Bus FIFO with its device tree binding documentation, mpc512x device tree updates and some minor fixes. * tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits) powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc() powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id() powerpc/pseries: Correct string length in pseries_of_derive_parent() powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s) powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes powerpc: handle error case in cpm_muram_alloc() powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake powerpc/book3e-64: Enable kexec powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32 powerpc/book3e-64/kexec: Enable SMP release powerpc/book3e-64/kexec: create an identity TLB mapping powerpc/book3e-64: Don't limit paca to 256 MiB powerpc/book3e/kdump: Enable crash_kexec_wait_realmode powerpc/book3e: support CONFIG_RELOCATABLE powerpc/booke64: Fix args to copy_and_flush powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts powerpc/e6500: kexec: Handle hardware threads ...
| * powerpc/mm: Differentiate between hugetlb and THP during page walkAneesh Kumar K.V2015-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to properly identify whether a hugepage is an explicit or a transparent hugepage in follow_huge_addr(). We used to depend on hugepage shift argument to do that. But in some case that can result in wrong results. For ex: On finding a transparent hugepage we set hugepage shift to PMD_SHIFT. But we can end up clearing the thp pte, via pmdp_huge_get_and_clear. We do prevent reusing the pfn page via the usage of kick_all_cpus_sync(). But that happens after we updated the pte to 0. Hence in follow_huge_addr() we can find hugepage shift set, but transparent huge page check fail for a thp pte. NOTE: We fixed a variant of this race against thp split in commit 691e95fd7396905a38d98919e9c150dbc3ea21a3 ("powerpc/mm/thp: Make page table walk safe against thp split/collapse") Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in follow_page_mask occasionally. In the long term, we may want to switch ppc64 64k page size config to enable CONFIG_ARCH_WANT_GENERAL_HUGETLB Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | perf/powerpc: Add support for PERF_SAMPLE_BRANCH_CALLStephane Eranian2015-10-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch catches PERF_SAMPLE_BRANCH_CALL because it is not clear whether this is actually supported by the hardware. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: khandual@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1444720151-10275-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | perf/core: Drop PERF_EVENT_TXNSukadev Bhattiprolu2015-09-131-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use PERF_EVENT_TXN flag to determine if we are in the middle of a transaction. If in a transaction, we defer the schedulability checks from pmu->add() operation to the pmu->commit() operation. Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ) we can use the type to determine if we are in a transaction and drop the PERF_EVENT_TXN flag. When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures becomes unused, so drop that field as well. This is an extension of the Powerpc patch from Peter Zijlstra to s390, Sparc and x86 architectures. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | powerpc, perf/powerpc/hv-24x7: Use PMU_TXN_READ interfaceSukadev Bhattiprolu2015-09-131-2/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 24x7 counters in Powerpc allow monitoring a large number of counters simultaneously. They also allow reading several counters in a single HCALL so we can get a more consistent snapshot of the system. Use the PMU's transaction interface to monitor and read several event counters at once. The idea is that users can group several 24x7 events into a single group of events. We use the following logic to submit the group of events to the PMU and read the values: pmu->start_txn() // Initialize before first event for each event in group pmu->read(event); // Queue each event to be read pmu->commit_txn() // Read/update all queuedcounters The ->commit_txn() also updates the event counts in the respective perf_event objects. The perf subsystem can then directly get the event counts from the perf_event and can avoid submitting a new ->read() request to the PMU. Thanks to input from Peter Zijlstra. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-10-git-send-email-sukadev@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | perf/core: Add a 'flags' parameter to the PMU transactional interfacesSukadev Bhattiprolu2015-09-131-1/+29
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the PMU interface allows reading only one counter at a time. But some PMUs like the 24x7 counters in Power, support reading several counters at once. To leveage this functionality, extend the transaction interface to support a "transaction type". The first type, PERF_PMU_TXN_ADD, refers to the existing transactions, i.e. used to _schedule_ all the events on the PMU as a group. A second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch, by the 24x7 counters to read several counters at once. Extend the transaction interfaces to the PMU to accept a 'txn_flags' parameter and use this parameter to ignore any transactions that are not of type PERF_PMU_TXN_ADD. Thanks to Peter Zijlstra for his input. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [peterz: s390 compile fix] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* powerpc/perf: Change type of the bhrb_users variableAnshuman Khandual2015-07-271-2/+2
| | | | | | | | | This patch just changes data type of bhrb_users variable from int to unsigned int because it never contains a negative value. Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf/hv-24x7: Simplify extracting counter from result bufferSukadev Bhattiprolu2015-07-251-3/+1
| | | | | | | | Simplify code that extracts a 24x7 counter from the HCALL's result buffer. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf/hv-24x7: Whitespace - fix parameter alignmentSukadev Bhattiprolu2015-07-251-10/+10
| | | | | | | Fix parameter alignment to be consistent with coding style. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf/24x7: Fix lockdep warningSukadev Bhattiprolu2015-07-081-0/+2
| | | | | | | | | | | | | | | | | | | The sysfs attributes for the 24x7 counters are dynamically allocated. Initialize the attributes using sysfs_attr_init() to fix following warning which occurs when CONFIG_DEBUG_LOCK_VMALLOC=y. [ 0.346249] audit: initializing netlink subsys (disabled) [ 0.346284] audit: type=2000 audit(1436295254.340:1): initialized [ 0.346489] BUG: key c0000000efe90198 not in .data! [ 0.346491] DEBUG_LOCKS_WARN_ON(1) [ 0.346502] ------------[ cut here ]------------ [ 0.346504] WARNING: at ../kernel/locking/lockdep.c:3002 [ 0.346506] Modules linked in: Reported-by: Gustavo Luiz Duarte <gustavold@linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Tested-by: Gustavo Luiz Duarte <gustavold@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf: Fix book3s kernel to userspace backtracesAnton Blanchard2015-06-021-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we take a PMU exception or a software event we call perf_read_regs(). This overloads regs->result with a boolean that describes if we should use the sampled instruction address register (SIAR) or the regs. If the exception is in kernel, we start with the kernel regs and backtrace through the kernel stack. At this point we switch to the userspace regs and backtrace the user stack with perf_callchain_user(). Unfortunately these regs have not got the perf_read_regs() treatment, so regs->result could be anything. If it is non zero, perf_instruction_pointer() decides to use the SIAR, and we get issues like this: 0.11% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock_irqsave | ---_raw_spin_lock_irqsave | |--52.35%-- 0 | | | |--46.39%-- __hrtimer_start_range_ns | | kvmppc_run_core | | kvmppc_vcpu_run_hv | | kvmppc_vcpu_run | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call | | | | | |--67.08%-- _raw_spin_lock_irqsave <--- hi mum | | | | | | | --100.00%-- 0x7e714 | | | 0x7e714 Notice the bogus _raw_spin_irqsave when we transition from kernel (system_call) to userspace (0x7e714). We inserted what was in the SIAR. Add a check in regs_use_siar() to check that the regs in question are from a PMU exception. With this fix the backtrace makes sense: 0.47% qemu-system-ppc [kernel.vmlinux] [k] _raw_spin_lock_irqsave | ---_raw_spin_lock_irqsave | |--53.83%-- 0 | | | |--44.73%-- hrtimer_try_to_cancel | | kvmppc_start_thread | | kvmppc_run_core | | kvmppc_vcpu_run_hv | | kvmppc_vcpu_run | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call | | __ioctl | | 0x7e714 | | 0x7e714 Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm/thp: Make page table walk safe against thp split/collapseAneesh Kumar K.V2015-04-171-10/+14
| | | | | | | | | | | | | | | | | | | | We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge tag 'powerpc-4.1-1' of ↵Linus Torvalds2015-04-164-93/+172
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits) powerpc/powermac: Fix build error seen with powermac smp builds powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE powerpc: Remove PPC32 code from pseries specific find_and_init_phbs() powerpc/cell: Fix iommu breakage caused by controller_ops change powerpc/eeh: Fix crash in eeh_add_device_early() on Cell powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails powerpc/pseries: Correct memory hotplug locking powerpc: Fix missing L2 cache size in /sys/devices/system/cpu powerpc: Add ppc64 hard lockup detector support oprofile: Disable oprofile NMI timer on ppc64 powerpc/perf/hv-24x7: Add missing put_cpu_var() powerpc/perf/hv-24x7: Break up single_24x7_request powerpc/perf/hv-24x7: Define update_event_count() powerpc/perf/hv-24x7: Whitespace cleanup powerpc/perf/hv-24x7: Define add_event_to_24x7_request() powerpc/perf/hv-24x7: Rename hv_24x7_event_update powerpc/perf/hv-24x7: Move debug prints to separate function powerpc/perf/hv-24x7: Drop event_24x7_request() powerpc/perf/hv-24x7: Use pr_devel() to log message ... Conflicts: tools/testing/selftests/powerpc/Makefile tools/testing/selftests/powerpc/tm/Makefile
| * powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTHAnton Blanchard2015-04-141-1/+1
| | | | | | | | | | | | | | | | | | We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH (currently 127), but we forgot to do the same for 64bit backtraces. Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() failsLi Zhong2015-04-141-8/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Michael pointed out, create_events_from_catalog() fails when we either have: - a kernel bug - some sort of hypervisor misconfiguration - ENOMEM In all the above cases, we can also fail 24x7 initcall. For hypervisor errors, EIO is used so there is something reported in dmesg. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Add missing put_cpu_var()Sukadev Bhattiprolu2015-04-111-1/+3
| | | | | | | | | | | | | | | | Add missing put_cpu_var() for 24x7 requests. This went missing in commit f34b6c7 (3.18-rc3). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Break up single_24x7_requestSukadev Bhattiprolu2015-04-111-14/+42
| | | | | | | | | | | | | | | | | | Break up the function single_24x7_request() into smaller functions. This would later enable us to "prepare" a multi-event request buffer and then submit a single hcall for several events. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Define update_event_count()Sukadev Bhattiprolu2015-04-111-3/+9
| | | | | | | | | | | | | | | | Move the code to update an event count into a new function, update_event_count(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Whitespace cleanupSukadev Bhattiprolu2015-04-111-8/+10
| | | | | | | | | | | | | | Fix minor whitespace damages. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Define add_event_to_24x7_request()Sukadev Bhattiprolu2015-04-111-17/+42
| | | | | | | | | | | | | | | | Move code that maps a perf_event to a 24x7 request buffer into a separate function, add_event_to_24x7_request(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Rename hv_24x7_event_updateSukadev Bhattiprolu2015-04-111-3/+3
| | | | | | | | | | | | | | | | For consistency with the pmu operation ->read() and with other pmus, rename hv_24x7_event_update() to hv_24x7_event_read(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Move debug prints to separate functionSukadev Bhattiprolu2015-04-111-6/+16
| | | | | | | | | | | | | | | | To simplify/cleanup code, move the rather long printk() to a separate function. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Drop event_24x7_request()Sukadev Bhattiprolu2015-04-111-25/+15
| | | | | | | | | | | | | | | | The function event_24x7_request() is essentially a wrapper to the function single_24x7_request() and can be dropped to simplify code. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Use pr_devel() to log messageSukadev Bhattiprolu2015-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | Use pr_devel_ratelimited() to log error message when the 24x7 HCALL fails. Since users specify events by their sysfs name, the HCALL should succeed. Any errors reported by the HCALL would be of interest to the developer, rather than the user/administrator. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Remove unnecessary parameterSukadev Bhattiprolu2015-04-111-10/+6
| | | | | | | | | | | | | | Remove the 'success_expected' parameter and log the message unconditionally. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf/hv-24x7: Modify definition of request and result buffersSukadev Bhattiprolu2015-04-112-44/+41
| | | | | | | | | | | | | | | | | | | | | | | | The parameters to the 24x7 HCALL have variable number of elements in them. Set the minimum number of such elements to 1 rather than 0 and eliminate the temporary structures. This would enable us to submit multiple counter requests and process multiple results from a single HCALL (in a follow on patch). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf: add missing put_cpu_var in power_pmu_event_initJan Stancek2015-03-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One path in power_pmu_event_init() calls get_cpu_var(), but is missing matching call to put_cpu_var(), which causes preemption imbalance and crash in user-space: Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280 NIP = 3fff9bf2cae0 MSR = 900000014280f032 Oops: Weird page fault, sig: 11 [#23] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: <snip> CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000 NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0 REGS: c000001fe835fea0 TRAP: 0401 Tainted: G D (4.0.0-rc5+) MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 22000028 XER: 00000000 CFAR: 00003fff9bee4894 SOFTE: 1 GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068 GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001 GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70 GPR12: 0000000052000022 00003fff9c10b700 NIP [00003fff9bf2cae0] 0x3fff9bf2cae0 LR [00003fff9bee4898] 0x3fff9bee4898 Call Trace: ---[ end trace 5d3d952b5d4185d4 ]--- BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out INFO: lockdep is turned off. CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 Call Trace: [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable) [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0 [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110 [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160 [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70 [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450 [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900 [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30 note: a.out[10285] exited with preempt_count 1 Reproducer: #include <stdio.h> #include <unistd.h> #include <syscall.h> #include <sys/types.h> #include <sys/stat.h> #include <linux/perf_event.h> #include <linux/hw_breakpoint.h> static struct perf_event_attr event = { .type = PERF_TYPE_RAW, .size = sizeof(struct perf_event_attr), .sample_type = PERF_SAMPLE_BRANCH_STACK, .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN, }; int main() { syscall(__NR_perf_event_open, &event, 0, -1, -1, 0); } Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2015-04-141-4/+9
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
| * | Merge tag 'v4.0-rc1' into perf/core, to refresh the treeIngo Molnar2015-02-2616-49/+1360
| |\| | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | perf, powerpc: Fix up flush_branch_stack() usersPeter Zijlstra2015-02-181-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent LBR rework for x86 left a stray flush_branch_stack() user in the PowerPC code, fix that up. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tejun Heo <tj@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2015-04-141-1/+1
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Usual trivial tree updates. Nothing outstanding -- mostly printk() and comment fixes and unused identifier removals" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: goldfish: goldfish_tty_probe() is not using 'i' any more powerpc: Fix comment in smu.h qla2xxx: Fix printks in ql_log message lib: correct link to the original source for div64_u64 si2168, tda10071, m88ds3103: Fix firmware wording usb: storage: Fix printk in isd200_log_config() qla2xxx: Fix printk in qla25xx_setup_mode init/main: fix reset_device comment ipwireless: missing assignment goldfish: remove unreachable line of code coredump: Fix do_coredump() comment stacktrace.h: remove duplicate declaration task_struct smpboot.h: Remove unused function prototype treewide: Fix typo in printk messages treewide: Fix typo in printk messages mod_devicetable: fix comment for match_flags
| * | treewide: Fix typo in printk messagesMasanari Iida2015-03-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This patch fix spelling typo in printk messages. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | Merge branch 'next' of ↵Michael Ellerman2015-02-041-2/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include 8xx optimizations, some more work on datapath device tree content, e300 machine check support, t1040 corenet error reporting, and various cleanups and fixes."
| * | | perf/powerpc: reset event hw state when adding it to the PMUAlexandru-Cezar Sardan2015-01-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When adding an event to the PMU with PERF_EF_START the STOPPED and UPTODATE flags need to be cleared in the hw.event status variable because they are preventing the update of the event count on overflow interrupt. Signed-off-by: Alexandru-Cezar Sardan <alexandru.sardan@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
| * | | powerpc/perf: fix fsl_emb_pmu_start to write correct pmc valueTom Huynh2015-01-291-1/+5
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PMCs on PowerPC increases towards 0x80000000 and triggers an overflow interrupt when the msb is set to collect a sample. Therefore, to setup for the next sample collection, pmu_start should set the pmc value to 0x80000000 - left instead of left which incorrectly delays the next overflow interrupt. Same as commit 9a45a9407c69 ("powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events") for book3s. Signed-off-by: Tom Huynh <tom.huynh@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
* | | powerpc/perf/hv-gpci: add the remaining gpci requestsCody P Schafer2015-02-021-1/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the remaining gpci requests that contain counters suitable for use by perf. Omit those that don't contain any counters (but note their ommision). Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotatedCody P Schafer2015-02-0210-30/+316
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds (in req-gen/) a framework for defining gpci counter requests. It uses macro magic similar to ftrace. Also convert the existing hv-gpci request structures and enum values to use the new framework (and adjust old users of the structs and enum values to cope with changes in naming). In exchange for this macro disaster, we get autogenerated event listing for GPCI in sysfs, build time field offset checking, and zero duplication of information about GPCI requests. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | powerpc/perf/hv-24x7: parse catalog and populate sysfs with eventsCody P Schafer2015-02-024-17/+841
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Retrieves and parses the 24x7 catalog on POWER systems that supply it (right now, only POWER 8). Events are exposed via sysfs in the standard fashion, and are all parameterized. $ cd /sys/bus/event_source/devices/hv_24x7/events $ cat HPM_CS_FROM_L4_LDATA__PHYS_CORE domain=0x2,offset=0xd58,core=?,lpar=0x0 $ cat HPM_TLBIE__VCPU_HOME_CHIP domain=0x4,offset=0x358,vcpu=?,lpar=? where user is required to specify values for the fields with '?' (like core, vcpu, lpar above), when specifying the event with the perf tool. Catalog is (at the moment) only parsed on boot. It needs re-parsing when a some hypervisor events occur. At that point we'll also need to prevent old events from continuing to function (counter that is passed in via spare space in the config values?). Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helpersukadev@linux.vnet.ibm.com2015-02-021-0/+10
|/ / | | | | | | | | | | | | | | Define a lite version of the EVENT_DEFINE_RANGE_FORMAT() that avoids defining helper functions for the bit-field ranges. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | power/perf/hv-24x7: Use kmem_cache_free() instead of kfreeSukadev Bhattiprolu2014-12-121-1/+1
| | | | | | | | | | | | | | Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Use per-cpu page buffersukadev@linux.vnet.ibm.com2014-12-121-12/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 24x7 counters are continuously running and not updated on an interrupt. So we record the event counts when stopping the event or deleting it. But to "read" a single counter in 24x7, we allocate a page and pass it into the hypervisor (The HV returns the page full of counters from which we extract the specific counter for this event). We allocate a page using GFP_USER and when deleting the event, we end up with the following warning because we are blocking in interrupt context. [ 698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000 We could use GFP_ATOMIC but that could result in failures. Pre-allocate a buffer so we don't have to allocate in interrupt context. Further as Michael Ellerman suggested, use Per-CPU buffer so we only need to allocate once per CPU. Cc: stable@vger.kernel.org Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Replace __get_cpu_var usesChristoph Lameter2014-11-032-14/+14
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* perf: Fix and clean up initialization of pmu::event_idxPeter Zijlstra2014-10-282-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andy reported that the current state of event_idx is rather confused. So remove all but the x86_pmu implementation and change the default to return 0 (the safe option). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Cody P Schafer <dev@codyps.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Himangi Saraogi <himangi774@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> Cc: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux390@de.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* powerpc/perf/hv-24x7: Simplify catalog_read()sukadev@linux.vnet.ibm.com2014-10-071-87/+14
| | | | | | | | | | | | | | | | | catalog_read() implements the read interface for the sysfs file /sys/bus/event_source/devices/hv_24x7/interface/catalog It essentially takes a buffer, an offset and count as parameters to the read() call. It makes a hypervisor call to read a specific page from the catalog and copy the required bytes into the given buffer. Each call to catalog_read() returns at most one 4K page. Given these requirements, we should be able to simplify the catalog_read(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocationsCody P Schafer2014-10-071-18/+37
| | | | | | | | | | | | | Ian pointed out the use of __aligned(4096) caused rather large stack consumption in single_24x7_request(), so use the kmem_cache hv_page_cache (which we've already got set up for other allocations) insead of allocating locally. CC: Haren Myneni <hbabu@us.ibm.com> Reported-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Cody P Schafer <dev@codyps.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Make a bunch of things staticAnton Blanchard2014-09-251-9/+9
| | | | | Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/perf: Fix ABIv2 kernel backtracesAnton Blanchard2014-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ABIv2 kernels are failing to backtrace through the kernel. An example: 39.30% readseek2_proce [kernel.kallsyms] [k] find_get_entry | --- find_get_entry __GI___libc_read The problem is in valid_next_sp() where we check that the new stack pointer is at least STACK_FRAME_OVERHEAD below the previous one. ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes with no paramter save area. STACK_FRAME_OVERHEAD is in theory the minimum stack frame size, but we over 240 uses of it, some of which assume that it includes space for the parameter area. We need to work through all our stack defines and rationalise them but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using in valid_next_sp(). This fixes the issue: 30.64% readseek2_proce [kernel.kallsyms] [k] find_get_entry | --- find_get_entry pagecache_get_page generic_file_read_iter new_sync_read vfs_read sys_read syscall_exit __GI___libc_read Cc: stable@vger.kernel.org # 3.16+ Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org>
* powerpc/perf/hv-24x7: Use kmem_cache_freeHimangi Saraogi2014-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Free memory allocated using kmem_cache_zalloc using kmem_cache_free rather than kfree. The Coccinelle semantic patch that makes this change is as follows: // <smpl> @@ expression x,E,c; @@ x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...) ... when != x = E when != &x ?-kfree(x) +kmem_cache_free(c,x) // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge branch 'next' of ↵Linus Torvalds2014-08-079-35/+82
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc new goodies for 3.17. The short story: The biggest bit is Michael removing all of pre-POWER4 processor support from the 64-bit kernel. POWER3 and rs64. This gets rid of a ton of old cruft that has been bitrotting in a long while. It was broken for quite a few versions already and nobody noticed. Nobody uses those machines anymore. While at it, he cleaned up a bunch of old dusty cabinets, getting rid of a skeletton or two. Then, we have some base VFIO support for KVM, which allows assigning of PCI devices to KVM guests, support for large 64-bit BARs on "powernv" platforms, support for HMI (Hardware Management Interrupts) on those same platforms, some sparse-vmemmap improvements (for memory hotplug), There is the usual batch of Freescale embedded updates (summary in the merge commit) and fixes here or there, I think that's it for the highlights" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits) powerpc/eeh: Export eeh_iommu_group_to_pe() powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API powerpc: Reduce scariness of interrupt frames in stack traces powerpc: start loop at section start of start in vmemmap_populated() powerpc: implement vmemmap_free() powerpc: implement vmemmap_remove_mapping() for BOOK3S powerpc: implement vmemmap_list_free() powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE powerpc/book3s: Fix endianess issue for HMI handling on napping cpus. powerpc/book3s: handle HMIs for cpus in nap mode. powerpc/powernv: Invoke opal call to handle hmi. powerpc/book3s: Add basic infrastructure to handle HMI in Linux. powerpc/iommu: Fix comments with it_page_shift powerpc/powernv: Handle compound PE in config accessors powerpc/powernv: Handle compound PE for EEH powerpc/powernv: Handle compound PE powerpc/powernv: Split ioda_eeh_get_state() powerpc/powernv: Allow to freeze PE powerpc/powernv: Enable M64 aperatus for PHB3 powerpc/eeh: Aux PE data for error log ...
| * powerpc/perf: Add per-event excludes on Power8Michael Ellerman2014-07-282-24/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Power8 has a new register (MMCR2), which contains individual freeze bits for each counter. This is an improvement on previous chips as it means we can have multiple events on the PMU at the same time with different exclude_{user,kernel,hv} settings. Previously we had to ensure all events on the PMU had the same exclude settings. The core of the patch is fairly simple. We use the 207S feature flag to indicate that the PMU backend supports per-event excludes, if it's set we skip the generic logic that enforces the equality of excludes between events. We also use that flag to skip setting the freeze bits in MMCR0, the PMU backend is expected to have handled setting them in MMCR2. The complication arises with EBB. The FCxP bits in MMCR2 are accessible R/W to a task using EBB. Which means a task using EBB will be able to see that we are using MMCR2 for freezing, whereas the old logic which used MMCR0 is not user visible. The task can not see or affect exclude_kernel & exclude_hv, so we only need to consider exclude_user. The table below summarises the behaviour both before and after this commit is applied: exclude_user true false ------------------------------------ | User visible | N N Before | Can freeze | Y Y | Can unfreeze | N Y ------------------------------------ | User visible | Y Y After | Can freeze | Y Y | Can unfreeze | Y/N Y ------------------------------------ So firstly I assert that the simple visibility of the exclude_user setting in MMCR2 is a non-issue. The event belongs to the task, and was most likely created by the task. So the exclude_user setting is not privileged information in any way. Secondly, the behaviour in the exclude_user = false case is unchanged. This is important as it is the case that is actually useful, ie. the event is created with no exclude setting and the task uses MMCR2 to implement exclusion manually. For exclude_user = true there is no meaningful change to freezing the event. Previously the task could use MMCR2 to freeze the event, though it was already frozen with MMCR0. With the new code the task can use MMCR2 to freeze the event, though it was already frozen with MMCR2. The only real change is when exclude_user = true and the task tries to use MMCR2 to unfreeze the event. Previously this had no effect, because the event was already frozen in MMCR0. With the new code the task can unfreeze the event in MMCR2, but at some indeterminate time in the future the kernel will overwrite its setting and refreeze the event. Therefore my final assertion is that any task using exclude_user = true and also fiddling with MMCR2 was deeply confused before this change, and remains so after it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>