Commit message (Author, Age, Files, Lines)
* cxl/mbox: Introduce the mbox_send operation (Dan Williams, 2021-09-21, 2 files, -55/+63)

In preparation for implementing a unit test backend transport for ioctl operations, and making the mailbox available to the cxl/pmem infrastructure, move the existing PCI specific portion of mailbox handling to an "mbox_send" operation.

With this split all the PCI-specific transport details are comprehended by a single operation and the rest of the mailbox infrastructure is 'struct cxl_mem' and 'struct cxl_memdev' generic.

Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163116434098.2460985.9004760022659400540.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
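The split described in the cxl/mbox message above can be illustrated with a small user-space sketch. Everything here is hypothetical: the `demo_*` names, the struct layouts, and the mock transport are assumptions made for illustration and are not the kernel's definitions. The point is only that the generic layer calls through a single transport hook, so a test backend can be swapped in without touching the generic code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not the kernel's definitions. */
struct demo_mbox_cmd {
	unsigned short opcode;
	void *payload_out;
	size_t size_out;
	int return_code;
};

struct demo_mem {
	size_t payload_size;
	/* Transport hook: the only place that knows how a command reaches the device. */
	int (*mbox_send)(struct demo_mem *mem, struct demo_mbox_cmd *cmd);
};

/* Transport-independent layer: validate, then delegate to the installed hook. */
int demo_send_cmd(struct demo_mem *mem, struct demo_mbox_cmd *cmd)
{
	if (!mem->mbox_send || cmd->size_out > mem->payload_size)
		return -1;
	return mem->mbox_send(mem, cmd);
}

/* Mock transport standing in for a unit-test backend; no hardware involved. */
int demo_mock_mbox_send(struct demo_mem *mem, struct demo_mbox_cmd *cmd)
{
	(void)mem;
	/* Fill the output payload with the low byte of the opcode. */
	memset(cmd->payload_out, cmd->opcode & 0xff, cmd->size_out);
	cmd->return_code = 0;
	return 0;
}

/* Usage: the generic path works unchanged with the mock transport. */
void demo_mbox_usage(void)
{
	unsigned char buf[4] = { 0 };
	struct demo_mem mem = {
		.payload_size = sizeof(buf),
		.mbox_send = demo_mock_mbox_send,
	};
	struct demo_mbox_cmd cmd = {
		.opcode = 0x4001,
		.payload_out = buf,
		.size_out = sizeof(buf),
	};

	assert(demo_send_cmd(&mem, &cmd) == 0);
	assert(buf[0] == 0x01 && buf[3] == 0x01);

	cmd.size_out = 64; /* larger than the transport's payload size */
	assert(demo_send_cmd(&mem, &cmd) == -1);
}
```

A PCI-backed implementation would install a different function in the same hook; the validation in `demo_send_cmd()` would not change.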
* cxl/pci: Clean up cxl_mem_get_partition_info() (Dan Williams, 2021-09-21, 2 files, -26/+24)

Commit 0b9159d0ff21 ("cxl/pci: Store memory capacity values") missed updating the kernel-doc for 'struct cxl_mem' leading to the following warnings:

  ./scripts/kernel-doc -v drivers/cxl/cxlmem.h 2>&1 | grep warn
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'total_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'volatile_only_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'persistent_only_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'partition_align_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'active_volatile_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'active_persistent_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'next_volatile_bytes' not described in 'cxl_mem'
  drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'next_persistent_bytes' not described in 'cxl_mem'

Also, it is redundant to describe those same parameters in the kernel-doc for cxl_mem_get_partition_info(). Given the only user of that routine updates the values in @cxlm, just do that implicitly internal to the helper.

Cc: Ira Weiny <ira.weiny@intel.com>
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163157174216.2653013.1277706528753990974.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* cxl/pci: Make 'struct cxl_mem' device type generic (Dan Williams, 2021-09-21, 3 files, -47/+41)

In preparation for adding a unit test provider of a cxl_memdev, convert the 'struct cxl_mem' driver context to carry a generic device rather than a pci device.

Note, some dev_dbg() lines needed extra reformatting per clang-format.

This conversion also allows the cxl_mem_create() and devm_cxl_add_memdev() calling conventions to be simplified. The "host" for a cxl_memdev, must be the same device for the driver that allocated @cxlm.

Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163116432973.2460985.7553504957932024222.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm/labels: Introduce CXL labels (Dan Williams, 2021-09-21, 3 files, -50/+241)

Now that all of the use sites of label data have been converted to nsl_* helpers, introduce the CXL label format. The ->cxl flag in nvdimm_drvdata indicates the label format the device expects. A follow-on patch allows a bus provider to select the label style.

Note that the EFI definition of the labels represents the Linux "claim class" with a GUID. The CXL definition of the labels stores the same identifier in UUID byte order.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116432405.2460985.5547867384570123403.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm/label: Define CXL region labels (Dan Williams, 2021-09-21, 1 file, -0/+32)

Add a definition of the CXL 2.0 region label format. Note this is done as a separate patch to make the next patch that adds namespace label support easier to read.

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116431893.2460985.4003511000574373922.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm/labels: Fix kernel-doc for label.h (Dan Williams, 2021-09-21, 1 file, -2/+8)

Clean up existing kernel-doc warnings before adding new CXL label data structures.

  drivers/nvdimm/label.h:66: warning: Function parameter or member 'labelsize' not described in 'nd_namespace_index'
  drivers/nvdimm/label.h:66: warning: Function parameter or member 'free' not described in 'nd_namespace_index'
  drivers/nvdimm/label.h:103: warning: Function parameter or member 'align' not described in 'nd_namespace_label'
  drivers/nvdimm/label.h:103: warning: Function parameter or member 'reserved' not described in 'nd_namespace_label'
  drivers/nvdimm/label.h:103: warning: Function parameter or member 'type_guid' not described in 'nd_namespace_label'
  drivers/nvdimm/label.h:103: warning: Function parameter or member 'abstraction_guid' not described in 'nd_namespace_label'
  drivers/nvdimm/label.h:103: warning: Function parameter or member 'reserved2' not described in 'nd_namespace_label'
  drivers/nvdimm/label.h:103: warning: Function parameter or member 'checksum' not described in 'nd_namespace_label'

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116431381.2460985.6990754901097922099.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm/labels: Introduce the concept of multi-range namespace labels (Dan Williams, 2021-09-21, 2 files, -0/+14)

The CXL specification defines a mechanism for namespaces to be comprised of multiple dis-contiguous ranges. Introduce that concept to the legacy NVDIMM namespace implementation with a new nsl_set_nrange() helper, that sets the number of ranges to 1. Once the NVDIMM subsystem supports CXL labels and updates its namespace capacity provisioning for dis-contiguous support nsl_set_nrange() can be updated, but in the meantime CXL label validation requires nrange be non-zero.

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116430804.2460985.5482188351381597529.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm/label: Add a helper for nlabel validation (Dan Williams, 2021-09-21, 2 files, -3/+9)

In the CXL namespace label there is no need for nlabel since that is inferred from the region. Add a helper that moves nsl_get_label() behind a helper that validates the number of labels relative to the region.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116430293.2460985.12693942353621355232.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm/labels: Add uuid helpers (Dan Williams, 2021-09-21, 9 files, -116/+124)

In preparation for CXL labels that move the uuid to a different offset in the label, add nsl_{ref,get,validate}_uuid(). These helpers use the proper uuid_t type. That type definition predated the libnvdimm subsystem, so now is as good a time as any to convert all the uuid handling in the subsystem to uuid_t to match the helpers.

Note that the uuid fields in the label data and superblocks are not replaced, per Andy's expectation that uuid_t is a kernel internal type not to appear in external ABI interfaces. So, in those cases {import,export}_uuid() is used to go between the 2 types.

Also note that this rework uncovered some unnecessary copies for label comparisons, those are cleaned up with nsl_uuid_equal().

As for the whitespace changes, all new code is clang-format compliant.

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116429748.2460985.15659993454313919977.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
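The pattern in the uuid helpers message above (raw bytes in the external layout, a typed UUID inside the code, explicit import/export at the boundary, and comparison without an intermediate copy) can be sketched in user-space C. This is only an illustration: the `demo_*` types and functions are hypothetical stand-ins and do not reproduce the kernel's `uuid_t`, `import_uuid()`, `export_uuid()`, or `nsl_uuid_equal()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UUID_SIZE 16

/* Typed, in-memory representation (hypothetical stand-in for a uuid type). */
typedef struct {
	uint8_t b[DEMO_UUID_SIZE];
} demo_uuid_t;

/* External layout keeps a plain byte array so the typed form never leaks out. */
struct demo_label {
	uint8_t uuid[DEMO_UUID_SIZE];
	uint32_t flags;
};

/* Cross the boundary explicitly in each direction. */
void demo_import_uuid(demo_uuid_t *dst, const uint8_t *src)
{
	memcpy(dst->b, src, DEMO_UUID_SIZE);
}

void demo_export_uuid(uint8_t *dst, const demo_uuid_t *src)
{
	memcpy(dst, src->b, DEMO_UUID_SIZE);
}

/* Compare in place: no temporary copy of the label's uuid is needed. */
bool demo_label_uuid_equal(const struct demo_label *label, const demo_uuid_t *uuid)
{
	return memcmp(label->uuid, uuid->b, DEMO_UUID_SIZE) == 0;
}

void demo_uuid_usage(void)
{
	demo_uuid_t a, b;
	struct demo_label label = { .flags = 0 };

	memset(a.b, 0xab, sizeof(a.b));
	demo_export_uuid(label.uuid, &a);          /* typed -> raw */
	assert(demo_label_uuid_equal(&label, &a));

	demo_import_uuid(&b, label.uuid);          /* raw -> typed */
	assert(memcmp(a.b, b.b, DEMO_UUID_SIZE) == 0);

	b.b[0] ^= 0xff;                            /* make b differ */
	assert(!demo_label_uuid_equal(&label, &b));
}
```

Keeping the byte array in the external struct means the layout does not change if the typed representation ever does, which is the concern the message attributes to Andy.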
* Linux 5.15-rc2 [tag: v5.15-rc2] (Linus Torvalds, 2021-09-19, 1 file, -1/+1)
* pci_iounmap'2: Electric Boogaloo: try to make sense of it all (Linus Torvalds, 2021-09-19, 2 files, -23/+46)

Nathan Chancellor reports that the recent change to pci_iounmap in commit 9caea0007601 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled") causes build errors on arm64.

It took me about two hours to convince myself that I think I know what the logic of that mess of #ifdef's in the <asm-generic/io.h> header file really aim to do, and rewrite it to be easier to follow.

Famous last words.

Anyway, the code has now been lifted from that grotty header file into lib/pci_iomap.c, and has fairly extensive comments about what the logic is. It also avoids indirecting through another confusing (and badly named) helper function that has other preprocessor config conditionals.

Let's see what odd architecture did something else strange in this area to break things. But my arm64 cross build is clean.

Fixes: 9caea0007601 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ulrich Teichert <krypton@ulrich-teichert.org>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds, 2021-09-19, 5 files, -15/+47)

Pull x86 fixes from Borislav Petkov:

 - Prevent an infinite loop in the MCE recovery on return to user space, which was caused by a second MCE queueing work for the same page and thereby creating a circular work list.

 - Make kern_addr_valid() handle existing PMD entries, which are marked not present in the higher level page table, correctly instead of blindly dereferencing them.

 - Pass a valid address to sanitize_phys(). This was caused by the mixture of inclusive and exclusive ranges. memtype_reserve() expects 'end' being exclusive, but sanitize_phys() wants it inclusive. This worked so far, but with end being the end of the physical address space the fail is exposed.

 - Increase the maximum supported GPIO numbers for 64bit. Newer SoCs exceed the previous maximum.

* tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Avoid infinite loop for copy from user recovery
  x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
  x86/platform: Increase maximum GPIO number for X86_64
  x86/pat: Pass valid address to sanitize_phys()
| * x86/mce: Avoid infinite loop for copy from user recovery (Tony Luck, 2021-09-14, 2 files, -11/+33)

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code. This is the simpler case. The machine check handler simply queues work to be executed on return to user. That code unmaps the page from all users and arranges to send a SIGBUS to the task that triggered the poison.

2) The machine check was triggered in kernel code that is covered by an exception table entry. In this case the machine check handler still queues a work entry to unmap the page, etc. but this will not be called right away because the #MC handler returns to the fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback structure in the task struct for the current thread. Attempting to queue a second work item using this same callback results in a loop in the linked list of work functions to call. So when the kernel does return to user, it enters an infinite loop processing the same entry for ever.

There are some legitimate scenarios where the kernel may take a second machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults disabled. If this fails, the code retries with page faults enabled expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check the return value from individual get_user() calls and may access multiple user addresses without noticing that some/all calls have failed.

Fix by adding a counter (current->mce_count) to keep track of repeated machine checks before task_work() is called. First machine check saves the address information and calls task_work_add(). Subsequent machine checks before that task_work call back is executed check that the address is in the same page as the first machine check (since the callback will offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one user access with page faults disabled, then a repeat to the same address with page faults enabled ... repeat in copy tail bytes). Just in case there is some code that loops forever enforce a limit of 10.

[ bp: Massage commit message, drop noinstr, fix typo, extend panic messages. ]

Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
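The counting scheme in the x86/mce message above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code: the `demo_*` names, the enum of outcomes, and the 4 KiB page size are hypothetical, and where the real handler would panic the sketch just returns a distinct value. Only the rules named in the message are modelled: the first machine check records the address and queues work, repeats must hit the same page, and more than 10 repeats is treated as fatal.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define DEMO_MCE_LIMIT  10 /* limit named in the commit message */

enum demo_mce_action {
	DEMO_MCE_QUEUE_WORK,       /* first check: queue the work item once */
	DEMO_MCE_ALREADY_QUEUED,   /* repeat on the same page: do not queue again */
	DEMO_MCE_PANIC_TOO_MANY,   /* more than the limit before the work ran */
	DEMO_MCE_PANIC_OTHER_PAGE, /* repeat on a different page */
};

/* Hypothetical per-task state, loosely modelled on the fields the message names. */
struct demo_task {
	int mce_count;
	uint64_t mce_addr;
};

enum demo_mce_action demo_queue_task_work(struct demo_task *t, uint64_t addr)
{
	int count = ++t->mce_count;

	if (count == 1) {
		/* First machine check: remember where it happened. */
		t->mce_addr = addr;
		return DEMO_MCE_QUEUE_WORK;
	}
	if (count > DEMO_MCE_LIMIT)
		return DEMO_MCE_PANIC_TOO_MANY;
	/* The queued callback offlines exactly one page, so repeats must match it. */
	if ((t->mce_addr >> DEMO_PAGE_SHIFT) != (addr >> DEMO_PAGE_SHIFT))
		return DEMO_MCE_PANIC_OTHER_PAGE;
	return DEMO_MCE_ALREADY_QUEUED;
}

/* Called when the queued work finally runs on return to user. */
void demo_task_work_ran(struct demo_task *t)
{
	t->mce_count = 0;
}

void demo_mce_usage(void)
{
	struct demo_task t = { 0 };

	assert(demo_queue_task_work(&t, 0x5000) == DEMO_MCE_QUEUE_WORK);
	assert(demo_queue_task_work(&t, 0x5008) == DEMO_MCE_ALREADY_QUEUED);
	assert(demo_queue_task_work(&t, 0x9000) == DEMO_MCE_PANIC_OTHER_PAGE);

	demo_task_work_ran(&t);
	assert(demo_queue_task_work(&t, 0x5000) == DEMO_MCE_QUEUE_WORK);
	for (int i = 2; i <= DEMO_MCE_LIMIT; i++)
		assert(demo_queue_task_work(&t, 0x5000) == DEMO_MCE_ALREADY_QUEUED);
	assert(demo_queue_task_work(&t, 0x5000) == DEMO_MCE_PANIC_TOO_MANY);
}
```

Because the work item is queued only when the count goes from 0 to 1, the same callback structure can never be linked into the work list twice, which is the loop the message describes.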
| * x86/mm: Fix kern_addr_valid() to cope with existing but not present entries (Mike Rapoport, 2021-09-08, 1 file, -3/+3)

Jiri Olsa reported a fault when running:

  # cat /proc/kallsyms | grep ksys_read
  ffffffff8136d580 T ksys_read
  # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore

  /proc/kcore: file format elf64-x86-64

  Segmentation fault

  general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI
  CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
  RIP: 0010:kern_addr_valid
  Call Trace:
   read_kcore
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? trace_hardirqs_on
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? lock_release
   ? _raw_spin_unlock
   ? __handle_mm_fault
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? lock_release
   proc_reg_read
   ? vfs_read
   vfs_read
   ksys_read
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

The fault happens because kern_addr_valid() dereferences existent but not present PMD in the high kernel mappings.

Such PMDs are created when free_kernel_image_pages() frees regions larger than 2Mb. In this case, a part of the freed memory is mapped with PMDs and the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will mark the PMD as not present rather than wipe it completely.

Have kern_addr_valid() check whether higher level page table entries are present before trying to dereference them to fix this issue and to avoid similar issues in the future.

Stable backporting note:
------------------------

Note that the stable marking is for all active stable branches because there could be cases where pagetable entries exist but are not valid - see 9a14aefc1d28 ("x86: cpa, fix lookup_address"), for example. So make sure to be on the safe side here and use pXY_present() accessors rather than pXY_none() which could #GP when accessing pages in the direct map.

Also see: c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping") for more info.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: <stable@vger.kernel.org> # 4.4+
Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org
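The difference between a "none" test and a "present" test, which the x86/mm message above turns on, can be shown with a deliberately simplified model. The assumptions are loud ones: a page-table entry is treated as a bare 64-bit value, "none" means all bits zero, and "present" means a single hypothetical bit is set. The real x86 accessors consider more bits than this, and the `demo_*` names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_PRESENT 0x1ULL /* hypothetical single "present" bit */

/* "none": the entry was never populated (or was wiped completely). */
bool demo_entry_none(uint64_t entry)
{
	return entry == 0;
}

/* "present": the entry may be followed to the next level. */
bool demo_entry_present(uint64_t entry)
{
	return (entry & DEMO_PAGE_PRESENT) != 0;
}

/* Pre-fix style walk: only rejects completely empty entries. */
bool demo_may_descend_old(uint64_t entry)
{
	return !demo_entry_none(entry);
}

/* Post-fix style walk: requires the entry to be present. */
bool demo_may_descend_new(uint64_t entry)
{
	return demo_entry_present(entry);
}

void demo_present_usage(void)
{
	uint64_t empty = 0;
	uint64_t mapped = 0x200000ULL | DEMO_PAGE_PRESENT;
	uint64_t not_present = 0x200000ULL; /* address bits kept, present bit cleared */

	/* Both checks agree on the easy cases. */
	assert(!demo_may_descend_old(empty) && !demo_may_descend_new(empty));
	assert(demo_may_descend_old(mapped) && demo_may_descend_new(mapped));

	/* Existing but not present: the old check would dereference it. */
	assert(demo_may_descend_old(not_present));
	assert(!demo_may_descend_new(not_present));
}
```

The third case is the one the message describes: an entry that still exists after being marked not present passes a "none" test, so only a "present" test stops the walk before the dereference.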
| * x86/platform: Increase maximum GPIO number for X86_64 (Andy Shevchenko, 2021-09-02, 1 file, -0/+5)

By default the 512 GPIOs is the maximum on any x86 platform. With, for example, Intel Tiger Lake-H the SoC based controller occupies up to 480 pins. This leaves only 32 available for GPIO expanders or other drivers, like PMIC. Hence, bump the maximum GPIO number to 1024 for X86_64 and leave 512 for X86_32.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20210826150317.29435-1-andriy.shevchenko@linux.intel.com
| * x86/pat: Pass valid address to sanitize_phys() (Jeff Moyer, 2021-09-02, 1 file, -1/+6)

The end address passed to memtype_reserve() is handed directly to sanitize_phys(). However, end is exclusive and sanitize_phys() expects an inclusive address. If end falls at the end of the physical address space, sanitize_phys() will return 0. This can result in drivers failing to load, and the following warning:

  WARNING: CPU: 26 PID: 749 at arch/x86/mm/pat.c:354 reserve_memtype+0x262/0x450
  reserve_memtype failed: [mem 0x3ffffff00000-0xffffffffffffffff], req uncached-minus
  Call Trace:
   [<ffffffffa427b1f2>] reserve_memtype+0x262/0x450
   [<ffffffffa42764aa>] ioremap_nocache+0x1a/0x20
   [<ffffffffc04620a1>] mpt3sas_base_map_resources+0x151/0xa60 [mpt3sas]
   [<ffffffffc0465555>] mpt3sas_base_attach+0xf5/0xa50 [mpt3sas]
  ---[ end trace 6d6eea4438db89ef ]---
  ioremap reserve_memtype failed -22
  mpt3sas_cm0: unable to map adapter memory! or resource not found
  mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10597/_scsih_probe()!

Fix this by passing the inclusive end address to sanitize_phys().

Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses")
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/x49o8a3pu5i.fsf@segfault.boston.devel.redhat.com
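The inclusive versus exclusive mix-up in the x86/pat message above is easy to reproduce with plain arithmetic. The sketch below assumes that sanitizing an address amounts to masking it to the physical address width, and it picks a 46-bit width purely because that matches the range printed in the warning; both are assumptions, and the `demo_*` helpers are hypothetical rather than the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PHYS_BITS 46 /* assumption chosen to match the range in the warning */
#define DEMO_PHYS_MASK ((UINT64_C(1) << DEMO_PHYS_BITS) - 1)

/* Assumed behaviour: keep only the bits of a valid (inclusive) physical address. */
uint64_t demo_sanitize_phys(uint64_t addr)
{
	return addr & DEMO_PHYS_MASK;
}

/* Before: the exclusive end is masked directly and can wrap to 0. */
uint64_t demo_end_before_fix(uint64_t end_exclusive)
{
	return demo_sanitize_phys(end_exclusive);
}

/* After: mask the last valid byte, then convert back to an exclusive end. */
uint64_t demo_end_after_fix(uint64_t end_exclusive)
{
	return demo_sanitize_phys(end_exclusive - 1) + 1;
}

void demo_sanitize_usage(void)
{
	uint64_t start = UINT64_C(0x3ffffff00000);
	uint64_t end = UINT64_C(1) << DEMO_PHYS_BITS; /* one past the last address */

	/* The exclusive end has a bit set just above the mask, so it becomes 0. */
	assert(demo_end_before_fix(end) == 0);
	assert(demo_end_before_fix(end) < demo_sanitize_phys(start));

	/* Masking end - 1 keeps the range intact. */
	assert(demo_end_after_fix(end) == end);

	/* A range that ends below the top of the address space is unaffected. */
	assert(demo_end_before_fix(0x2000) == 0x2000);
	assert(demo_end_after_fix(0x2000) == 0x2000);
}
```

An end of 0 also explains the odd upper bound in the warning: printing `end - 1` for a range whose end wrapped to 0 yields 0xffffffffffffffff.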
* | Merge tag 'perf-urgent-2021-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds, 2021-09-19, 1 file, -1/+1)

Pull perf event fix from Thomas Gleixner:

"A single fix for the perf core where a value read with READ_ONCE() was checked and then reread which makes all the checks invalid. Reuse the already read value instead"

* tag 'perf-urgent-2021-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  events: Reuse value read using READ_ONCE instead of re-reading it
| * | events: Reuse value read using READ_ONCE instead of re-reading it (Baptiste Lepers, 2021-09-15, 1 file, -1/+1)

In perf_event_addr_filters_apply, the task associated with the event (event->ctx->task) is read using READ_ONCE at the beginning of the function, checked, and then re-read from event->ctx->task, voiding all guarantees of the checks. Reuse the value that was read by READ_ONCE to ensure the consistency of the task struct throughout the function.

Fixes: 375637bc52495 ("perf/core: Introduce address range filtering")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210906015310.12802-1-baptiste.lepers@gmail.com
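The "read once, check, then keep using that value" rule from the events message above can be sketched without any kernel headers. The sketch is hypothetical throughout: the `demo_*` types, the sentinel object, and the `racer` callback are inventions for illustration, and a volatile load is used only as a rough user-space stand-in for READ_ONCE. The callback simulates, in a single thread, a concurrent writer changing the shared pointer between the check and the use.

```c
#include <assert.h>
#include <stddef.h>

struct demo_task {
	int id;
};

struct demo_ctx {
	struct demo_task *task; /* may be changed concurrently in the real scenario */
};

/* Hypothetical sentinel meaning "the task is gone". */
static struct demo_task demo_tombstone;
#define DEMO_TASK_TOMBSTONE (&demo_tombstone)

/* Rough stand-in for a single, non-repeated load of the shared pointer. */
struct demo_task *demo_read_once_task(struct demo_ctx *ctx)
{
	return *(struct demo_task *volatile *)&ctx->task;
}

/*
 * Take one snapshot, validate it, and use only the snapshot afterwards.
 * 'racer', if set, simulates another CPU updating ctx->task after the check.
 */
int demo_filters_apply(struct demo_ctx *ctx, void (*racer)(struct demo_ctx *))
{
	struct demo_task *task = demo_read_once_task(ctx);

	if (task == DEMO_TASK_TOMBSTONE)
		return -1;

	if (racer)
		racer(ctx);

	/* Using ctx->task here instead would bypass the check above. */
	return task->id;
}

void demo_racer(struct demo_ctx *ctx)
{
	ctx->task = DEMO_TASK_TOMBSTONE;
}

void demo_read_once_usage(void)
{
	struct demo_task task = { .id = 7 };
	struct demo_ctx ctx = { .task = &task };

	/* The pointer changes mid-function, but the checked snapshot is used. */
	assert(demo_filters_apply(&ctx, demo_racer) == 7);
	assert(ctx.task == DEMO_TASK_TOMBSTONE);

	/* A later call sees the sentinel and bails out. */
	assert(demo_filters_apply(&ctx, NULL) == -1);
}
```

Had the function re-read `ctx->task` after the racer ran, it would have operated on the sentinel even though the check had passed, which is the inconsistency the one-line fix removes.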
* | | Merge tag 'locking-urgent-2021-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds, 2021-09-19, 1 file, -20/+45)

Pull locking fixes from Thomas Gleixner:

"A set of updates for the RT specific reader/writer locking base code:

 - Make the fast path reader ordering guarantees correct.

 - Code reshuffling to make the fix simpler"

[ This plays ugly games with atomic_add_return_release() because we don't have a plain atomic_add_release(), and should really be cleaned up, I think - Linus ]

* tag 'locking-urgent-2021-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwbase: Take care of ordering guarantee for fastpath reader
  locking/rwbase: Extract __rwbase_write_trylock()
  locking/rwbase: Properly match set_and_save_state() to restore_state()
| * | | locking/rwbase: Take care of ordering guarantee for fastpath reader (Boqun Feng, 2021-09-15, 1 file, -2/+19)

Readers of rwbase can lock and unlock without taking any inner lock. If that happens, we need the ordering provided by atomic operations to satisfy the ordering semantics of lock/unlock. Without that, consider the following case:

  { X = 0 initially }

  CPU 0                                         CPU 1
  =====                                         =====
                                                rt_write_lock();
                                                X = 1
                                                rt_write_unlock():
                                                  atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
                                                  // ->readers is READER_BIAS.
  rt_read_lock():
    if ((r = atomic_read(->readers)) < 0) // True
      atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
    <acquire the read lock via fast path>

  r1 = X; // r1 may be 0, because nothing prevent the reordering
          // of "X=1" and atomic_add() on CPU 1.

Therefore audit every usage of atomic operations that may happen in a fast path, and add necessary barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210909110203.953991276@infradead.org
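The ordering requirement in the locking/rwbase message above maps onto release and acquire operations. The following C11 sketch is an analogy, not the kernel implementation: the `demo_*` names are hypothetical, the bias values are written out as assumptions for a 32-bit `int`, and C11 `stdatomic.h` orderings stand in for the kernel's atomic API. The writer's unlock publishes its stores with a release operation, and the reader's fast path pairs with it through an acquire compare-and-exchange.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bias encoding for a 32-bit int: a negative count means readers may enter. */
#define DEMO_READER_BIAS_U 0x80000000u
#define DEMO_WRITER_BIAS_U 0x40000000u

struct demo_rwbase {
	atomic_int readers;
	int data; /* protected by the lock; plays the role of X */
};

/* Writer unlock: release ordering makes earlier stores visible to the next acquirer. */
void demo_write_unlock(struct demo_rwbase *rwb)
{
	int delta = (int)(DEMO_READER_BIAS_U - DEMO_WRITER_BIAS_U);

	/* C11 atomic arithmetic on signed types wraps, so this lands on INT_MIN. */
	atomic_fetch_add_explicit(&rwb->readers, delta, memory_order_release);
}

/* Reader fast path: acquire ordering on success pairs with the release above. */
bool demo_read_trylock_fast(struct demo_rwbase *rwb)
{
	int r = atomic_load_explicit(&rwb->readers, memory_order_relaxed);

	while (r < 0) {
		/* On failure r is reloaded, and the bias check is repeated. */
		if (atomic_compare_exchange_weak_explicit(&rwb->readers, &r, r + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false; /* a writer holds or is taking the lock: use the slow path */
}

void demo_rwbase_usage(void)
{
	struct demo_rwbase rwb;

	/* Start in the write-locked state. */
	atomic_init(&rwb.readers, (int)DEMO_WRITER_BIAS_U);
	rwb.data = 0;

	assert(!demo_read_trylock_fast(&rwb)); /* fast path refused while write-locked */

	rwb.data = 1;                          /* X = 1 inside the write section */
	demo_write_unlock(&rwb);
	assert(atomic_load(&rwb.readers) == INT_MIN);

	assert(demo_read_trylock_fast(&rwb));
	assert(rwb.data == 1);                 /* r1 = X observes the writer's store */
	assert(atomic_load(&rwb.readers) == INT_MIN + 1);
}
```

The usage function runs in one thread, so it only checks the state transitions; the ordering guarantee itself matters when the writer and the reader run on different CPUs, as in the two-column example in the message.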
| * | | locking/rwbase: Extract __rwbase_write_trylock() (Peter Zijlstra, 2021-09-15, 1 file, -18/+26)

The code in rwbase_write_lock() is a little non-obvious vs the read+set 'trylock', extract the sequence into a helper function to clarify the code.

This also provides a single site to fix fast-path ordering.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/YUCq3L+u44NDieEJ@hirez.programming.kicks-ass.net
| * | | locking/rwbase: Properly match set_and_save_state() to restore_state() (Peter Zijlstra, 2021-09-15, 1 file, -1/+1)

Noticed while looking at the readers race.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20210909110203.828203010@infradead.org
* | | Merge tag 'powerpc-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2021-09-19, 7 files, -55/+159)

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes when scv (System Call Vectored) is used to make a syscall when a transaction is active, on Power9 or later.

 - Fix bad interactions between rfscv (Return-from scv) and Power9 fake-suspend mode.

 - Fix crashes when handling machine checks in LPARs using the Hash MMU.

 - Partly revert a recent change to our XICS interrupt controller code, which broke the recently added Microwatt support.

Thanks to Cédric Le Goater, Eirik Fuller, Ganesh Goudar, Gustavo Romero, Joel Stanley, Nicholas Piggin.

* tag 'powerpc-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xics: Set the IRQ chip data for the ICS native backend
  powerpc/mce: Fix access error in mce handler
  KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
  powerpc/64s: system call rfscv workaround for TM bugs
  selftests/powerpc: Add scv versions of the basic TM syscall tests
  powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
| * | | powerpc/xics: Set the IRQ chip data for the ICS native backend (Cédric Le Goater, 2021-09-15, 1 file, -2/+2)

The ICS native driver relies on the IRQ chip data to find the struct 'ics_native' describing the ICS controller but it was removed by commit 248af248a8f4 ("powerpc/xics: Rename the map handler in a check handler"). Revert this change to fix the Microwatt SoC platform.

Fixes: 248af248a8f4 ("powerpc/xics: Rename the map handler in a check handler")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210913134056.3761960-1-clg@kaod.org
| * | | powerpc/mce: Fix access error in mce handler (Ganesh Goudar, 2021-09-13, 1 file, -2/+15)

We queue an irq work for deferred processing of mce event in realmode mce handler, where translation is disabled. Queuing of the work may result in accessing memory outside RMO region, such access needs the translation to be enabled for an LPAR running with hash mmu else the kernel crashes.

After enabling translation in mce_handle_error() we used to leave it enabled to avoid crashing here, but now with the commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler") we are restoring the MSR to disable translation.

Hence to fix this enable the translation before queuing the work.

Without this change following trace is seen on injecting SLB multihit in an LPAR running with hash mmu.

  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 5 PID: 1883 Comm: insmod Tainted: G OE 5.14.0-mce+ #137
  NIP: c000000000735d60 LR: c000000000318640 CTR: 0000000000000000
  REGS: c00000001ebff9a0 TRAP: 0300 Tainted: G OE (5.14.0-mce+)
  MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008228 XER: 00000001
  CFAR: c00000000031863c DAR: c00000027fa8fe08 DSISR: 40000000 IRQMASK: 0
  ...
  NIP llist_add_batch+0x0/0x40
  LR __irq_work_queue_local+0x70/0xc0
  Call Trace:
   0xc00000001ebffc0c (unreliable)
   irq_work_queue+0x40/0x70
   machine_check_queue_event+0xbc/0xd0
   machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
[mpe: Fix comment formatting, trim oops in change log for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210909064330.312432-1-ganeshgr@linux.ibm.com
| * | | KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers (Nicholas Piggin, 2021-09-13, 1 file, -2/+34)

POWER9 DD2.2 and 2.3 hardware implements a "fake-suspend" mode where certain TM instructions executed in HV=0 mode cause softpatch interrupts so the hypervisor can emulate them and prevent problematic processor conditions. In this fake-suspend mode, the treclaim. instruction does not modify registers.

Unfortunately the rfscv instruction executed by the guest does not generate softpatch interrupts, which can cause the hypervisor to lose track of the fake-suspend mode, and it can execute this treclaim. while not in fake-suspend mode. This modifies GPRs and crashes the hypervisor.

It's not trivial to disable scv in the guest with HFSCR now, because they assume a POWER9 has scv available. So this fix saves and restores checkpointed registers across the treclaim.

Fixes: 7854f7545bff ("KVM: PPC: Book3S: Rework TM save/restore code and make it C-callable")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210908101718.118522-2-npiggin@gmail.com
| * | | powerpc/64s: system call rfscv workaround for TM bugs (Nicholas Piggin, 2021-09-13, 1 file, -0/+13)

The rfscv instruction does not work correctly with the fake-suspend mode in POWER9, which can end up with the hypervisor restoring an incorrect checkpoint.

Work around this by setting the _TIF_RESTOREALL flag if a system call returns to a transaction active state, causing rfid to be used instead of rfscv to return, which will do the right thing. The contents of the registers are irrelevant because they will be overwritten in this case anyway.

Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
Reported-by: Eirik Fuller <efuller@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210908101718.118522-1-npiggin@gmail.com
| * | | selftests/powerpc: Add scv versions of the basic TM syscall tests (Nicholas Piggin, 2021-09-13, 2 files, -8/+65)

The basic TM vs syscall test code hard codes an sc instruction for the system call, which fails to cover scv even when the userspace libc has support for it.

Duplicate the tests with hard coded scv variants so both are tested when possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build on old toolchains by using .long for scv]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210903125707.1601269-2-npiggin@gmail.com
| * | | powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state  Nicholas Piggin  2021-09-13  2  -41/+30
| |/ /
| | | If a system call is made with a transaction active, the kernel
| | | immediately aborts it and returns. scv system calls disable irqs even
| | | earlier in their interrupt handler, and tabort_syscall does not fix this
| | | up.
| | |
| | | This can result in irq soft-mask state being messed up on the next
| | | kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
| | | the kernel exit handlers, or possibly worse.
| | |
| | | This can't easily be fixed in asm because at this point an async irq may
| | | have hit, which is soft-masked and marked pending. The pending interrupt
| | | has to be replayed before returning to userspace. The fix is to move the
| | | tabort_syscall code to C in the main syscall handler, and just skip the
| | | system call but otherwise return as usual, which will take care of the
| | | pending irqs. This also does a bunch of other things including possible
| | | signal delivery to the process, but the doomed transaction should still
| | | be aborted when it is eventually returned to.
| | |
| | | The sc system call path is changed to use the new C function as well to
| | | reduce code and path differences. This slows down how quickly system
| | | calls are aborted when called while a transaction is active, which could
| | | potentially impact TM performance. But making any system call is already
| | | bad for performance, and TM is on the way out, so go with simpler over
| | | faster.
| | |
| | | Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
| | | Reported-by: Eirik Fuller <efuller@redhat.com>
| | | Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
| | | [mpe: Use #ifdef rather than IS_ENABLED() to fix build error on 32-bit]
| | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | | Link: https://lore.kernel.org/r/20210903125707.1601269-1-npiggin@gmail.com
* | | Merge tag 'kbuild-fixes-v5.15' of ↵  Linus Torvalds  2021-09-19  6  -20/+27
|\ \ \
| | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
| | | |
| | | | Pull Kbuild fixes from Masahiro Yamada:
| | | |
| | | |  - Fix bugs in checkkconfigsymbols.py
| | | |
| | | |  - Fix missing sys import in gen_compile_commands.py
| | | |
| | | |  - Fix missing FORCE warning for ARCH=sh builds
| | | |
| | | |  - Fix -Wignored-optimization-argument warnings for Clang builds
| | | |
| | | |  - Turn -Wignored-optimization-argument into an error in order to stop
| | | |    building instead of sprinkling warnings
| | | |
| | | | * tag 'kbuild-fixes-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
| | | |   kbuild: Add -Werror=ignored-optimization-argument to CLANG_FLAGS
| | | |   x86/build: Do not add -falign flags unconditionally for clang
| | | |   kbuild: Fix comment typo in scripts/Makefile.modpost
| | | |   sh: Add missing FORCE prerequisites in Makefile
| | | |   gen_compile_commands: fix missing 'sys' package
| | | |   checkkconfigsymbols.py: Remove skipping of help lines in parse_kconfig_file
| | | |   checkkconfigsymbols.py: Forbid passing 'HEAD' to --commit
| * | | kbuild: Add -Werror=ignored-optimization-argument to CLANG_FLAGS  Nathan Chancellor  2021-09-19  1  -0/+5
| | | | Similar to commit 589834b3a009 ("kbuild: Add
| | | | -Werror=unknown-warning-option to CLANG_FLAGS").
| | | |
| | | | Clang ignores certain GCC flags that it has not implemented, only
| | | | emitting a warning:
| | | |
| | | |   $ echo | clang -fsyntax-only -falign-jumps -x c -
| | | |   clang-14: warning: optimization flag '-falign-jumps' is not supported [-Wignored-optimization-argument]
| | | |
| | | | When one of these flags gets added to KBUILD_CFLAGS unconditionally, all
| | | | subsequent cc-{disable-warning,option} calls fail because -Werror was
| | | | added to these invocations to turn the above warning and the equivalent
| | | | -W flag warning into errors.
| | | |
| | | | To catch the presence of these flags earlier, turn
| | | | -Wignored-optimization-argument into an error so that the flags can
| | | | either be implemented or ignored via cc-option and there are no more
| | | | weird errors.
| | | |
| | | | Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
| | | | Signed-off-by: Nathan Chancellor <nathan@kernel.org>
| | | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * | | x86/build: Do not add -falign flags unconditionally for clang  Nathan Chancellor  2021-09-19  1  -3/+9
| | | | clang does not support -falign-jumps and only recently gained support
| | | | for -falign-loops. When one of the configuration options that adds these
| | | | flags is enabled, clang warns and all cc-{disable-warning,option} that
| | | | follow fail because -Werror gets added to test for the presence of this
| | | | warning:
| | | |
| | | |   clang-14: warning: optimization flag '-falign-jumps=0' is not supported [-Wignored-optimization-argument]
| | | |
| | | | To resolve this, add a couple of cc-option calls when building with
| | | | clang; gcc has supported these options since 3.2 so there is no point in
| | | | testing for their support. -falign-functions was implemented in clang-7,
| | | | -falign-loops was implemented in clang-14, and -falign-jumps has not
| | | | been implemented yet.
| | | |
| | | | Link: https://lore.kernel.org/r/YSQE2f5teuvKLkON@Ryzen-9-3900X.localdomain/
| | | | Link: https://lore.kernel.org/r/20210824022640.2170859-2-nathan@kernel.org/
| | | | Reported-by: kernel test robot <lkp@intel.com>
| | | | Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
| | | | Acked-by: Borislav Petkov <bp@suse.de>
| | | | Signed-off-by: Nathan Chancellor <nathan@kernel.org>
| | | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * | | kbuild: Fix comment typo in scripts/Makefile.modpost  Ramji Jiyani  2021-09-19  1  -1/+1
| | | | Change comment "create one <module>.mod.c file pr. module"
| | | | to "create one <module>.mod.c file per module"
| | | |
| | | | Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
| | | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * | | sh: Add missing FORCE prerequisites in Makefile  Geert Uytterhoeven  2021-09-19  1  -8/+8
| | | |   make: arch/sh/boot/Makefile:87: FORCE prerequisite is missing
| | | |
| | | | Add the missing FORCE prerequisites for all build targets identified by
| | | | "make help".
| | | |
| | | | Fixes: e1f86d7b4b2a5213 ("kbuild: warn if FORCE is missing for if_changed(_dep,_rule) and filechk")
| | | | Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
| | | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * | | gen_compile_commands: fix missing 'sys' package  Kortan  2021-09-19  1  -0/+1
| | | | We need to import the 'sys' package since the script calls the
| | | | sys.exit() method.
| | | |
| | | | Fixes: 6ad7cbc01527 ("Makefile: Add clang-tidy and static analyzer support to makefile")
| | | | Signed-off-by: Kortan <kortanzh@gmail.com>
| | | | Reviewed-by: Nathan Chancellor <nathan@kernel.org>
| | | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
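The failure mode behind the 'sys' fix can be reproduced in isolation. The sketch below is not the code from gen_compile_commands.py; the snippet and the `run` helper are hypothetical and only show that calling `sys.exit()` without importing `sys` raises `NameError`, while adding the import lets the call exit as intended.

```python
import textwrap

# Hypothetical stand-in for a script that calls sys.exit() without importing sys.
broken = textwrap.dedent("""
    def main():
        sys.exit(1)   # 'sys' was never imported in this snippet
    main()
""")


def run(source):
    """Execute the source in a fresh namespace and report how it ended."""
    try:
        exec(compile(source, "<sketch>", "exec"), {})
    except NameError as err:
        return f"NameError: {err}"
    except SystemExit as err:
        return f"SystemExit: {err.code}"
    return "ok"


result_broken = run(broken)                   # fails before it can exit
result_fixed = run("import sys\n" + broken)   # exits with the intended status
print(result_broken)
print(result_fixed)
```

Because the name is only looked up when the call runs, a script like this can work for a long time and fail only on the path that reaches `sys.exit()`.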
| * | | checkkconfigsymbols.py: Remove skipping of help lines in parse_kconfig_file  Ariel Marcovitch  2021-09-19  1  -8/+0
| | | | When parsing Kconfig files to find symbol definitions and references,
| | | | lines after a 'help' line are skipped until a new config definition
| | | | starts.
| | | |
| | | | However, Kconfig statements can actually be after a help section, as
| | | | long as these have shallower indentation. These are skipped by the
| | | | parser.
| | | |
| | | | This means that symbols referenced in this kind of statement are
| | | | ignored by this function and thus are not considered undefined
| | | | references in case the symbol is not defined.
| | | |
| | | | Remove the 'skip' logic entirely, as it is not needed if we just use the
| | | | STMT regex to find the end of help lines.
| | | |
| | | | However, this means that a keyword that appears as part of the help
| | | | message (i.e. with the same indentation as the help lines) will be
| | | | considered a reference/definition. This can happen now as well, but
| | | | only with REGEX_KCONFIG_DEF lines. Also, the keyword must have a SYMBOL
| | | | after it, which probably means that someone referenced a config in the
| | | | help so it seems like a bonus :)
| | | |
| | | | The real solution is to keep track of the indentation when the first
| | | | help line is encountered and then handle DEF and STMT lines only if the
| | | | indentation is shallower.
| | | |
| | | | Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
| | | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
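The effect of the 'skip' logic can be sketched with a toy parser. Everything below is an assumption made for illustration: the regexes are simplified stand-ins, not the script's real REGEX_KCONFIG_DEF or STMT patterns, and `referenced` is a hypothetical helper. With skipping enabled, the `if BAR` statement that follows the help text at shallower indentation is missed; without it, `BAR` is picked up as a reference.

```python
import re

# Simplified, hypothetical stand-ins for the script's regexes.
DEF = re.compile(r"^\s*(?:menu)?config\s+([A-Z0-9_]+)")
STMT = re.compile(r"^\s*(?:if|select|depends\s+on|default)\s+(.*)")
HELP = re.compile(r"^\s+help\s*$")
SYMBOL = re.compile(r"[A-Z][A-Z0-9_]*")


def referenced(lines, skip_after_help):
    """Collect symbols referenced by statements, optionally skipping after 'help'."""
    refs, skip = set(), False
    for line in lines:
        if DEF.match(line):
            skip = False                  # a new definition ends the skipped region
        elif HELP.match(line):
            skip = skip_after_help        # old behaviour: ignore everything that follows
        elif skip:
            continue
        else:
            match = STMT.match(line)
            if match:
                refs.update(SYMBOL.findall(match.group(1)))
    return refs


KCONFIG = [
    "config FOO",
    "\tbool \"foo\"",
    "\thelp",
    "\t  Some help text.",
    "if BAR",                             # shallower than the help text
    "config BAZ",
    "\tdepends on QUX",
]

old = referenced(KCONFIG, skip_after_help=True)
new = referenced(KCONFIG, skip_after_help=False)
print(sorted(old), sorted(new))
```

The sketch does not model the indentation tracking that the commit message calls the real solution.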
| * | | checkkconfigsymbols.py: Forbid passing 'HEAD' to --commit  Ariel Marcovitch  2021-09-19  1  -0/+3
| |/ /
| | | As opposed to the --diff option, --commit can get ref names instead of
| | | commit hashes.
| | |
| | | When using the --commit option, the script resets the working directory
| | | to the commit before the given ref, by adding '~' to the end of the ref.
| | |
| | | However, the 'HEAD' ref is relative, and so when the working directory
| | | is reset to 'HEAD~', 'HEAD' points to what was 'HEAD~'. Then when the
| | | script resets to 'HEAD' it actually stays in the same commit. In this
| | | case, the script won't report any cases because there is no diff between
| | | the cases of the two refs.
| | |
| | | Prevent the user from using HEAD refs.
| | |
| | | A better solution might be to resolve the refs before doing the reset,
| | | but for now just disallow such refs.
| | |
| | | Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
| | | Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
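The HEAD problem can be illustrated with a toy model of a linear history. This is a sketch under stated assumptions: `Repo`, `compare` and `compare_resolved` are hypothetical names, the model does not call git, and fixed refs stand in for tags or hashes that do not move on reset. It shows why resetting to `HEAD~` and then to `HEAD` lands on the same commit twice, and how resolving the ref first (the "better solution" mentioned above) avoids that.

```python
class Repo:
    """Toy model: a linear history, a movable HEAD index, and fixed refs."""

    def __init__(self, commits):
        self.commits = list(commits)
        self.head = len(self.commits) - 1
        self.refs = {}                    # fixed names (tags, hashes) -> index

    def resolve(self, ref):
        name = ref.rstrip("~")
        parents = len(ref) - len(name)    # number of trailing '~'
        base = self.head if name == "HEAD" else self.refs[name]
        return base - parents

    def reset(self, ref):
        self.head = self.resolve(ref)     # HEAD moves, like 'git reset --hard'
        return self.commits[self.head]


def compare(repo, ref):
    """Reset to ref~ and then to ref, as described in the commit message."""
    before = repo.reset(ref + "~")
    after = repo.reset(ref)
    return before, after


def compare_resolved(repo, ref):
    """Resolve the ref once, before any reset moves HEAD."""
    target = repo.resolve(ref)
    return repo.commits[target - 1], repo.commits[target]


repo = Repo(["c0", "c1", "c2"])
repo.refs["v1"] = 2
with_tag = compare(repo, "v1")            # distinct commits, a diff exists

repo = Repo(["c0", "c1", "c2"])
with_head = compare(repo, "HEAD")         # same commit twice, nothing to diff

repo = Repo(["c0", "c1", "c2"])
resolved = compare_resolved(repo, "HEAD")
print(with_tag, with_head, resolved)
```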
* | | Merge tag 'perf-tools-fixes-for-v5.15-2021-09-18' of ↵  Linus Torvalds  2021-09-19  7  -51/+100
|\ \ \
| | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
| | | |
| | | | Pull perf tools fixes from Arnaldo Carvalho de Melo:
| | | |
| | | |  - Fix ip display in 'perf script' when output type != attr->type.
| | | |
| | | |  - Ignore deprecation warning when using libbpf's btf__get_from_id(),
| | | |    fixing the build with libbpf v0.6+.
| | | |
| | | |  - Make use of FD() robust in libperf, fixing a segfault with
| | | |    'perf stat --iostat list'.
| | | |
| | | |  - Initialize addr_location:srcline pointer to NULL when resolving
| | | |    callchain addresses.
| | | |
| | | |  - Fix fused instruction logic for assembly functions in 'perf annotate'.
| | | |
| | | | * tag 'perf-tools-fixes-for-v5.15-2021-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
| | | |   perf bpf: Ignore deprecation warning when using libbpf's btf__get_from_id()
| | | |   libperf evsel: Make use of FD robust.
| | | |   perf machine: Initialize srcline string member in add_location struct
| | | |   perf script: Fix ip display when type != attr->type
| | | |   perf annotate: Fix fused instr logic for assembly functions
| * | | perf bpf: Ignore deprecation warning when using libbpf's btf__get_from_id()  Andrii Nakryiko  2021-09-18  1  -0/+3
| | | | Perf code re-implements libbpf's btf__load_from_kernel_by_id() API as a
| | | | weak function, presumably to dynamically link against an old version of
| | | | the libbpf shared library. Unfortunately this causes a compilation
| | | | warning when perf is compiled against libbpf v0.6+.
| | | |
| | | | For now, just ignore the deprecation warning, but there might be a
| | | | better solution, depending on perf's needs.
| | | |
| | | | Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
| | | | Cc: Alexei Starovoitov <ast@kernel.org>
| | | | Cc: Daniel Borkmann <daniel@iogearbox.net>
| | | | Cc: kernel-team@fb.com
| | | | LPU-Reference: 20210914170004.4185659-1-andrii@kernel.org
| | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | libperf evsel: Make use of FD robust.  Ian Rogers  2021-09-18  1  -23/+41
| | | | FD uses xyarray__entry that may return NULL if an index is out of
| | | | bounds. If NULL is returned then a segv happens as FD unconditionally
| | | | dereferences the pointer. This was happening in a case with perf
| | | | iostat as shown below. The fix is to make FD an "int*" rather than an
| | | | int and handle the NULL case as either invalid input or a closed fd.
| | | |
| | | |   $ sudo gdb --args perf stat --iostat list
| | | |   ...
| | | |   Breakpoint 1, perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50
| | | |   50      {
| | | |   (gdb) bt
| | | |   #0  perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50
| | | |   #1  0x000055555585c188 in evsel__open_cpu (evsel=0x5555560951a0, cpus=0x555556093410, threads=0x555556086fb0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
| | | |   #2  0x000055555585cfb2 in evsel__open (evsel=0x5555560951a0, cpus=0x0, threads=0x555556086fb0) at util/evsel.c:2045
| | | |   #3  0x000055555585d0db in evsel__open_per_thread (evsel=0x5555560951a0, threads=0x555556086fb0) at util/evsel.c:2065
| | | |   #4  0x00005555558ece64 in create_perf_stat_counter (evsel=0x5555560951a0, config=0x555555c34700 <stat_config>, target=0x555555c2f1c0 <target>, cpu=0) at util/stat.c:590
| | | |   #5  0x000055555578e927 in __run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0) at builtin-stat.c:833
| | | |   #6  0x000055555578f3c6 in run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0) at builtin-stat.c:1048
| | | |   #7  0x0000555555792ee5 in cmd_stat (argc=1, argv=0x7fffffffe4a0) at builtin-stat.c:2534
| | | |   #8  0x0000555555835ed3 in run_builtin (p=0x555555c3f540 <commands+288>, argc=3, argv=0x7fffffffe4a0) at perf.c:313
| | | |   #9  0x0000555555836154 in handle_internal_command (argc=3, argv=0x7fffffffe4a0) at perf.c:365
| | | |   #10 0x000055555583629f in run_argv (argcp=0x7fffffffe2ec, argv=0x7fffffffe2e0) at perf.c:409
| | | |   #11 0x0000555555836692 in main (argc=3, argv=0x7fffffffe4a0) at perf.c:539
| | | |   ...
| | | |   (gdb) c
| | | |   Continuing.
| | | |   Error:
| | | |   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_iio_0/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/).
| | | |   /bin/dmesg | grep -i perf may provide additional information.
| | | |
| | | |   Program received signal SIGSEGV, Segmentation fault.
| | | |   0x00005555559b03ea in perf_evsel__close_fd_cpu (evsel=0x5555560951a0, cpu=1) at evsel.c:166
| | | |   166                     if (FD(evsel, cpu, thread) >= 0)
| | | |
| | | | v3. fixes a bug in perf_evsel__run_ioctl where the sense of a branch was
| | | |     backward.
| | | |
| | | | Signed-off-by: Ian Rogers <irogers@google.com>
| | | | Acked-by: Jiri Olsa <jolsa@redhat.com>
| | | | Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
| | | | Cc: Mark Rutland <mark.rutland@arm.com>
| | | | Cc: Namhyung Kim <namhyung@kernel.org>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Stephane Eranian <eranian@google.com>
| | | | Link: http://lore.kernel.org/lkml/20210918054440.2350466-1-irogers@google.com
| | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf machine: Initialize srcline string member in add_location struct  Michael Petlan  2021-09-18  1  -0/+1
| | | | It's later supposed to be either a correct address or NULL. Without the
| | | | initialization, it may contain an undefined value which results in the
| | | | following segmentation fault:
| | | |
| | | |   # perf top --sort comm -g --ignore-callees=do_idle
| | | |
| | | | terminates with:
| | | |
| | | |   #0  0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6
| | | |   #1  0x00007ffff55e3802 in strdup () from /lib64/libc.so.6
| | | |   #2  0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489
| | | |   #3  hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564
| | | |   #4  0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657
| | | |   #5  0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288
| | | |   #6  0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056
| | | |   #7  iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056
| | | |   #8  0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231
| | | |   #9  0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842
| | | |   #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202
| | | |   #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244
| | | |   #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323
| | | |   #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339
| | | |   #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341
| | | |   #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339
| | | |   #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114
| | | |   #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0
| | | |   #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6
| | | |
| | | | If you look at the frame #2, the code is:
| | | |
| | | |   488         if (he->srcline) {
| | | |   489                 he->srcline = strdup(he->srcline);
| | | |   490                 if (he->srcline == NULL)
| | | |   491                         goto err_rawdata;
| | | |   492         }
| | | |
| | | | If he->srcline is not NULL (it is not NULL if it is uninitialized
| | | | rubbish), it gets strdupped and strdupping a rubbish random string
| | | | causes the problem.
| | | |
| | | | Also, if you look at the commit 1fb7d06a509e, it adds the srcline
| | | | property into the struct, but does not initialize it everywhere needed.
| | | |
| | | | Committer notes:
| | | |
| | | | Now I see, when using --ignore-callees=do_idle we end up here at line
| | | | 2189 in add_callchain_ip():
| | | |
| | | |   2181         if (al.sym != NULL) {
| | | |   2182                 if (perf_hpp_list.parent && !*parent &&
| | | |   2183                     symbol__match_regex(al.sym, &parent_regex))
| | | |   2184                         *parent = al.sym;
| | | |   2185                 else if (have_ignore_callees && root_al &&
| | | |   2186                   symbol__match_regex(al.sym, &ignore_callees_regex)) {
| | | |   2187                         /* Treat this symbol as the root,
| | | |   2188                            forgetting its callees. */
| | | |   2189                         *root_al = al;
| | | |   2190                         callchain_cursor_reset(cursor);
| | | |   2191                 }
| | | |   2192         }
| | | |
| | | | And the al that doesn't have the ->srcline field initialized will be
| | | | copied to the root_al, so then, back to:
| | | |
| | | |   1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
| | | |   1212                          int max_stack_depth, void *arg)
| | | |   1213 {
| | | |   1214         int err, err2;
| | | |   1215         struct map *alm = NULL;
| | | |   1216
| | | |   1217         if (al)
| | | |   1218                 alm = map__get(al->map);
| | | |   1219
| | | |   1220         err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent,
| | | |   1221                                         iter->evsel, al, max_stack_depth);
| | | |   1222         if (err) {
| | | |   1223                 map__put(alm);
| | | |   1224                 return err;
| | | |   1225         }
| | | |   1226
| | | |   1227         err = iter->ops->prepare_entry(iter, al);
| | | |   1228         if (err)
| | | |   1229                 goto out;
| | | |   1230
| | | |   1231         err = iter->ops->add_single_entry(iter, al);
| | | |   1232         if (err)
| | | |   1233                 goto out;
| | | |   1234
| | | |
| | | | That al at line 1221 is what hist_entry_iter__add() (called from
| | | | sample__resolve_callchain()) saw as 'root_al', and then:
| | | |
| | | |   iter->ops->add_single_entry(iter, al);
| | | |
| | | | will go on with al->srcline with a bogus value, I'll add the above
| | | | sequence to the cset and apply, thanks!
| | | |
| | | | Signed-off-by: Michael Petlan <mpetlan@redhat.com>
| | | | CC: Milian Wolff <milian.wolff@kdab.com>
| | | | Cc: Jiri Olsa <jolsa@redhat.com>
| | | | Fixes: 1fb7d06a509e ("perf report Use srcline from callchain for hist entries")
| | | | Link: https://lore.kernel.org/r/20210719145332.29747-1-mpetlan@redhat.com
| | | | Reported-by: Juri Lelli <jlelli@redhat.com>
| | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf script: Fix ip display when type != attr->type  Adrian Hunter  2021-09-18  1  -11/+13
| | | | set_print_ip_opts() was not being called when type != attr->type
| | | | because there is not a one-to-one relationship between output types
| | | | and attr->type. That resulted in ip not printing.
| | | |
| | | | The attr_type() function is removed, and the match of attr->type to
| | | | output type is corrected.
| | | |
| | | | Example on ADL using taskset to select an atom cpu:
| | | |
| | | |   # perf record -e cpu_atom/cpu-cycles/ taskset 0x1000 uname
| | | |   Linux
| | | |   [ perf record: Woken up 1 times to write data ]
| | | |   [ perf record: Captured and wrote 0.003 MB perf.data (7 samples) ]
| | | |
| | | | Before:
| | | |
| | | |   # perf script | head
| | | |   taskset   428 [-01] 10394.179041:          1 cpu_atom/cpu-cycles/:
| | | |   taskset   428 [-01] 10394.179043:          1 cpu_atom/cpu-cycles/:
| | | |   taskset   428 [-01] 10394.179044:         11 cpu_atom/cpu-cycles/:
| | | |   taskset   428 [-01] 10394.179045:        407 cpu_atom/cpu-cycles/:
| | | |   taskset   428 [-01] 10394.179046:      16789 cpu_atom/cpu-cycles/:
| | | |   taskset   428 [-01] 10394.179052:     676300 cpu_atom/cpu-cycles/:
| | | |   uname     428 [-01] 10394.179278:    4079859 cpu_atom/cpu-cycles/:
| | | |
| | | | After:
| | | |
| | | |   # perf script | head
| | | |   taskset   428 10394.179041:          1 cpu_atom/cpu-cycles/:  ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
| | | |   taskset   428 10394.179043:          1 cpu_atom/cpu-cycles/:  ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
| | | |   taskset   428 10394.179044:         11 cpu_atom/cpu-cycles/:  ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
| | | |   taskset   428 10394.179045:        407 cpu_atom/cpu-cycles/:  ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
| | | |   taskset   428 10394.179046:      16789 cpu_atom/cpu-cycles/:  ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
| | | |   taskset   428 10394.179052:     676300 cpu_atom/cpu-cycles/:      7f829ef73800 cfree+0x0 (/lib/libc-2.32.so)
| | | |   uname     428 10394.179278:    4079859 cpu_atom/cpu-cycles/:  ffffffff95bae912 vma_interval_tree_remove+0x1f2 ([kernel.kallsyms])
| | | |
| | | | Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| | | | Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
| | | | Cc: Jin Yao <yao.jin@linux.intel.com>
| | | | Cc: Jiri Olsa <jolsa@redhat.com>
| | | | Link: http://lore.kernel.org/lkml/20210911133053.15682-1-adrian.hunter@intel.com
| | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | perf annotate: Fix fused instr logic for assembly functions  Ravi Bangoria  2021-09-18  3  -17/+42
| | | | Some x86 microarchitectures fuse a subset of cmp/test/ALU instructions
| | | | with branch instructions, and thus perf annotate highlights such valid
| | | | pairs as fused.
| | | |
| | | | When annotated with source, perf uses struct disasm_line to contain
| | | | either a source or an instruction line from objdump output. Usually, a
| | | | C statement generates multiple instructions which include such
| | | | cmp/test/ALU + branch instruction pairs. But in the case of an assembly
| | | | function, each individual assembly source line generates one
| | | | instruction.
| | | |
| | | | The 'perf annotate' instruction fusion logic assumes the previous
| | | | disasm_line is the previous instruction line, which is wrong because,
| | | | for an assembly function, the previous disasm_line contains a source
| | | | line. And thus perf fails to highlight valid fused instruction pairs
| | | | for assembly functions.
| | | |
| | | | Fix it by searching backward until we find an instruction line and
| | | | consider that disasm_line as fused with the current branch instruction.
| | | |
| | | | Before:
| | | |          │       cmpq    %rcx, RIP+8(%rsp)
| | | |     0.00 │       cmp     %rcx,0x88(%rsp)
| | | |          │       je      .Lerror_bad_iret   <--- Source line
| | | |     0.14 │    ┌──je      b4                 <--- Instruction line
| | | |          │    │movl      %ecx, %eax
| | | |
| | | | After:
| | | |          │       cmpq    %rcx, RIP+8(%rsp)
| | | |     0.00 │    ┌──cmp     %rcx,0x88(%rsp)
| | | |          │    │je        .Lerror_bad_iret
| | | |     0.14 │    ├──je      b4
| | | |          │    │movl      %ecx, %eax
| | | |
| | | | Reviewed-by: Jin Yao <yao.jin@linux.intel.com>
| | | | Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
| | | | Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
| | | | Cc: Jiri Olsa <jolsa@redhat.com>
| | | | Cc: Kim Phillips <kim.phillips@amd.com>
| | | | Cc: Mark Rutland <mark.rutland@arm.com>
| | | | Cc: Namhyung Kim <namhyung@kernel.org>
| | | | Link: https://lore.kernel.org/r/20210911043854.8373-1-ravi.bangoria@amd.com
| | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
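The backward search described in the perf annotate fix can be modelled in a few lines. This is a sketch, not perf's C implementation: `Line`, `prev_line` and `prev_insn` are hypothetical names, and each entry only records whether it is a source line or an instruction line, in the spirit of struct disasm_line. Taking the immediately preceding entry lands on a source line for assembly functions, while walking back to the nearest instruction line finds the `cmp` that pairs with the branch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    text: str
    is_insn: bool      # True for an instruction line, False for a source line


def prev_line(lines: List[Line], idx: int) -> Optional[int]:
    """Old assumption: the previous entry is the previous instruction."""
    return idx - 1 if idx > 0 else None


def prev_insn(lines: List[Line], idx: int) -> Optional[int]:
    """Search backward until an instruction line is found."""
    for i in range(idx - 1, -1, -1):
        if lines[i].is_insn:
            return i
    return None


# Interleaved source and instruction lines, as in the annotate output above.
lines = [
    Line("cmpq %rcx, RIP+8(%rsp)", is_insn=False),
    Line("cmp %rcx,0x88(%rsp)", is_insn=True),
    Line("je .Lerror_bad_iret", is_insn=False),
    Line("je b4", is_insn=True),
]

branch = 3
old_candidate = prev_line(lines, branch)   # index 2, a source line
new_candidate = prev_insn(lines, branch)   # index 1, the cmp instruction
print(lines[old_candidate].is_insn, lines[new_candidate].text)
```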
* | | | dmascc: use proper 'virt_to_bus()' rather than casting to 'int'  Linus Torvalds  2021-09-19  1  -3/+3
| | | | The old dmascc driver depends on the legacy ISA_DMA_API, and blindly
| | | | just casts the kernel virtual address to 'int' for set_dma_addr().
| | | |
| | | | That works only incidentally, and because the high bits of the address
| | | | will be ignored anyway. And on 64-bit architectures it causes warnings.
| | | |
| | | | Admittedly, 64-bit architectures with ISA are basically dead - I think
| | | | the only example of this is alpha, and nobody would ever use the dmascc
| | | | driver there. But hey, the fix is easy enough, the end result is
| | | | cleaner, and it's yet another configuration that now builds without
| | | | warnings.
| | | |
| | | | If somebody actually uses this driver on an alpha and this fixes it for
| | | | you, please email me. Because that is just incredibly bizarre.
| | | |
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | alpha: enable GENERIC_PCI_IOMAP unconditionally  Linus Torvalds  2021-09-19  1  -1/+1
| | | | With the previous commit (9caea0007601: "parisc: Declare pci_iounmap()
| | | | parisc version only when CONFIG_PCI enabled") we can now enable
| | | | GENERIC_PCI_IOMAP unconditionally on alpha, and if PCI is not enabled
| | | | we will just get the nice empty helper functions that allow mixed-bus
| | | | drivers to build.
| | | |
| | | | Example driver: the old 3com/3c59x.c driver works with either the PCI
| | | | or the EISA version of the 3x59x card, but wouldn't build in an
| | | | EISA-only configuration because of missing pci_iomap() and
| | | | pci_iounmap() dummy wrappers.
| | | |
| | | | Most of the other PCI infrastructure just becomes empty wrappers even
| | | | without GENERIC_PCI_IOMAP, and it's not obvious that the pci_iomap
| | | | functionality shouldn't do the same, but this works.
| | | |
| | | | Cc: Ulrich Teichert <krypton@ulrich-teichert.org>
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled  Helge Deller  2021-09-19  3  -11/+6
| | | | Linus noticed odd declaration rules for pci_iounmap() in iomap.h and
| | | | pci_iomap.h, where it depended on either NO_GENERIC_PCI_IOPORT_MAP or
| | | | GENERIC_IOMAP when CONFIG_PCI was disabled.
| | | |
| | | | Testing on parisc seems to indicate that we need pci_iounmap() only
| | | | when CONFIG_PCI is enabled, so the declaration of pci_iounmap() can be
| | | | moved cleanly into pci_iomap.h in sync with the declarations of
| | | | pci_iomap().
| | | |
| | | | Link: https://lore.kernel.org/all/CAHk-=wjRrh98pZoQ+AzfWmsTZacWxTJKXZ9eKU2X_0+jM=O8nw@mail.gmail.com/
| | | | Signed-off-by: Helge Deller <deller@gmx.de>
| | | | Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | | Fixes: 97a29d59fc22 ("[PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional")
| | | | Cc: Arnd Bergmann <arnd@arndb.de>
| | | | Cc: Guenter Roeck <linux@roeck-us.net>
| | | | Cc: Ulrich Teichert <krypton@ulrich-teichert.org>
| | | | Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Revert "drm/vc4: hdmi: Remove drm_encoder->crtc usage"  Linus Torvalds  2021-09-19  1  -27/+13
| | | | This reverts commit 27da370e0fb343a0baf308f503bb3e5dcdfe3362.
| | | |
| | | | Sudip Mukherjee reports that this broke pulseaudio with a NULL pointer
| | | | dereference in vc4_hdmi_audio_prepare(), bisected it to this commit,
| | | | and confirmed that a revert fixed the problem.
| | | |
| | | | Revert the problematic commit until fixed.
| | | |
| | | | Link: https://lore.kernel.org/all/CADVatmPB9-oKd=ypvj25UYysVo6EZhQ6bCM7EvztQBMyiZfAyw@mail.gmail.com/
| | | | Link: https://lore.kernel.org/all/CADVatmN5EpRshGEPS_JozbFQRXg5w_8LFB3OMP1Ai-ghxd3w4g@mail.gmail.com/
| | | | Reported-and-tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
| | | | Cc: Maxime Ripard <maxime@cerno.tech>
| | | | Cc: Emma Anholt <emma@anholt.net>
| | | | Cc: Dave Airlie <airlied@gmail.com>
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Revert drm/vc4 hdmi runtime PM changes  Linus Torvalds  2021-09-19  1  -34/+10
| | | | This reverts commits
| | | |
| | | |   9984d6664ce9 ("drm/vc4: hdmi: Make sure the controller is powered in detect")
| | | |   411efa18e4b0 ("drm/vc4: hdmi: Move the HSM clock enable to runtime_pm")
| | | |
| | | | as Michael Stapelberg reports that the new runtime PM changes cause his
| | | | Raspberry Pi 3 to hang on boot, probably due to interactions with other
| | | | changes in the DRM tree (because a bisect points to the merge in commit
| | | | e058a84bfddc: "Merge tag 'drm-next-2021-07-01' of git://.../drm").
| | | |
| | | | Revert these two commits until it's been resolved.
| | | |
| | | | Link: https://lore.kernel.org/all/871r5mp7h2.fsf@midna.i-did-not-set--mail-host-address--so-tickle-me/
| | | | Reported-and-tested-by: Michael Stapelberg <michael@stapelberg.ch>
| | | | Cc: Maxime Ripard <maxime@cerno.tech>
| | | | Cc: Dave Stevenson <dave.stevenson@raspberrypi.com>
| | | | Cc: Dave Airlie <airlied@gmail.com>
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | alpha: move __udiv_qrnnd library function to arch/alpha/lib/  Linus Torvalds  2021-09-18  5  -3/+5
| | | | We already had the implementation for __udiv_qrnnd (unsigned divide for
| | | | multi-precision arithmetic) as part of the alpha math emulation code.
| | | |
| | | | But you can disable the math emulation code - even if you shouldn't -
| | | | and then the MPI code that actually wants this functionality (and is
| | | | needed by various crypto functions) will fail to build.
| | | |
| | | | So move the extended-precision divide code to be a regular library
| | | | function, just like all the regular division code is. That way it is
| | | | available regardless of math-emulation.
| | | |
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | alpha: mark 'Jensen' platform as no longer broken  Linus Torvalds  2021-09-18  2  -6/+5
|/ / /
| | | Ok, it almost certainly is still broken on actual hardware, but the
| | | immediate reason for it having been marked BROKEN was a build error
| | | that is fixed by just making sure the low-level IO header file is
| | | included sufficiently early that the __EXTERN_INLINE hackery takes
| | | effect.
| | |
| | | This was marked broken back in 2017 by commit 1883c9f49d02 ("alpha:
| | | mark jensen as broken"), but Ulrich Teichert made me look at it as part
| | | of my cross-build work to make sure -Werror actually does the right
| | | thing.
| | |
| | | There are lots of alpha configurations that do not build cleanly, but
| | | now it's no longer because Jensen wouldn't be buildable. That said,
| | | because the Jensen platform doesn't force PCI to be enabled (Jensen
| | | only had EISA), it ends up being somewhat interesting as a source of
| | | odd configs.
| | |
| | | Reported-by: Ulrich Teichert <krypton@ulrich-teichert.org>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>