summaryrefslogtreecommitdiffstats
path: root/arch/riscv
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3 days18-113/+443
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements. LoongArch: - Add ParaVirt IPI support - Add software breakpoint support - Add mmio trace events support RISC-V: - Support guest breakpoints using ebreak - Introduce per-VCPU mp_state_lock and reset_cntx_lock - Virtualize SBI PMU snapshot and counter overflow interrupts - New selftests for SBI PMU and Guest ebreak - Some preparatory work for both TDX and SNP page fault handling. This also cleans up the page fault path, so that the priorities of various kinds of fauls (private page, no memory, write to read-only slot, etc.) are easier to follow. x86: - Minimize amount of time that shadow PTEs remain in the special REMOVED_SPTE state. This is a state where the mmu_lock is held for reading but concurrent accesses to the PTE have to spin; shortening its use allows other vCPUs to repopulate the zapped region while the zapper finishes tearing down the old, defunct page tables. - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is defined by hardware but left for software use. This lets KVM communicate its inability to map GPAs that set bits 51:48 on hosts without 5-level nested page tables. Guest firmware is expected to use the information when mapping BARs; this avoids that they end up at a legal, but unmappable, GPA. - Fixed a bug where KVM would not reject accesses to MSR that aren't supposed to exist given the vCPU model and/or KVM configuration. - As usual, a bunch of code cleanups. x86 (AMD): - Implement a new and improved API to initialize SEV and SEV-ES VMs, which will also be extendable to SEV-SNP. The new API specifies the desired encryption in KVM_CREATE_VM and then separately initializes the VM. The new API also allows customizing the desired set of VMSA features; the features affect the measurement of the VM's initial state, and therefore enabling them cannot be done tout court by the hypervisor. While at it, the new API includes two bugfixes that couldn't be applied to the old one without a flag day in userspace or without affecting the initial measurement. When a SEV-ES VM is created with the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected once the VMSA has been encrypted. Also, the FPU and AVX state will be synchronized and encrypted too. - Support for GHCB version 2 as applicable to SEV-ES guests. This, once more, is only accessible when using the new KVM_SEV_INIT2 flow for initialization of SEV-ES VMs. x86 (Intel): - An initial bunch of prerequisite patches for Intel TDX were merged. They generally don't do anything interesting. The only somewhat user visible change is a new debugging mode that checks that KVM's MMU never triggers a #VE virtualization exception in the guest. - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to L1, as per the SDM. Generic: - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc(). - Remove .change_pte() MMU notifier - the changes to non-KVM code are small and Andrew Morton asked that I also take those through the KVM tree. The callback was only ever implemented by KVM (which was also the original user of MMU notifiers) but it had been nonfunctional ever since calls to set_pte_at_notify were wrapped with invalidate_range_start and invalidate_range_end... in 2012. Selftests: - Enhance the demand paging test to allow for better reporting and stressing of UFFD performance. - Convert the steal time test to generate TAP-friendly output. - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed time across two different clock domains. - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT. - Avoid unnecessary use of "sudo" in the NX hugepage test wrapper shell script, to play nice with running in a minimal userspace environment. - Allow skipping the RSEQ test's sanity check that the vCPU was able to complete a reasonable number of KVM_RUNs, as the assert can fail on a completely valid setup. If the test is run on a large-ish system that is otherwise idle, and the test isn't affined to a low-ish number of CPUs, the vCPU task can be repeatedly migrated to CPUs that are in deep sleep states, which results in the vCPU having very little net runtime before the next migration due to high wakeup latencies. - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by a change to kselftest_harness.h late in the 6.9 cycle, and because forcing every test to #define _GNU_SOURCE is painful. - Provide a global pseudo-RNG instance for all tests, so that library code can generate random, but determinstic numbers. - Use the global pRNG to randomly force emulation of select writes from guest code on x86, e.g. to help validate KVM's emulation of locked accesses. - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception handlers at VM creation, instead of forcing tests to manually trigger the related setup. Documentation: - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits) selftests/kvm: remove dead file KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: SEV: Allow per-guest configuration of GHCB protocol version KVM: SEV: Add GHCB handling for termination requests KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests KVM: SEV: Add support to handle AP reset MSR protocol KVM: x86: Explicitly zero kvm_caps during vendor module load KVM: x86: Fully re-initialize supported_mce_cap on vendor module load KVM: x86: Fully re-initialize supported_vm_types on vendor module load KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values ...
| * Merge tag 'loongarch-kvm-6.10' of ↵Paolo Bonzini8 days8-30/+36
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.10 1. Add ParaVirt IPI support. 2. Add software breakpoint support. 3. Add mmio trace events support.
| * \ Merge tag 'kvm-riscv-6.10-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini11 days17-93/+443
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM/riscv changes for 6.10 - Support guest breakpoints using ebreak - Introduce per-VCPU mp_state_lock and reset_cntx_lock - Virtualize SBI PMU snapshot and counter overflow interrupts - New selftests for SBI PMU and Guest ebreak
| | * | RISC-V: KVM: Improve firmware counter read functionAtish Patra2024-04-263-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the function to indicate that it is meant for firmware counter read. While at it, add a range sanity check for it as well. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240420151741.962500-17-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Support 64 bit firmware counters on RV32Atish Patra2024-04-263-2/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SBI v2.0 introduced a fw_read_hi function to read 64 bit firmware counters for RV32 based systems. Add infrastructure to support that. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-16-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Add perf sampling support for guestsAtish Patra2024-04-267-8/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM enables perf for guest via counter virtualization. However, the sampling can not be supported as there is no mechanism to enabled trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot to provide the counter overflow data via the shared memory. In case of sampling event, the host first sets the guest's LCOFI interrupt and injects to the guest via irq filtering mechanism defined in AIA specification. Thus, ssaia must be enabled in the host in order to use perf sampling in the guest. No other AIA dependency w.r.t kernel is required. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-15-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Implement SBI PMU Snapshot featureAtish Patra2024-04-263-1/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PMU Snapshot function allows to minimize the number of traps when the guest access configures/access the hpmcounters. If the snapshot feature is enabled, the hypervisor updates the shared memory with counter data and state of overflown counters. The guest can just read the shared memory instead of trap & emulate done by the hypervisor. This patch doesn't implement the counter overflow yet. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-14-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: No need to exit to the user space if perf event failedAtish Patra2024-04-262-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we return a linux error code if creating a perf event failed in kvm. That shouldn't be necessary as guest can continue to operate without perf profiling or profiling with firmware counters. Return appropriate SBI error code to indicate that PMU configuration failed. An error message in kvm already describes the reason for failure. Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling") Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-13-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: No need to update the counter value during resetAtish Patra2024-04-261-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The virtual counter value is updated during pmu_ctr_read. There is no need to update it in reset case. Otherwise, it will be counted twice which is incorrect. Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling") Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-12-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Fix the initial sample period valueAtish Patra2024-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initial sample period value when counter value is not assigned should be set to maximum value supported by the counter width. Otherwise, it may result in spurious interrupts. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240420151741.962500-11-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: Use the minor version mask while computing sbi versionAtish Patra2024-04-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per the SBI specification, minor version is encoded in the lower 24 bits only. Make sure that the SBI version is computed with the appropriate mask. Currently, there is no minor version in use. Thus, it doesn't change anything functionality but it is good to be compliant with the specification. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-8-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic nameAtish Patra2024-04-223-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SBI_STA_SHMEM_DISABLE is a macro to invoke disable shared memory commands. As this can be invoked from other SBI extension context as well, rename it to more generic name as SBI_SHMEM_DISABLE. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-7-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: Add SBI PMU snapshot definitionsAtish Patra2024-04-221-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SBI PMU Snapshot function optimizes the number of traps to higher privilege mode by leveraging a shared memory between the S/VS-mode and the M/HS mode. Add the definitions for that extension and new error codes. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-6-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | drivers/perf: riscv: Use BIT macro for shifting operationsAtish Patra2024-04-221-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a good practice to use BIT() instead of (1 << x). Replace the current usages with BIT(). Take this opportunity to replace few (1UL << x) with BIT() as well for consistency. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-5-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: Add FIRMWARE_READ_HI definitionAtish Patra2024-04-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SBI v2.0 added another function to SBI PMU extension to read the upper bits of a counter with width larger than XLEN. Add the definition for that function. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-3-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: Fix the typo in Scountovf CSR nameAtish Patra2024-04-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The counter overflow CSR name is "scountovf" not "sscountovf". Fix the csr name. Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support") Reviewed-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-2-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISCV: KVM: Introduce vcpu->reset_cntx_lockYong-Xuan Wang2024-04-223-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, the use of kvm->lock in SBI_EXT_HSM_HART_START also avoids the simultaneous updates to the reset context of target VCPU. Since this lock has been replace with vcpu->mp_state_lock, and this new lock also protects the vcpu->mp_state. We have to add a separate lock for vcpu->reset_cntx. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-3-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISCV: KVM: Introduce mp_state_lock to avoid lock inversionYong-Xuan Wang2024-04-224-29/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation/virt/kvm/locking.rst advises that kvm->lock should be acquired outside vcpu->mutex and kvm->srcu. However, when KVM/RISC-V handling SBI_EXT_HSM_HART_START, the lock ordering is vcpu->mutex, kvm->srcu then kvm->lock. Although the lockdep checking no longer complains about this after commit f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies"), it's necessary to replace kvm->lock with a new dedicated lock to ensure only one hart can execute the SBI_EXT_HSM_HART_START call for the target hart simultaneously. Additionally, this patch also rename "power_off" to "mp_state" with two possible values. The vcpu->mp_state_lock also protects the access of vcpu->mp_state. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-2-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Handle breakpoint exits for VCPUChao Du2024-04-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exit to userspace for breakpoint traps. Set the exit_reason as KVM_EXIT_DEBUG before exit. Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240402062628.5425-3-duchao@eswincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
| | * | RISC-V: KVM: Implement kvm_arch_vcpu_ioctl_set_guest_debug()Chao Du2024-04-084-18/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_vm_ioctl_check_extension(): Return 1 if KVM_CAP_SET_GUEST_DEBUG is been checked. kvm_arch_vcpu_ioctl_set_guest_debug(): Update the guest_debug flags from userspace accordingly. Route the breakpoint exceptions to HS mode if the VCPU is being debugged by userspace, by clearing the corresponding bit in hedeleg. Initialize the hedeleg configuration in kvm_riscv_vcpu_setup_config(). Write the actual CSR in kvm_arch_vcpu_load(). Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240402062628.5425-2-duchao@eswincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
| * | | Merge x86 bugfixes from Linux 6.9-rc3Paolo Bonzini2024-04-1913-20/+50
| |\| | | | | | | | | | | | | | | | | | | | | | Pull fix for SEV-SNP late disable bugs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: delete .change_pte MMU notifier callbackPaolo Bonzini2024-04-111-20/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The .change_pte() MMU notifier callback was intended as an optimization. The original point of it was that KSM could tell KVM to flip its secondary PTE to a new location without having to first zap it. At the time there was also an .invalidate_page() callback; both of them were *not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(), and .invalidate_page() also doubled as a fallback implementation of .change_pte(). Later on, however, both callbacks were changed to occur within an invalidate_range_start/end() block. In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end", 2012-10-09) did so to remove the fallback from .invalidate_page() to .change_pte() and allow sleepable .invalidate_page() hooks. This however made KVM's usage of the .change_pte() callback completely moot, because KVM unmaps the sPTEs during .invalidate_range_start() and therefore .change_pte() has no hope of finding a sPTE to change. Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as well as all the architecture specific implementations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | Merge tag 'modules-6.10-rc1' of ↵Linus Torvalds3 days5-35/+38
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Finally something fun. Mike Rapoport does some cleanup to allow us to take out module_alloc() out of modules into a new paint shedded execmem_alloc() and execmem_free() so to make emphasis these helpers are actually used outside of modules. It starts with a non-functional changes API rename / placeholders to then allow architectures to define their requirements into a new shiny struct execmem_info with ranges, and requirements for those ranges. Archs now can intitialize this execmem_info as the last part of mm_core_init() if they have to diverge from the norm. Each range is a known type clearly articulated and spelled out in enum execmem_type. Although a lot of this is major cleanup and prep work for future enhancements an immediate clear gain is we get to enable KPROBES without MODULES now. That is ultimately what motiviated to pick this work up again, now with smaller goal as concrete stepping stone" * tag 'modules-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of kprobes: remove dependency on CONFIG_MODULES powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate x86/ftrace: enable dynamic ftrace without CONFIG_MODULES arch: make execmem setup available regardless of CONFIG_MODULES powerpc: extend execmem_params for kprobes allocations arm64: extend execmem_info for generated code allocations riscv: extend execmem_params for generated code allocations mm/execmem, arch: convert remaining overrides of module_alloc to execmem mm/execmem, arch: convert simple overrides of module_alloc to execmem mm: introduce execmem_alloc() and execmem_free() module: make module_memory_{alloc,free} more self-contained sparc: simplify module_alloc() nios2: define virtual address space for modules mips: module: rename MODULE_START to MODULES_VADDR arm64: module: remove unneeded call to kasan_alloc_module_shadow() kallsyms: replace deprecated strncpy with strscpy module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.
| * | | | arch: make execmem setup available regardless of CONFIG_MODULESMike Rapoport (IBM)4 days2-34/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | execmem does not depend on modules, on the contrary modules use execmem. To make execmem available when CONFIG_MODULES=n, for instance for kprobes, split execmem_params initialization out from arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * | | | riscv: extend execmem_params for generated code allocationsMike Rapoport (IBM)4 days4-24/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory allocations for kprobes and BPF on RISC-V are not placed in the modules area and these custom allocations are implemented with overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for 32 bit and slightly reorder execmem_params initialization to support both 32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in riscv::execmem_params and drop overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
| * | | | mm/execmem, arch: convert simple overrides of module_alloc to execmemMike Rapoport (IBM)4 days1-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several architectures override module_alloc() only to define address range for code allocations different than VMALLOC address space. Provide a generic implementation in execmem that uses the parameters for address space ranges, required alignment and page protections provided by architectures. The architectures must fill execmem_info structure and implement execmem_arch_setup() that returns a pointer to that structure. This way the execmem initialization won't be called from every architecture, but rather from a central place, namely a core_initcall() in execmem. The execmem provides execmem_alloc() API that wraps __vmalloc_node_range() with the parameters defined by the architectures. If an architecture does not implement execmem_arch_setup(), execmem_alloc() will fall back to module_alloc(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
* | | | | Merge tag 'printk-for-6.10' of ↵Linus Torvalds3 days3-3/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Use no_printk() instead of "if (0) printk()" constructs to avoid generating printk index for messages disabled at compile time - Remove deprecated strncpy/strcpy from printk.c - Remove redundant CONFIG_BASE_FULL in favor of CONFIG_BASE_SMALL * tag 'printk-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: cleanup deprecated uses of strncpy/strcpy printk: Remove redundant CONFIG_BASE_FULL printk: Change type of CONFIG_BASE_SMALL to bool printk: Fix LOG_CPU_MAX_BUF_SHIFT when BASE_SMALL is enabled ceph: Use no_printk() helper dyndbg: Use *no_printk() helpers dev_printk: Add and use dev_no_printk() printk: Let no_printk() use _printk()
| * | | | | printk: Remove redundant CONFIG_BASE_FULLYoann Congal12 days3-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_BASE_FULL is equivalent to !CONFIG_BASE_SMALL and is enabled by default: CONFIG_BASE_SMALL is the special case to take care of. So, remove CONFIG_BASE_FULL and move the config choice to CONFIG_BASE_SMALL (which defaults to 'n') For defconfigs explicitely disabling BASE_FULL, explicitely enable BASE_SMALL. For defconfigs explicitely enabling BASE_FULL, drop it as it is the default. Signed-off-by: Yoann Congal <yoann.congal@smile.fr> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240505080343.1471198-4-yoann.congal@smile.fr Signed-off-by: Petr Mladek <pmladek@suse.com>
* | | | | | Merge tag 'net-next-6.10' of ↵Linus Torvalds4 days3-14/+267
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Complete rework of garbage collection of AF_UNIX sockets. AF_UNIX is prone to forming reference count cycles due to fd passing functionality. New method based on Tarjan's Strongly Connected Components algorithm should be both faster and remove a lot of workarounds we accumulated over the years. - Add TCP fraglist GRO support, allowing chaining multiple TCP packets and forwarding them together. Useful for small switches / routers which lack basic checksum offload in some scenarios (e.g. PPPoE). - Support using SMP threads for handling packet backlog i.e. packet processing from software interfaces and old drivers which don't use NAPI. This helps move the processing out of the softirq jumble. - Continue work of converting from rtnl lock to RCU protection. Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link information available via rtnetlink. - Small optimizations from Eric to UDP wake up handling, memory accounting, RPS/RFS implementation, TCP packet sizing etc. - Allow direct page recycling in the bulk API used by XDP, for +2% PPS. - Support peek with an offset on TCP sockets. - Add MPTCP APIs for querying last time packets were received/sent/acked and whether MPTCP "upgrade" succeeded on a TCP socket. - Add intra-node communication shortcut to improve SMC performance. - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver. - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver. - Add reset reasons for tracing what caused a TCP reset to be sent. - Introduce direction attribute for xfrm (IPSec) states. State can be used either for input or output packet processing. Things we sprinkled into general kernel code: - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS(). This required touch-ups and renaming of a few existing users. - Add Endian-dependent __counted_by_{le,be} annotations. - Make building selftests "quieter" by printing summaries like "CC object.o" rather than full commands with all the arguments. Netfilter: - Use GFP_KERNEL to clone elements, to deal better with OOM situations and avoid failures in the .commit step. BPF: - Add eBPF JIT for ARCv2 CPUs. - Support attaching kprobe BPF programs through kprobe_multi link in a session mode, meaning, a BPF program is attached to both function entry and return, the entry program can decide if the return program gets executed and the entry program can share u64 cookie value with return program. "Session mode" is a common use-case for tetragon and bpftrace. - Add the ability to specify and retrieve BPF cookie for raw tracepoint programs in order to ease migration from classic to raw tracepoints. - Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86, ARM64 and RISC-V JITs. This allows inlining functions which need to access per-CPU state. - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction. Support BPF arena on ARM64. - Add a new bpf_wq API for deferring events and refactor process-context bpf_timer code to keep common code where possible. - Harden the BPF verifier's and/or/xor value tracking. - Introduce crypto kfuncs to let BPF programs call kernel crypto APIs. - Support bpf_tail_call_static() helper for BPF programs with GCC 13. - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled. Driver API: - Skip software TC processing completely if all installed rules are marked as HW-only, instead of checking the HW-only flag rule by rule. - Add support for configuring PoE (Power over Ethernet), similar to the already existing support for PoDL (Power over Data Line) config. - Initial bits of a queue control API, for now allowing a single queue to be reset without disturbing packet flow to other queues. - Common (ethtool) statistics for hardware timestamping. Tests and tooling: - Remove the need to create a config file to run the net forwarding tests so that a naive "make run_tests" can exercise them. - Define a method of writing tests which require an external endpoint to communicate with (to send/receive data towards the test machine). Add a few such tests. - Create a shared code library for writing Python tests. Expose the YAML Netlink library from tools/ to the tests for easy Netlink access. - Move netfilter tests under net/, extend them, separate performance tests from correctness tests, and iron out issues found by running them "on every commit". - Refactor BPF selftests to use common network helpers. - Further work filling in YAML definitions of Netlink messages for: nftables, team driver, bonding interfaces, vlan interfaces, VF info, TC u32 mark, TC police action. - Teach Python YAML Netlink to decode attribute policies. - Extend the definition of the "indexed array" construct in the specs to cover arrays of scalars rather than just nests. - Add hyperlinks between definitions in generated Netlink docs. Drivers: - Make sure unsupported flower control flags are rejected by drivers, and make more drivers report errors directly to the application rather than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen). - Ethernet high-speed NICs: - Broadcom (bnxt): - support multiple RSS contexts and steering traffic to them - support XDP metadata - make page pool allocations more NUMA aware - Intel (100G, ice, idpf): - extract datapath code common among Intel drivers into a library - use fewer resources in switchdev by sharing queues with the PF - add PFCP filter support - add Ethernet filter support - use a spinlock instead of HW lock in PTP clock ops - support 5 layer Tx scheduler topology - nVidia/Mellanox: - 800G link modes and 100G SerDes speeds - per-queue IRQ coalescing configuration - Marvell Octeon: - support offloading TC packet mark action - Ethernet NICs consumer, embedded and virtual: - stop lying about skb->truesize in USB Ethernet drivers, it messes up TCP memory calculations - Google cloud vNIC: - support changing ring size via ethtool - support ring reset using the queue control API - VirtIO net: - expose flow hash from RSS to XDP - per-queue statistics - add selftests - Synopsys (stmmac): - support controllers which require an RX clock signal from the MII bus to perform their hardware initialization - TI: - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices - icssg_prueth: add SW TX / RX Coalescing based on hrtimers - cpsw: minimal XDP support - Renesas (ravb): - support describing the MDIO bus - Realtek (r8169): - add support for RTL8168M - Microchip Sparx5: - matchall and flower actions mirred and redirect - Ethernet switches: - nVidia/Mellanox: - improve events processing performance - Marvell: - add support for MV88E6250 family internal PHYs - Microchip: - add DCB and DSCP mapping support for KSZ switches - vsc73xx: convert to PHYLINK - Realtek: - rtl8226b/rtl8221b: add C45 instances and SerDes switching - Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup - Ethernet PHYs: - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY. - micrel: lan8814: add support for PPS out and external timestamp trigger - WiFi: - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers. Modern devices can only be configured using nl80211. - mac80211/cfg80211 - handle color change per link for WiFi 7 Multi-Link Operation - Intel (iwlwifi): - don't support puncturing in 5 GHz - support monitor mode on passive channels - BZ-W device support - P2P with HE/EHT support - re-add support for firmware API 90 - provide channel survey information for Automatic Channel Selection - MediaTek (mt76): - mt7921 LED control - mt7925 EHT radiotap support - mt7920e PCI support - Qualcomm (ath11k): - P2P support for QCA6390, WCN6855 and QCA2066 - support hibernation - ieee80211-freq-limit Device Tree property support - Qualcomm (ath12k): - refactoring in preparation of multi-link support - suspend and hibernation support - ACPI support - debugfs support, including dfs_simulate_radar support - RealTek: - rtw88: RTL8723CS SDIO device support - rtw89: RTL8922AE Wi-Fi 7 PCI device support - rtw89: complete features of new WiFi 7 chip 8922AE including BT-coexistence and Wake-on-WLAN - rtw89: use BIOS ACPI settings to set TX power and channels - rtl8xxxu: enable Management Frame Protection (MFP) support - Bluetooth: - support for Intel BlazarI and Filmore Peak2 (BE201) - support for MediaTek MT7921S SDIO - initial support for Intel PCIe BT driver - remove HCI_AMP support" * tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits) selftests: netfilter: fix packetdrill conntrack testcase net: gro: fix napi_gro_cb zeroed alignment Bluetooth: btintel_pcie: Refactor and code cleanup Bluetooth: btintel_pcie: Fix warning reported by sparse Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1 Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config Bluetooth: btintel_pcie: Fix compiler warnings Bluetooth: btintel_pcie: Add *setup* function to download firmware Bluetooth: btintel_pcie: Add support for PCIe transport Bluetooth: btintel: Export few static functions Bluetooth: HCI: Remove HCI_AMP support Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init() Bluetooth: qca: Fix error code in qca_read_fw_build_info() Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning Bluetooth: btintel: Add support for Filmore Peak2 (BE201) Bluetooth: btintel: Add support for BlazarI LE Create Connection command timeout increased to 20 secs dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth Bluetooth: compute LE flow credits based on recvbuf space Bluetooth: hci_sync: Use cmd->num_cis instead of magic number ...
| * \ \ \ \ \ Merge tag 'for-netdev' of ↵Jakub Kicinski5 days2-12/+62
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-13 We've added 119 non-merge commits during the last 14 day(s) which contain a total of 134 files changed, 9462 insertions(+), 4742 deletions(-). The main changes are: 1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi. 2) Add BPF range computation improvements to the verifier in particular around XOR and OR operators, refactoring of checks for range computation and relaxing MUL range computation so that src_reg can also be an unknown scalar, from Cupertino Miranda. 3) Add support to attach kprobe BPF programs through kprobe_multi link in a session mode, meaning, a BPF program is attached to both function entry and return, the entry program can decide if the return program gets executed and the entry program can share u64 cookie value with return program. Session mode is a common use-case for tetragon and bpftrace, from Jiri Olsa. 4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf as well as BPF selftest's struct_ops handling, from Andrii Nakryiko. 5) Improvements to BPF selftests in context of BPF gcc backend, from Jose E. Marchesi & David Faust. 6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test- -style in order to retire the old test, run it in BPF CI and additionally expand test coverage, from Jordan Rife. 7) Big batch for BPF selftest refactoring in order to remove duplicate code around common network helpers, from Geliang Tang. 8) Another batch of improvements to BPF selftests to retire obsolete bpf_tcp_helpers.h as everything is available vmlinux.h, from Martin KaFai Lau. 9) Fix BPF map tear-down to not walk the map twice on free when both timer and wq is used, from Benjamin Tissoires. 10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL, from Alexei Starovoitov. 11) Change BTF build scripts to using --btf_features for pahole v1.26+, from Alan Maguire. 12) Small improvements to BPF reusing struct_size() and krealloc_array(), from Andy Shevchenko. 13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions, from Ilya Leoshkevich. 14) Extend TCP ->cong_control() callback in order to feed in ack and flag parameters and allow write-access to tp->snd_cwnd_stamp from BPF program, from Miao Xu. 15) Add support for internal-only per-CPU instructions to inline bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs, from Puranjay Mohan. 16) Follow-up to remove the redundant ethtool.h from tooling infrastructure, from Tushar Vyavahare. 17) Extend libbpf to support "module:<function>" syntax for tracing programs, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) bpf: make list_for_each_entry portable bpf: ignore expected GCC warning in test_global_func10.c bpf: disable strict aliasing in test_global_func9.c selftests/bpf: Free strdup memory in xdp_hw_metadata selftests/bpf: Fix a few tests for GCC related warnings. bpf: avoid gcc overflow warning in test_xdp_vlan.c tools: remove redundant ethtool.h from tooling infra selftests/bpf: Expand ATTACH_REJECT tests selftests/bpf: Expand getsockname and getpeername tests sefltests/bpf: Expand sockaddr hook deny tests selftests/bpf: Expand sockaddr program return value tests selftests/bpf: Retire test_sock_addr.(c|sh) selftests/bpf: Remove redundant sendmsg test cases selftests/bpf: Migrate ATTACH_REJECT test cases selftests/bpf: Migrate expected_attach_type tests selftests/bpf: Migrate wildcard destination rewrite test selftests/bpf: Migrate sendmsg6 v4 mapped address tests selftests/bpf: Migrate sendmsg deny test cases selftests/bpf: Migrate WILDCARD_IP test selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases ... ==================== Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | | | riscv, bpf: make some atomic operations fully orderedPuranjay Mohan6 days1-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The BPF atomic operations with the BPF_FETCH modifier along with BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements all atomic operations except BPF_CMPXCHG with relaxed ordering. Section 8.1 of the "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic Instructions" says: | To provide more efficient support for release consistency [5], each | atomic instruction has two bits, aq and rl, used to specify additional | memory ordering constraints as viewed by other RISC-V harts. and | If only the aq bit is set, the atomic memory operation is treated as | an acquire access. | If only the rl bit is set, the atomic memory operation is treated as a | release access. | | If both the aq and rl bits are set, the atomic memory operation is | sequentially consistent. Fix this by setting both aq and rl bits as 1 for operations with BPF_FETCH and BPF_XCHG. [1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | | riscv, bpf: Fix typo in commentXiao Wang6 days1-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can use either "instruction" or "insn" in the comment. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | | riscv, bpf: inline bpf_get_smp_processor_id()Puranjay Mohan6 days1-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit. RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu). RISCV64 JIT output for `call bpf_get_smp_processor_id` ====================================================== Before After -------- ------- auipc t1,0x848c ld a5,32(tp) jalr 604(t1) mv a5,a0 Benchmark using [1] on Qemu. ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+------------------+------------------+--------------+ | Name | Before | After | % change | |---------------+------------------+------------------+--------------| | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | +---------------+------------------+------------------+--------------+ NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn. [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | | riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrsPuranjay Mohan6 days1-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. RISC-V uses generic per-cpu implementation where the offsets for CPUs are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores the address of the task_struct in TP register. The first element in task_struct is struct thread_info, and we can get the cpu number by reading from the TP register + offsetof(struct thread_info, cpu). Once we have the cpu number in a register we read the offset for that cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this offset to the destination register. To measure the improvement from this change, the benchmark in [1] was used on Qemu: Before: glob-arr-inc : 1.127 ± 0.013M/s arr-inc : 1.121 ± 0.004M/s hash-inc : 0.681 ± 0.052M/s After: glob-arr-inc : 1.138 ± 0.011M/s arr-inc : 1.366 ± 0.006M/s hash-inc : 0.676 ± 0.001M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-05-028-30/+36
| |\ \ \ \ \ \ \ | | |/ / / / / / | |/| | | | | / | | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | | | Merge tag 'for-netdev' of ↵Jakub Kicinski2024-04-293-2/+205
| |\ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-04-29 We've added 147 non-merge commits during the last 32 day(s) which contain a total of 158 files changed, 9400 insertions(+), 2213 deletions(-). The main changes are: 1) Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86 BPF JIT. This allows inlining per-CPU array and hashmap lookups and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko. 2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song. 3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction, from Alexei Starovoitov. 4) Add support for passing mark with bpf_fib_lookup helper, from Anton Protopopov. 5) Add a new bpf_wq API for deferring events and refactor sleepable bpf_timer code to keep common code where possible, from Benjamin Tissoires. 6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs to check when NULL is passed for non-NULLable parameters, from Eduard Zingerman. 7) Harden the BPF verifier's and/or/xor value tracking, from Harishankar Vishwanathan. 8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel crypto subsystem, from Vadim Fedorenko. 9) Various improvements to the BPF instruction set standardization doc, from Dave Thaler. 10) Extend libbpf APIs to partially consume items from the BPF ringbuffer, from Andrea Righi. 11) Bigger batch of BPF selftests refactoring to use common network helpers and to drop duplicate code, from Geliang Tang. 12) Support bpf_tail_call_static() helper for BPF programs with GCC 13, from Jose E. Marchesi. 13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled, from Kumar Kartikeya Dwivedi. 14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs, from David Vernet. 15) Extend the BPF verifier to allow different input maps for a given bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu. 16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan. 17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison the stack memory, from Martin KaFai Lau. 18) Improve xsk selftest coverage with new tests on maximum and minimum hardware ring size configurations, from Tushar Vyavahare. 19) Various ReST man pages fixes as well as documentation and bash completion improvements for bpftool, from Rameez Rehman & Quentin Monnet. 20) Fix libbpf with regards to dumping subsequent char arrays, from Quentin Deslandes. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits) bpf, docs: Clarify PC use in instruction-set.rst bpf_helpers.h: Define bpf_tail_call_static when building with GCC bpf, docs: Add introduction for use in the ISA Internet Draft selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args selftests/bpf: dummy_st_ops should reject 0 for non-nullable params bpf: check bpf_dummy_struct_ops program params for test runs selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops selftests/bpf: adjust dummy_st_ops_success to detect additional error bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable selftests/bpf: Add ring_buffer__consume_n test. bpf: Add bpf_guard_preempt() convenience macro selftests: bpf: crypto: add benchmark for crypto functions selftests: bpf: crypto skcipher algo selftests bpf: crypto: add skcipher to bpf crypto bpf: make common crypto API for TC/XDP programs bpf: update the comment for BTF_FIELDS_MAX selftests/bpf: Fix wq test. selftests/bpf: Use make_sockaddr in test_sock_addr selftests/bpf: Use connect_to_addr in test_sock_addr ... ==================== Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | | bpf, riscv: Implement bpf_addr_space_cast instructionPuranjay Mohan2024-04-043-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY rY = addr_space_cast(rX, 1, 0) has to be converted by JIT. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240404114203.105970-3-puranjay12@gmail.com
| | * | | | | bpf, riscv: Implement PROBE_MEM32 pseudo instructionsPuranjay Mohan2024-04-043-2/+189
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for [LDX | STX | ST], PROBE_MEM32, [B | H | W | DW] instructions. They are similar to PROBE_MEM instructions with the following differences: - PROBE_MEM32 supports store. - PROBE_MEM32 relies on the verifier to clear upper 32-bit of the src/dst register - PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in S7 in the prologue). Due to bpf_arena constructions such S7 + reg + off16 access is guaranteed to be within arena virtual range, so no address check at run-time. - S11 is a free callee-saved register, so it is used to store kern_vm_start - PROBE_MEM32 allows STX and ST. If they fault the store is a nop. When LDX faults the destination register is zeroed. To support these on riscv, we do tmp = S7 + src/dst reg and then use tmp2 as the new src/dst register. This allows us to reuse most of the code for normal [LDX | STX | ST]. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240404114203.105970-2-puranjay12@gmail.com
* | | | | | | Merge tag 'irq-core-2024-05-12' of ↵Linus Torvalds4 days1-0/+2
|\ \ \ \ \ \ \ | |_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: "Core code: - Interrupt storm detection for the lockup watchdog: Lockups which are caused by interrupt storms are not easy to debug because there is no information about the events which make the lockup detector trigger. To make this more user friendly, provide an extenstion to interrupt statistics which allows to take snapshots and an interface to retrieve the delta to the snapshot. Use this new mechanism in the watchdog code to do a two stage lockup analysis by taking the snapshot and printing the deltas for the topmost active interrupts on the second trigger. Note: This contains both the interrupt and the watchdog changes as the latter depend on the former obviously. - Avoid summation loops in the /proc/interrupts output and use the global counter when possible - Skip suspended interrupts on CPU hotplug operations to ensure that they are not delivered before the system resumes the device drivers when coming out of suspend. - On CPU hot-unplug interrupts which are affine to the outgoing CPU are migrated to a different CPU in the affinity mask. This can fail when the CPUs have no vectors left. Instead of giving up try to migrate it to any online CPU and thereby breaking the affinity setting in order to prevent a stale device interrupt which targets an offline CPU - The usual small cleanups Driver code: - Support for the RISCV AIA MSI controller - Make the interrupt allocation for the Loongson PCH controller more flexible to prevent vector exhaustion - The usual set of cleanups and fixes all over the place" * tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc cpuidle: Avoid explicit cpumask allocation on stack irqchip/sifive-plic: Avoid explicit cpumask allocation on stack irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack cpumask: Introduce cpumask_first_and_and() irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown genirq: Reuse irq_is_nmi() genirq/cpuhotplug: Retry with cpu_online_mask when migration fails genirq/cpuhotplug: Skip suspended interrupts when restoring affinity arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251 arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251 ARM: dts: stm32: List exti parent interrupts on stm32mp131 ARM: dts: stm32: List exti parent interrupts on stm32mp151 arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32 irqchip/stm32-exti: Mark events reserved with RIF configuration check irqchip/stm32-exti: Skip secure events irqchip/stm32-exti: Convert driver to standard PM ...
| * | | | | | RISC-V: Select APLIC and IMSIC driversAnup Patel2024-03-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The QEMU virt machine supports AIA emulation and quite a few RISC-V platforms with AIA support are under development so select APLIC and IMSIC drivers for all RISC-V platforms. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240307140307.646078-9-apatel@ventanamicro.com
* | | | | | | Merge tag 'soc-drivers-6.10' of ↵Linus Torvalds5 days6-20/+19
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "As usual, these are updates for drivers that are specific to certain SoCs or firmware running on them. Notable updates include - The new STMicroelectronics STM32 "firewall" bus driver that is used to provide a barrier between different parts of an SoC - Lots of updates for the Qualcomm platform drivers, in particular SCM, which gets a rewrite of its initialization code - Firmware driver updates for Arm FF-A notification interrupts and indirect messaging, SCMI firmware support for pin control and vendor specific interfaces, and TEE firmware interface changes across multiple TEE drivers - A larger cleanup of the Mediatek CMDQ driver and some related bits - Kconfig changes for riscv drivers to prepare for adding Kanaan k230 support - Multiple minor updates for the TI sysc bus driver, memory controllers, hisilicon hccs and more" * tag 'soc-drivers-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (103 commits) firmware: qcom: uefisecapp: Allow on sc8180x Primus and Flex 5G soc: qcom: pmic_glink: Make client-lock non-sleeping dt-bindings: soc: qcom,wcnss: fix bluetooth address example soc/tegra: pmc: Add EQOS wake event for Tegra194 and Tegra234 bus: stm32_firewall: fix off by one in stm32_firewall_get_firewall() bus: etzpc: introduce ETZPC firewall controller driver firmware: arm_ffa: Avoid queuing work when running on the worker queue bus: ti-sysc: Drop legacy idle quirk handling bus: ti-sysc: Drop legacy quirk handling for smartreflex bus: ti-sysc: Drop legacy quirk handling for uarts bus: ti-sysc: Add a description and copyrights bus: ti-sysc: Move check for no-reset-on-init soc: hisilicon: kunpeng_hccs: replace MAILBOX dependency with PCC soc: hisilicon: kunpeng_hccs: Add the check for obtaining complete port attribute firmware: arm_ffa: Fix memory corruption in ffa_msg_send2() bus: rifsc: introduce RIFSC firewall controller driver of: property: fw_devlink: Add support for "access-controller" soc: mediatek: mtk-socinfo: Correct the marketing name for MT8188GV soc: mediatek: mtk-socinfo: Add entry for MT8395AV/ZA Genio 1200 soc: mediatek: mtk-mutex: Add support for MT8188 VPPSYS ...
| * \ \ \ \ \ \ Merge tag 'riscv-config-for-v6.10' of ↵Arnd Bergmann11 days6-20/+19
| |\ \ \ \ \ \ \ | | |_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers RISC-V SoC Kconfig Updates for v6.10 A few different bits of SoC-related Kconfig work. The first part of this is shared with the DT updates - the modification of all SOC_CANAAN users to SOC_CANAAN_K210 to split the existing m-mode nommu k210 away from the k230 that is able to be used in a "common" kernel. The other thing here is the removal of most of the SOC_VENDOR options, with their ARCH_VENDOR equivalents that've been waiting in the wings for 1 year+ now made visible. Due a lapse on my part when originally adding the ARCH_VENDOR stuff, the Microchip transition isn't complete - the _POLARFIRE was a mistake to keep as there's gonna be non-PolarFire RISC-V stuff from Microchip soonTM. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-config-for-v6.10' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: config: enable ARCH_CANAAN in defconfig RISC-V: drop SOC_VIRT for ARCH_VIRT RISC-V: drop SOC_SIFIVE for ARCH_SIFIVE RISC-V: drop SOC_MICROCHIP_POLARFIRE for ARCH_MICROCHIP RISC-V: Drop unused SOC_CANAAN reset: k210: Deprecate SOC_CANAAN and use SOC_CANAAN_K210 pinctrl: k210: Deprecate SOC_CANAAN and use SOC_CANAAN_K210 clk: k210: Deprecate SOC_CANAAN and use SOC_CANAAN_K210 soc: canaan: Deprecate SOC_CANAAN and use SOC_CANAAN_K210 for K210 riscv: Kconfig.socs: Split ARCH_CANAAN and SOC_CANAAN_K210 Link: https://lore.kernel.org/r/20240503-mardi-underling-3d81a9f97329@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| | * | | | | | riscv: config: enable ARCH_CANAAN in defconfigYangyu Chen2024-04-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since K230 has been supported, allow ARCH_CANAAN to be selected to build dt and drivers for it in defconfig. Signed-off-by: Yangyu Chen <cyy@cyyself.name> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
| | * | | | | | RISC-V: drop SOC_VIRT for ARCH_VIRTConor Dooley2024-04-103-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARCH_ and SOC_ versions of this symbol have persisted for quite a while now in parallel. Generated .config files from previous LTS kernels should have both. Finally remove SOC_VIRT and update all config files using it. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
| | * | | | | | RISC-V: drop SOC_SIFIVE for ARCH_SIFIVEConor Dooley2024-04-102-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the users in the kernel are gone and generated .config files from previous LTS kernels will contain ARCH_SIFIVE. Drop SOC_SIFIVE and update the defconfig. Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
| | * | | | | | RISC-V: drop SOC_MICROCHIP_POLARFIRE for ARCH_MICROCHIPConor Dooley2024-04-102-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SOC_MICROCHIP_POLARFIRE doesn't have any remaining in-kernel users, so time has come to remove it. Foolish auld me left the "POLARFIRE" in the new, hidden, config option when I renamed it and changed all of the users however. As HPSC has been announced, and is not a PolarFire, drop mention of that specific FPGA family from the new user-visible config option. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
| | * | | | | | RISC-V: Drop unused SOC_CANAANConor Dooley2024-04-051-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All users of SOC_CANAAN are now gone and it is not user-visible, remove it. Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
| | * | | | | | riscv: Kconfig.socs: Split ARCH_CANAAN and SOC_CANAAN_K210Yangyu Chen2024-04-054-5/+13
| | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since SOC_FOO should be deprecated from patch [1], and cleanup for other SoCs is already in the mailing list [2,3,4], we remove the use of SOC_CANAAN and use ARCH_CANAAN for SoCs vendored by Canaan instead from now on. And allows ARCH_CANAAN to be selected for other Canaan SoCs. However, we should keep the config SOC_CANAAN and use def_bool to redirect the symbol temporarily to avoid potential conflict. Since we now have Canaan Kendryte K230 with MMU, ARCH_CANAAN is no longer referred to as K210. There are some special features for K210, like M-Mode No MMU and loader.bin in arch/riscv/Makefile. If we keep ARCH_CANAAN for other Canaan SoCs and remove the K210, the depends on !MMU in Kconfig may confuse some users who try to boot Kernel with MMU on K210, as Damien mentioned in the list [5]. Thus, we introduce a new symbol SOC_CANAAN_K210 for any conditional code or driver selection specific to the K210, so users will not try to build some K210-specific things when MMU is enabled and see it fails to boot on K210. [1] https://lore.kernel.org/linux-riscv/20221121221414.109965-1-conor@kernel.org/ [2] https://lore.kernel.org/linux-riscv/20240305-praying-clad-c4fbcaa7ed0a@spud/ [3] https://lore.kernel.org/linux-riscv/20240305-fled-undrilled-41dc0c46bb29@spud/ [4] https://lore.kernel.org/linux-riscv/20240305-stress-earflap-d7ddb8655a4d@spud/ [5] https://lore.kernel.org/linux-riscv/2b0511af-1b5b-4c90-a673-c9113bb58142@kernel.org/ Signed-off-by: Yangyu Chen <cyy@cyyself.name> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* | | | | | | Merge tag 'soc-dt-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds5 days6-24/+232
|\ \ \ \ \ \ \ | |_|_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SoC devicetree updates from Arnd Bergmann: "The updates this time are a bit smaller than most times, mainly because it is not totally dominated by new Qualcomm hardware support. Instead, we larger than average updates for Rockchips, NXP, Allwinner and TI. The only two new SoCs this time are both from NXP and are minor variants of already supported ones. The updates for aspeed, amlogic and mediatek came a little late, so I'm saving those for part 2 in a few days if everything turns out fine. New machines this time contain: - two Broadcom SoC based wireless routers from Asus - Five allwinner based consumer devices for gaming, set-top-box and eboot reader applications - Three older phones based on Qualcomm chips, plus the more recent Sony Xperia 1 V - 14 industrial and embedded boards based on NXP i.MX6, i.MX8, layerscape and s32g3 SoCs - six rockchips boards including another handheld game console and a few single-board computers On top of these, we have the usual cleanups for dtc warnings and updates to add more features to already merged machines" * tag 'soc-dt-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (612 commits) arm64: dts: marvell: espressobin-ultra: fix Ethernet Switch unit address arm64: dts: marvell: turris-mox: drop unneeded flash address/size-cells arm64: dts: marvell: eDPU: drop redundant address/size-cells arm64: dts: qcom: pm6150: correct USB VBUS regulator compatible arm64: dts: rockchip: add rk3588 pcie and php IOMMUs arm64: dts: rockchip: enable onboard spi flash for rock-3a arm64: dts: rockchip: add USB-C support to rk3588s-orangepi-5 arm64: dts: rockchip: Enable GPU on Orange Pi 5 arm64: dts: rockchip: enable GPU on khadas-edge2 arm64: dts: rockchip: Add USB3 on Edgeble NCM6A-IO board arm64: dts: rockchip: Support poweroff on Edgeble Neural Compute Module arm64: dts: rockchip: Add Radxa ROCK 3C dt-bindings: arm: rockchip: add Radxa ROCK 3C arm64: dts: exynos: gs101: specify empty clocks for remaining pinctrl arm64: dts: exynos: gs101: specify bus clock for pinctrl_hsi2 arm64: dts: exynos: gs101: specify bus clock for pinctrl_peric[01] arm64: dts: exynos: gs101: specify bus clock for pinctrl (far) alive arm64: dts: Add/fix /memory node unit-addresses arm64: dts: qcom: qcs404: fix bluetooth device address arm64: dts: qcom: sc8280xp-x13s: enable USB MP and fingerprint reader ...
| * | | | | | Merge tag 'riscv-sophgo-dt-for-v6.10' of https://github.com/sophgo/linux ↵Arnd Bergmann2024-05-024-8/+157
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into soc/dt RISC-V Devicetrees for v6.10 Sophgo: Added sdhci support for cv18xx/duo. Added clock support for cv18xx. Added clock for uart/sdhci. Added spi support for cv18xx. Added i2c support for cv18xx. Added reserved memory node for cv1800b/duo. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> * tag 'riscv-sophgo-dt-for-v6.10' of https://github.com/sophgo/linux: riscv: dts: sophgo: add reserved memory node for CV1800B riscv: dts: sophgo: use real clock for sdhci riscv: dts: sophgo: cv18xx: Add i2c devices riscv: dts: sophgo: cv18xx: Add spi devices riscv: dts: sophgo: add uart clock for Sophgo CV1800 series SoC riscv: dts: sophgo: add clock generator for Sophgo CV1800 series SoC riscv: dts: sophgo: add sdcard support for milkv duo Link: https://lore.kernel.org/r/MA0P287MB2822CA2DE757787D6EA3B1F8FE192@MA0P287MB2822.INDP287.PROD.OUTLOOK.COM Signed-off-by: Arnd Bergmann <arnd@arndb.de>