summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* treewide: Replace the use of mem_encrypt_active() with cc_platform_has()Tom Lendacky2021-10-048-20/+15
| | | | | | | | | | | | | | | | Replace uses of mem_encrypt_active() with calls to cc_platform_has() with the CC_ATTR_MEM_ENCRYPT attribute. Remove the implementation of mem_encrypt_active() across all arches. For s390, since the default implementation of the cc_platform_has() matches the s390 implementation of mem_encrypt_active(), cc_platform_has() does not need to be implemented in s390 (the config option ARCH_HAS_CC_PLATFORM is not set). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
* x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()Tom Lendacky2021-10-044-28/+7
| | | | | | | | | | | Replace uses of sev_es_active() with the more generic cc_platform_has() using CC_ATTR_GUEST_STATE_ENCRYPT. If future support is added for other memory encyrption techonologies, the use of CC_ATTR_GUEST_STATE_ENCRYPT can be updated, as required. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-8-bp@alien8.de
* x86/sev: Replace occurrences of sev_active() with cc_platform_has()Tom Lendacky2021-10-049-29/+27
| | | | | | | | | | | Replace uses of sev_active() with the more generic cc_platform_has() using CC_ATTR_GUEST_MEM_ENCRYPT. If future support is added for other memory encryption technologies, the use of CC_ATTR_GUEST_MEM_ENCRYPT can be updated, as required. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-7-bp@alien8.de
* x86/sme: Replace occurrences of sme_active() with cc_platform_has()Tom Lendacky2021-10-049-31/+32
| | | | | | | | | | | | | | Replace uses of sme_active() with the more generic cc_platform_has() using CC_ATTR_HOST_MEM_ENCRYPT. If future support is added for other memory encryption technologies, the use of CC_ATTR_HOST_MEM_ENCRYPT can be updated, as required. This also replaces two usages of sev_active() that are really geared towards detecting if SME is active. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-6-bp@alien8.de
* powerpc/pseries/svm: Add a powerpc version of cc_platform_has()Tom Lendacky2021-10-043-0/+29
| | | | | | | | | | | | Introduce a powerpc version of the cc_platform_has() function. This will be used to replace the powerpc mem_encrypt_active() implementation, so the implementation will initially only support the CC_ATTR_MEM_ENCRYPT attribute. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lkml.kernel.org/r/20210928191009.32551-5-bp@alien8.de
* x86/sev: Add an x86 version of cc_platform_has()Tom Lendacky2021-10-045-0/+78
| | | | | | | | | | Introduce an x86 version of the cc_platform_has() function. This will be used to replace vendor specific calls like sme_active(), sev_active(), etc. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-4-bp@alien8.de
* arch/cc: Introduce a function to check for confidential computing featuresTom Lendacky2021-10-041-0/+3
| | | | | | | | | | | | | | | | | | | In preparation for other confidential computing technologies, introduce a generic helper function, cc_platform_has(), that can be used to check for specific active confidential computing attributes, like memory encryption. This is intended to eliminate having to add multiple technology-specific checks to the code (e.g. if (sev_active() || tdx_active() || ... ). [ bp: s/_CC_PLATFORM_H/_LINUX_CC_PLATFORM_H/g ] Co-developed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-3-bp@alien8.de
* x86/ioremap: Selectively build arch override encryption functionsTom Lendacky2021-10-042-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for other uses of the cc_platform_has() function besides AMD's memory encryption support, selectively build the AMD memory encryption architecture override functions only when CONFIG_AMD_MEM_ENCRYPT=y. These functions are: - early_memremap_pgprot_adjust() - arch_memremap_can_ram_remap() Additionally, routines that are only invoked by these architecture override functions can also be conditionally built. These functions are: - memremap_should_map_decrypted() - memremap_is_efi_data() - memremap_is_setup_data() - early_memremap_is_setup_data() And finally, phys_mem_access_encrypted() is conditionally built as well, but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is not set. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-2-bp@alien8.de
* kvm: fix objtool relocation warningLinus Torvalds2021-10-031-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change to make objtool aware of more symbol relocation types (commit 24ff65257375: "objtool: Teach get_alt_entry() about more relocation types") also added another check, and resulted in this objtool warning when building kvm on x86: arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception The reason seems to be that kvm_fastop_exception() is marked as a global symbol, which causes the relocation to ke kept around for objtool. And at the same time, the kvm_fastop_exception definition (which is done as an inline asm statement) doesn't actually set the type of the global, which then makes objtool unhappy. The minimal fix is to just not mark kvm_fastop_exception as being a global symbol. It's only used in that one compilation unit anyway, so it was always pointless. That's how all the other local exception table labels are done. I'm not entirely happy about the kinds of games that the kvm code plays with doing its own exception handling, and the fact that it confused objtool is most definitely a symptom of the code being a bit too subtle and ad-hoc. But at least this trivial one-liner makes objtool no longer upset about what is going on. Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types") Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/ Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'perf_urgent_for_v5.15_rc4' of ↵Linus Torvalds2021-10-032-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure the destroy callback is reset when a event initialization fails - Update the event constraints for Icelake - Make sure the active time of an event is updated even for inactive events * tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: fix userpage->time_enabled of inactive events perf/x86/intel: Update event constraints for ICX perf/x86: Reset destroy callback on event init failure
| * perf/x86/intel: Update event constraints for ICXKan Liang2021-10-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | According to the latest event list, the event encoding 0xEF is only available on the first 4 counters. Add it into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
| * perf/x86: Reset destroy callback on event init failureAnand K Mistry2021-10-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_init_event tries multiple init callbacks and does not reset the event state between tries. When x86_pmu_event_init runs, it unconditionally sets the destroy callback to hw_perf_event_destroy. On the next init attempt after x86_pmu_event_init, in perf_try_init_event, if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy callback will be run. However, if the next init didn't set the destroy callback, hw_perf_event_destroy will be run (since the callback wasn't reset). Looking at other pmu init functions, the common pattern is to only set the destroy callback on a successful init. Resetting the callback on failure tries to replicate that pattern. This was discovered after commit f11dd0d80555 ("perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second) run of the perf tool after a reboot results in 0 samples being generated. The extra run of hw_perf_event_destroy results in active_events having an extra decrement on each perf run. The second run has active_events == 0 and every subsequent run has active_events < 0. When active_events == 0, the NMI handler will early-out and not record any samples. Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-10-014-14/+19
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more kvm fixes from Paolo Bonzini: "Small x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Ensure all migrations are performed when test is affined KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm x86/kvmclock: Move this_cpu_pvti into kvmclock.h selftests: KVM: Don't clobber XMM register when read KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue
| * | KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checksSean Christopherson2021-09-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check whether a CPUID entry's index is significant before checking for a matching index to hack-a-fix an undefined behavior bug due to consuming uninitialized data. RESET/INIT emulation uses kvm_cpuid() to retrieve CPUID.0x1, which does _not_ have a significant index, and fails to initialize the dummy variable that doubles as EBX/ECX/EDX output _and_ ECX, a.k.a. index, input. Practically speaking, it's _extremely_ unlikely any compiler will yield code that causes problems, as the compiler would need to inline the kvm_cpuid() call to detect the uninitialized data, and intentionally hose the kernel, e.g. insert ud2, instead of simply ignoring the result of the index comparison. Although the sketchy "dummy" pattern was introduced in SVM by commit 66f7b72e1171 ("KVM: x86: Make register state after reset conform to specification"), it wasn't actually broken until commit 7ff6c0350315 ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the order of operations such that "index" was checked before the significant flag. Avoid consuming uninitialized data by reverting to checking the flag before the index purely so that the fix can be easily backported; the offending RESET/INIT code has been refactored, moved, and consolidated from vendor code to common x86 since the bug was introduced. A future patch will directly address the bad RESET/INIT behavior. The undefined behavior was detected by syzbot + KernelMemorySanitizer. BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68 BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183 cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline] kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline] kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183 kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885 kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923 vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534 vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788 kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020 Local variable ----dummy@kvm_vcpu_reset created at: kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812 kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923 Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com Reported-by: Alexander Potapenko <glider@google.com> Fixes: 2a24be79b6b7 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping") Fixes: 7ff6c0350315 ("KVM: x86: Remove stateful CPUID handling") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210929222426.1855730-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | x86/kvmclock: Move this_cpu_pvti into kvmclock.hZelin Deng2021-09-302-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There're other modules might use hv_clock_per_cpu variable like ptp_kvm, so move it into kvmclock.h and export the symbol to make it visiable to other modules. Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: <stable@vger.kernel.org> Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issueZhenzhong Duan2021-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry, clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i]. Modifying guest_uret_msrs directly is completely broken as 'i' does not point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior loop. By sheer dumb luck, the fallout is limited to "only" failing to preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is benign as it's guaranteed to clear a bit in a guest MSR value, which are always zero at vCPU creation on both x86-64 and i386. Cc: stable@vger.kernel.org Fixes: 8ea8b8d6f869 ("KVM: VMX: Use common x86's uret MSR list as the one true list") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge tag 'net-5.15-rc4' of ↵Linus Torvalds2021-09-302-32/+91
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, netfilter and bpf. Current release - regressions: - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from interrupt - mdio: revert mechanical patches which broke handling of optional resources - dev_addr_list: prevent address duplication Previous releases - regressions: - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb (NULL deref) - Revert "mac80211: do not use low data rates for data frames with no ack flag", fixing broadcast transmissions - mac80211: fix use-after-free in CCMP/GCMP RX - netfilter: include zone id in tuple hash again, minimize collisions - netfilter: nf_tables: unlink table before deleting it (race -> UAF) - netfilter: log: work around missing softdep backend module - mptcp: don't return sockets in foreign netns - sched: flower: protect fl_walk() with rcu (race -> UAF) - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup - smsc95xx: fix stalled rx after link change - enetc: fix the incorrect clearing of IF_MODE bits - ipv4: fix rtnexthop len when RTA_FLOW is present - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU - e100: fix length calculation & buffer overrun in ethtool::get_regs Previous releases - always broken: - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx - mac80211: drop frames from invalid MAC address in ad-hoc mode - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race -> UAF) - bpf, x86: Fix bpf mapping of atomic fetch implementation - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog - netfilter: ip6_tables: zero-initialize fragment offset - mhi: fix error path in mhi_net_newlink - af_unix: return errno instead of NULL in unix_create1() when over the fs.file-max limit Misc: - bpf: exempt CAP_BPF from checks against bpf_jit_limit - netfilter: conntrack: make max chain length random, prevent guessing buckets by attackers - netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic, defer conntrack walk to work queue (prevent hogging RTNL lock)" * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) af_unix: fix races in sk_peer_pid and sk_peer_cred accesses net: stmmac: fix EEE init issue when paired with EEE capable PHYs net: dev_addr_list: handle first address in __hw_addr_add_ex net: sched: flower: protect fl_walk() with rcu net: introduce and use lock_sock_fast_nested() net: phy: bcm7xxx: Fixed indirect MMD operations net: hns3: disable firmware compatible features when uninstall PF net: hns3: fix always enable rx vlan filter problem after selftest net: hns3: PF enable promisc for VF when mac table is overflow net: hns3: fix show wrong state when add existing uc mac address net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE net: hns3: don't rollback when destroy mqprio fail net: hns3: remove tc enable checking net: hns3: do not allow call hns3_nic_net_open repeatedly ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup net: bridge: mcast: Associate the seqcount with its protecting lock. net: mdio-ipq4019: Fix the error for an optional regs resource net: hns3: fix hclge_dbg_dump_tm_pg() stack usage net: mdio: mscc-miim: Fix the mdio controller af_unix: Return errno instead of NULL in unix_create1(). ...
| * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2021-09-282-32/+91
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2021-09-28 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 14 day(s) which contain a total of 11 files changed, 139 insertions(+), 53 deletions(-). The main changes are: 1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk. 2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh. 3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann. 4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi. 5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer. 6) Fix return value handling for struct_ops BPF programs, from Hou Tao. 7) Various fixes to BPF selftests, from Jiri Benc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> ,
| | * | | bpf, x86: Fix bpf mapping of atomic fetch implementationJohan Almbladh2021-09-281-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the case where the dst register maps to %rax as otherwise this produces an incorrect mapping with the implementation in 981f94c3e921 ("bpf: Add bitwise atomic instructions") as %rax is clobbered given it's part of the cmpxchg as operand. The issue is similar to b29dd96b905f ("bpf, x86: Fix BPF_FETCH atomic and/or/ xor with r0 as src") just that the case of dst register was missed. Before, dst=r0 (%rax) src=r2 (%rsi): [...] c5: mov %rax,%r10 c8: mov 0x0(%rax),%rax <---+ (broken) cc: mov %rax,%r11 | cf: and %rsi,%r11 | d2: lock cmpxchg %r11,0x0(%rax) <---+ d8: jne 0x00000000000000c8 | da: mov %rax,%rsi | dd: mov %r10,%rax | [...] | | After, dst=r0 (%rax) src=r2 (%rsi): | | [...] | da: mov %rax,%r10 | dd: mov 0x0(%r10),%rax <---+ (fixed) e1: mov %rax,%r11 | e4: and %rsi,%r11 | e7: lock cmpxchg %r11,0x0(%r10) <---+ ed: jne 0x00000000000000dd ef: mov %rax,%rsi f2: mov %r10,%rax [...] The remaining combinations were fine as-is though: After, dst=r9 (%r15) src=r0 (%rax): [...] dc: mov %rax,%r10 df: mov 0x0(%r15),%rax e3: mov %rax,%r11 e6: and %r10,%r11 e9: lock cmpxchg %r11,0x0(%r15) ef: jne 0x00000000000000df _ f1: mov %rax,%r10 | (unneeded, but f4: mov %r10,%rax _| not a problem) [...] After, dst=r9 (%r15) src=r4 (%rcx): [...] de: mov %rax,%r10 e1: mov 0x0(%r15),%rax e5: mov %rax,%r11 e8: and %rcx,%r11 eb: lock cmpxchg %r11,0x0(%r15) f1: jne 0x00000000000000e1 f3: mov %rax,%rcx f6: mov %r10,%rax [...] The case of dst == src register is rejected by the verifier and therefore not supported, but x86 JIT also handles this case just fine. After, dst=r0 (%rax) src=r0 (%rax): [...] eb: mov %rax,%r10 ee: mov 0x0(%r10),%rax f2: mov %rax,%r11 f5: and %r10,%r11 f8: lock cmpxchg %r11,0x0(%r10) fe: jne 0x00000000000000ee 100: mov %rax,%r10 103: mov %r10,%rax [...] Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions") Reported-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
| | * | | bpf, mips: Validate conditional branch offsetsPiotr Krysiuk2021-09-151-14/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conditional branch instructions on MIPS use 18-bit signed offsets allowing for a branch range of 128 KBytes (backward and forward). However, this limit is not observed by the cBPF JIT compiler, and so the JIT compiler emits out-of-range branches when translating certain cBPF programs. A specific example of such a cBPF program is included in the "BPF_MAXINSNS: exec all MSH" test from lib/test_bpf.c that executes anomalous machine code containing incorrect branch offsets under JIT. Furthermore, this issue can be abused to craft undesirable machine code, where the control flow is hijacked to execute arbitrary Kernel code. The following steps can be used to reproduce the issue: # echo 1 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf test_name="BPF_MAXINSNS: exec all MSH" This should produce multiple warnings from build_bimm() similar to: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 209 at arch/mips/mm/uasm-mips.c:210 build_insn+0x558/0x590 Micro-assembler field overflow Modules linked in: test_bpf(+) CPU: 0 PID: 209 Comm: modprobe Not tainted 5.14.3 #1 Stack : 00000000 807bb824 82b33c9c 801843c0 00000000 00000004 00000000 63c9b5ee 82b33af4 80999898 80910000 80900000 82fd6030 00000001 82b33a98 82087180 00000000 00000000 80873b28 00000000 000000fc 82b3394c 00000000 2e34312e 6d6d6f43 809a180f 809a1836 6f6d203a 80900000 00000001 82b33bac 80900000 00027f80 00000000 00000000 807bb824 00000000 804ed790 001cc317 00000001 [...] Call Trace: [<80108f44>] show_stack+0x38/0x118 [<807a7aac>] dump_stack_lvl+0x5c/0x7c [<807a4b3c>] __warn+0xcc/0x140 [<807a4c3c>] warn_slowpath_fmt+0x8c/0xb8 [<8011e198>] build_insn+0x558/0x590 [<8011e358>] uasm_i_bne+0x20/0x2c [<80127b48>] build_body+0xa58/0x2a94 [<80129c98>] bpf_jit_compile+0x114/0x1e4 [<80613fc4>] bpf_prepare_filter+0x2ec/0x4e4 [<8061423c>] bpf_prog_create+0x80/0xc4 [<c0a006e4>] test_bpf_init+0x300/0xba8 [test_bpf] [<8010051c>] do_one_initcall+0x50/0x1d4 [<801c5e54>] do_init_module+0x60/0x220 [<801c8b20>] sys_finit_module+0xc4/0xfc [<801144d0>] syscall_common+0x34/0x58 [...] ---[ end trace a287d9742503c645 ]--- Then the anomalous machine code executes: => 0xc0a18000: addiu sp,sp,-16 0xc0a18004: sw s3,0(sp) 0xc0a18008: sw s4,4(sp) 0xc0a1800c: sw s5,8(sp) 0xc0a18010: sw ra,12(sp) 0xc0a18014: move s5,a0 0xc0a18018: move s4,zero 0xc0a1801c: move s3,zero # __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0) 0xc0a18020: lui t6,0x8012 0xc0a18024: ori t4,t6,0x9e14 0xc0a18028: li a1,0 0xc0a1802c: jalr t4 0xc0a18030: move a0,s5 0xc0a18034: bnez v0,0xc0a1ffb8 # incorrect branch offset 0xc0a18038: move v0,zero 0xc0a1803c: andi s4,s3,0xf 0xc0a18040: b 0xc0a18048 0xc0a18044: sll s4,s4,0x2 [...] # __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0) 0xc0a1ffa0: lui t6,0x8012 0xc0a1ffa4: ori t4,t6,0x9e14 0xc0a1ffa8: li a1,0 0xc0a1ffac: jalr t4 0xc0a1ffb0: move a0,s5 0xc0a1ffb4: bnez v0,0xc0a1ffb8 # incorrect branch offset 0xc0a1ffb8: move v0,zero 0xc0a1ffbc: andi s4,s3,0xf 0xc0a1ffc0: b 0xc0a1ffc8 0xc0a1ffc4: sll s4,s4,0x2 # __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0) 0xc0a1ffc8: lui t6,0x8012 0xc0a1ffcc: ori t4,t6,0x9e14 0xc0a1ffd0: li a1,0 0xc0a1ffd4: jalr t4 0xc0a1ffd8: move a0,s5 0xc0a1ffdc: bnez v0,0xc0a3ffb8 # correct branch offset 0xc0a1ffe0: move v0,zero 0xc0a1ffe4: andi s4,s3,0xf 0xc0a1ffe8: b 0xc0a1fff0 0xc0a1ffec: sll s4,s4,0x2 [...] # epilogue 0xc0a3ffb8: lw s3,0(sp) 0xc0a3ffbc: lw s4,4(sp) 0xc0a3ffc0: lw s5,8(sp) 0xc0a3ffc4: lw ra,12(sp) 0xc0a3ffc8: addiu sp,sp,16 0xc0a3ffcc: jr ra 0xc0a3ffd0: nop To mitigate this issue, we assert the branch ranges for each emit call that could generate an out-of-range branch. Fixes: 36366e367ee9 ("MIPS: BPF: Restore MIPS32 cBPF JIT") Fixes: c6610de353da ("MIPS: net: Add BPF JIT") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://lore.kernel.org/bpf/20210915160437.4080-1-piotras@gmail.com
| | * | | bpf: Handle return value of BPF_PROG_TYPE_STRUCT_OPS progHou Tao2021-09-141-13/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if a function ptr in struct_ops has a return value, its caller will get a random return value from it, because the return value of related BPF_PROG_TYPE_STRUCT_OPS prog is just dropped. So adding a new flag BPF_TRAMP_F_RET_FENTRY_RET to tell bpf trampoline to save and return the return value of struct_ops prog if ret_size of the function ptr is greater than 0. Also restricting the flag to be used alone. Fixes: 85d33df357b6 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210914023351.3664499-1-houtao1@huawei.com
* | | | | Merge branch 'linus' of ↵Linus Torvalds2021-09-291-2/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This contains fixes for a resource leak in ccp as well as stack corruption in x86/sm4" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sm4 - Fix frame pointer stack corruption crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()
| * | | | | crypto: x86/sm4 - Fix frame pointer stack corruptionJosh Poimboeuf2021-09-241-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sm4_aesni_avx_crypt8() sets up the frame pointer (which includes pushing RBP) before doing a conditional sibling call to sm4_aesni_avx_crypt4(), which sets up an additional frame pointer. Things will not go well when sm4_aesni_avx_crypt4() pops only the innermost single frame pointer and then tries to return to the outermost frame pointer. Sibling calls need to occur with an empty stack frame. Do the conditional sibling call *before* setting up the stack pointer. This fixes the following warning: arch/x86/crypto/sm4-aesni-avx-asm_64.o: warning: objtool: sm4_aesni_avx_crypt8()+0x8: sibling call from callable instruction with modified stack frame Fixes: a7ee22ee1445 ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | | | | Merge tag 'm68k-for-v5.15-tag3' of ↵Linus Torvalds2021-09-2824-354/+301
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull more m68k updates from Geert Uytterhoeven: - signal handling fixes - removal of set_fs() [ The set_fs removal isn't strictly a fix, but it's been pending for a while and is very welcome. The signal handling fixes resolved an issue that was incorrectly attributed to the set_fs changes - Linus ] * tag 'm68k-for-v5.15-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Remove set_fs() m68k: Provide __{get,put}_kernel_nofault m68k: Factor the 8-byte lowlevel {get,put}_user code into helpers m68k: Use BUILD_BUG for passing invalid sizes to get_user/put_user m68k: Remove the 030 case in virt_to_phys_slow m68k: Document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES m68k: Leave stack mangling to asm wrapper of sigreturn() m68k: Update ->thread.esp0 before calling syscall_trace() in ret_from_signal m68k: Handle arrivals of multiple signals correctly
| * | | | | | m68k: Remove set_fs()Christoph Hellwig2021-09-2422-117/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a m68k-only set_fc helper to set the SFC and DFC registers for the few places that need to override it for special MM operations, but disconnect that from the deprecated kernel-wide set_fs() API. Note that the SFC/DFC registers are context switched, so there is no need to disable preemption. Partially based on an earlier patch from Linus Torvalds <torvalds@linux-foundation.org>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-7-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Provide __{get,put}_kernel_nofaultChristoph Hellwig2021-09-241-21/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow non-faulting access to kernel addresses without overriding the address space. Implemented by passing the instruction name to the low-level assembly macros as an argument, and force the use of the normal move instructions for kernel access. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-6-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Factor the 8-byte lowlevel {get,put}_user code into helpersChristoph Hellwig2021-09-241-51/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new helpers for doing the grunt work of the 8-byte {get,put}_user routines to allow for better reuse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-5-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Use BUILD_BUG for passing invalid sizes to get_user/put_userChristoph Hellwig2021-09-241-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify the handling a bit by using the common helper instead of referencing undefined symbols. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-4-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Remove the 030 case in virt_to_phys_slowChristoph Hellwig2021-09-241-18/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 030 case in virt_to_phys_slow can't ever be reached, so remove it. Suggested-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-3-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACESChristoph Hellwig2021-09-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document that access_ok is completely broken for coldfire and friends at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-2-hch@lst.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Leave stack mangling to asm wrapper of sigreturn()Al Viro2021-09-245-105/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sigreturn has to deal with an unpleasant problem - exception stack frames have different sizes, depending upon the exception (and processor model, as well) and variable-sized part of exception frame may contain information needed for instruction restart. So when signal handler terminates and calls sigreturn to resume the execution at the place where we'd been when we caught the signal, it has to rearrange the frame at the bottom of kernel stack. Worse, it might need to open a gap in the kernel stack, shifting pt_regs towards lower addresses. Doing that from C is insane - we'd need to shift stack frames (return addresses, local variables, etc.) of C call chain, right under the nose of compiler and hope it won't fall apart horribly. What had been actually done is only slightly less insane - an inline asm in mangle_kernel_stack() moved the stuff around, then reset stack pointer and jumped to label in asm glue. However, we can avoid all that mess if the asm wrapper we have to use anyway would reserve some space on the stack between switch_stack and the C stack frame of do_{rt_,}sigreturn(). Then C part can simply memmove() pt_regs + switch_stack, memcpy() the variable part of exception frame into the opened gap - all of that without inline asm, buggering C call chain, magical jumps to asm labels, etc. Asm wrapper would need to know where the moved switch_stack has ended up - it might have been shifted into the gap we'd reserved before do_rt_sigreturn() call. That's where it needs to set the stack pointer to. So let the C part return just that and be done with that. While we are at it, the call of berr_040cleanup() we need to do when returning via 68040 bus error exception frame can be moved into C part as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/YP2dTQPm1wGPWFgD@zeniv-ca.linux.org.uk Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Update ->thread.esp0 before calling syscall_trace() in ret_from_signalAl Viro2021-09-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We get there when sigreturn has performed obscene acts on kernel stack; in particular, the location of pt_regs has shifted. We are about to call syscall_trace(), which might stop for tracer. If that happens, we'd better have task_pt_regs() returning correct result... Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: bd6f56a75bb2 ("m68k: Missing syscall_trace() on sigreturn") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/YP2dMWeV1LkHiOpr@zeniv-ca.linux.org.uk Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | | | | m68k: Handle arrivals of multiple signals correctlyAl Viro2021-09-241-46/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we have several pending signals, have entered with the kernel with large exception frame *and* have already built at least one sigframe, regs->stkadj is going to be non-zero and regs->format/sr/pc are going to be junk - the real values are in shifted exception stack frame we'd built when putting together the first sigframe. If that happens, subsequent sigframes are going to be garbage. Not hard to fix - just need to find the "adjusted" frame first and look for format/vector/sr/pc in it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/YP2dBIAPTaVvHiZ6@zeniv-ca.linux.org.uk Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
* | | | | | | Merge tag 'nios2_fixes_for_v5.15_part1' of ↵Linus Torvalds2021-09-282-3/+2
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull nios2 fixes from Dinh Nguyen: - Fix build warning for unmet dependency for EARLY_PRINTK - Remove unused dram_start() function * tag 'nios2_fixes_for_v5.15_part1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: NIOS2: setup.c: drop unused variable 'dram_start' NIOS2: fix kconfig unmet dependency warning for SERIAL_CORE_CONSOLE
| * | | | | | | NIOS2: setup.c: drop unused variable 'dram_start'Randy Dunlap2021-09-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a nuisance when CONFIG_WERROR is set, so drop the variable declaration since the code that used it was removed. ../arch/nios2/kernel/setup.c: In function 'setup_arch': ../arch/nios2/kernel/setup.c:152:13: warning: unused variable 'dram_start' [-Wunused-variable] 152 | int dram_start; Fixes: 7f7bc20bc41a ("nios2: Don't use _end for calculating min_low_pfn") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Andreas Oetken <andreas.oetken@siemens.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
| * | | | | | | NIOS2: fix kconfig unmet dependency warning for SERIAL_CORE_CONSOLERandy Dunlap2021-09-241-1/+2
| | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SERIAL_CORE_CONSOLE depends on TTY so EARLY_PRINTK should also depend on TTY so that it does not select SERIAL_CORE_CONSOLE inadvertently. WARNING: unmet direct dependencies detected for SERIAL_CORE_CONSOLE Depends on [n]: TTY [=n] && HAS_IOMEM [=y] Selected by [y]: - EARLY_PRINTK [=y] Fixes: e8bf5bc776ed ("nios2: add early printk support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
* | | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-09-2723-180/+280
|\ \ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "A bit late... I got sidetracked by back-from-vacation routines and conferences. But most of these patches are already a few weeks old and things look more calm on the mailing list than what this pull request would suggest. x86: - missing TLB flush - nested virtualization fixes for SMM (secure boot on nested hypervisor) and other nested SVM fixes - syscall fuzzing fixes - live migration fix for AMD SEV - mirror VMs now work for SEV-ES too - fixes for reset - possible out-of-bounds access in IOAPIC emulation - fix enlightened VMCS on Windows 2022 ARM: - Add missing FORCE target when building the EL2 object - Fix a PMU probe regression on some platforms Generic: - KCSAN fixes selftests: - random fixes, mostly for clang compilation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) selftests: KVM: Explicitly use movq to read xmm registers selftests: KVM: Call ucall_init when setting up in rseq_test KVM: Remove tlbs_dirty KVM: X86: Synchronize the shadow pagetable before link it KVM: X86: Fix missed remote tlb flush in rmap_write_protect() KVM: x86: nSVM: don't copy virt_ext from vmcb12 KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 KVM: x86: nSVM: restore int_vector in svm_clear_vintr kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode KVM: x86: reset pdptrs_from_userspace when exiting smm KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit KVM: nVMX: Filter out all unsupported controls when eVMCS was activated KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs KVM: Clean up benign vcpu->cpu data races when kicking vCPUs ...
| * | | | | | Merge tag 'kvmarm-fixes-5.15-1' of ↵Paolo Bonzini2021-09-243-5/+9
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.15, take #1 - Add missing FORCE target when building the EL2 object - Fix a PMU probe regression on some platforms
| | * | | | | | KVM: arm64: Fix PMU probe orderingMarc Zyngier2021-09-202-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Russell reported that since 5.13, KVM's probing of the PMU has started to fail on his HW. As it turns out, there is an implicit ordering dependency between the architectural PMU probing code and and KVM's own probing. If, due to probe ordering reasons, KVM probes before the PMU driver, it will fail to detect the PMU and prevent it from being advertised to guests as well as the VMM. Obviously, this is one probing too many, and we should be able to deal with any ordering. Add a callback from the PMU code into KVM to advertise the registration of a host CPU PMU, allowing for any probing order. Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers") Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@shell.armlinux.org.uk Cc: stable@vger.kernel.org
| | * | | | | | KVM: arm64: nvhe: Fix missing FORCE for hyp-reloc.S build ruleZenghui Yu2021-09-201-1/+1
| | | |_|/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add FORCE so that if_changed can detect the command line change. We'll otherwise see a compilation warning since commit e1f86d7b4b2a ("kbuild: warn if FORCE is missing for if_changed(_dep,_rule) and filechk"). arch/arm64/kvm/hyp/nvhe/Makefile:58: FORCE prerequisite is missing Cc: David Brazdil <dbrazdil@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907052137.1059-1-yuzenghui@huawei.com
| * | | | | | KVM: X86: Synchronize the shadow pagetable before link itLai Jiangshan2021-09-232-9/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If gpte is changed from non-present to present, the guest doesn't need to flush tlb per SDM. So the host must synchronze sp before link it. Otherwise the guest might use a wrong mapping. For example: the guest first changes a level-1 pagetable, and then links its parent to a new place where the original gpte is non-present. Finally the guest can access the remapped area without flushing the tlb. The guest's behavior should be allowed per SDM, but the host kvm mmu makes it wrong. Fixes: 4731d4c7a077 ("KVM: MMU: out of sync shadow core") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: X86: Fix missed remote tlb flush in rmap_write_protect()Lai Jiangshan2021-09-231-21/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kvm->tlbs_dirty > 0, some rmaps might have been deleted without flushing tlb remotely after kvm_sync_page(). If @gfn was writable before and it's rmaps was deleted in kvm_sync_page(), and if the tlb entry is still in a remote running VCPU, the @gfn is not safely protected. To fix the problem, kvm_sync_page() does the remote flush when needed to avoid the problem. Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-2-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: nSVM: don't copy virt_ext from vmcb12Maxim Levitsky2021-09-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These field correspond to features that we don't expose yet to L2 While currently there are no CVE worthy features in this field, if AMD adds more features to this field, that could allow guest escapes similar to CVE-2021-3653 and CVE-2021-3656. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-6-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: nSVM: test eax for 4K alignment for GP errata workaroundMaxim Levitsky2021-09-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GP SVM errata workaround made the #GP handler always emulate the SVM instructions. However these instructions #GP in case the operand is not 4K aligned, but the workaround code didn't check this and we ended up emulating these instructions anyway. This is only an emulation accuracy check bug as there is no harm for KVM to read/write unaligned vmcb images. Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: nSVM: restore int_vector in svm_clear_vintrMaxim Levitsky2021-09-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In svm_clear_vintr we try to restore the virtual interrupt injection that might be pending, but we fail to restore the interrupt vector. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]Fares Mehanna2021-09-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel PMU MSRs is in msrs_to_save_all[], so add AMD PMU MSRs to have a consistent behavior between Intel and AMD when using KVM_GET_MSRS, KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST. We have to add legacy and new MSRs to handle guests running without X86_FEATURE_PERFCTR_CORE. Signed-off-by: Fares Mehanna <faresx@amazon.de> Message-Id: <20210915133951.22389-1-faresx@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: nVMX: re-evaluate emulation_required on nested VM exitMaxim Levitsky2021-09-223-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If L1 had invalid state on VM entry (can happen on SMM transactions when we enter from real mode, straight to nested guest), then after we load 'host' state from VMCS12, the state has to become valid again, but since we load the segment registers with __vmx_set_segment we weren't always updating emulation_required. Update emulation_required explicitly at end of load_vmcs12_host_state. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if ↵Maxim Levitsky2021-09-222-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | !from_vmentry It is possible that when non root mode is entered via special entry (!from_vmentry), that is from SMM or from loading the nested state, the L2 state could be invalid in regard to non unrestricted guest mode, but later it can become valid. (for example when RSM emulation restores segment registers from SMRAM) Thus delay the check to VM entry, where we will check this and fail. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest stateMaxim Levitsky2021-09-221-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since no actual VM entry happened, the VM exit information is stale. To avoid this, synthesize an invalid VM guest state VM exit. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smmMaxim Levitsky2021-09-221-66/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use return statements instead of nested if, and fix error path to free all the maps that were allocated. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>