summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* arm64: mte: Fix typo in macro definitionVincenzo Frascino2020-11-301-1/+1
| | | | | | | | | | | | | | | | | | UL in the definition of SYS_TFSR_EL1_TF1 was misspelled causing compilation issues when trying to implement in kernel MTE async mode. Fix the macro correcting the typo. Note: MTE async mode will be introduced with a future series. Fixes: c058b1c4a5ea ("arm64: mte: system register definitions") Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201130170709.22309-1-vincenzo.frascino@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: fix EL1 debug transitionsMark Rutland2020-11-302-25/+26
| | | | | | | | | | | | | | | | | | | | | | In debug_exception_enter() and debug_exception_exit() we trace hardirqs on/off while RCU isn't guaranteed to be watching, and we don't save and restore the hardirq state, and so may return with this having changed. Handle this appropriately with new entry/exit helpers which do the bare minimum to ensure this is appropriately maintained, without marking debug exceptions as NMIs. These are placed in entry-common.c with the other entry/exit helpers. In future we'll want to reconsider whether some debug exceptions should be NMIs, but this will require a significant refactoring, and for now this should prevent issues with lockdep and RCU. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marins <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: fix NMI {user, kernel}->kernel transitionsMark Rutland2020-11-304-10/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exceptions which can be taken at (almost) any time are consdiered to be NMIs. On arm64 that includes: * SDEI events * GICv3 Pseudo-NMIs * Kernel stack overflows * Unexpected/unhandled exceptions ... but currently debug exceptions (BRKs, breakpoints, watchpoints, single-step) are not considered NMIs. As these can be taken at any time, kernel features (lockdep, RCU, ftrace) may not be in a consistent kernel state. For example, we may take an NMI from the idle code or partway through an entry/exit path. While nmi_enter() and nmi_exit() handle most of this state, notably they don't save/restore the lockdep state across an NMI being taken and handled. When interrupts are enabled and an NMI is taken, lockdep may see interrupts become disabled within the NMI code, but not see interrupts become enabled when returning from the NMI, leaving lockdep believing interrupts are disabled when they are actually disabled. The x86 code handles this in idtentry_{enter,exit}_nmi(), which will shortly be moved to the generic entry code. As we can't use either yet, we copy the x86 approach in arm64-specific helpers. All the NMI entrypoints are marked as noinstr to prevent any instrumentation handling code being invoked before the state has been corrected. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: fix non-NMI kernel<->kernel transitionsMark Rutland2020-11-302-3/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are periods in kernel mode when RCU is not watching and/or the scheduler tick is disabled, but we can still take exceptions such as interrupts. The arm64 exception handlers do not account for this, and it's possible that RCU is not watching while an exception handler runs. The x86/generic entry code handles this by ensuring that all (non-NMI) kernel exception handlers call irqentry_enter() and irqentry_exit(), which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to the generic entry code, and already hadnle the user<->kernel transitions elsewhere, so we add new kernel<->kernel transition helpers alog the lines of the generic entry code. Since we now track interrupts becoming masked when an exception is taken, local_daif_inherit() is modified to track interrupts becoming re-enabled when the original context is inherited. To balance the entry/exit paths, each handler masks all DAIF exceptions before exit_to_kernel_mode(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: ptrace: prepare for EL1 irq/rcu trackingMark Rutland2020-11-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle sequences), or when the lockdep hardirqs transiently out-of-sync with the hardware state (e.g. in the middle of local_irq_enable()). To correctly handle these cases, we'll need to save/restore this state across some exceptions taken from EL1. A series of subsequent patches will update EL1 exception handlers to handle this. In preparation for this, and to avoid dependencies between those patches, this patch adds two new fields to struct pt_regs so that exception handlers can track this state. Note that this is placed in pt_regs as some entry/exit sequences such as el1_irq are invoked from assembly, which makes it very difficult to add a separate structure as with the irqentry_state used by x86. We can separate this once more of the exception logic is moved to C. While the fields only need to be bool, they are both made u64 to keep pt_regs 16-byte aligned. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-9-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: fix non-NMI user<->kernel transitionsMark Rutland2020-11-305-48/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE will WARN() at boot time that interrupts are enabled when we call context_tracking_user_enter(), despite the DAIF flags indicating that IRQs are masked. The problem is that we're not tracking IRQ flag changes accurately, and so lockdep believes interrupts are enabled when they are not (and vice-versa). We can shuffle things so to make this more accurate. For kernel->user transitions there are a number of constraints we need to consider: 1) When we call __context_tracking_user_enter() HW IRQs must be disabled and lockdep must be up-to-date with this. 2) Userspace should be treated as having IRQs enabled from the PoV of both lockdep and tracing. 3) As context_tracking_user_enter() stops RCU from watching, we cannot use RCU after calling it. 4) IRQ flag tracing and lockdep have state that must be manipulated before RCU is disabled. ... with similar constraints applying for user->kernel transitions, with the ordering reversed. The generic entry code has enter_from_user_mode() and exit_to_user_mode() helpers to handle this. We can't use those directly, so we add arm64 copies for now (without the instrumentation markers which aren't used on arm64). These replace the existing user_exit() and user_exit_irqoff() calls spread throughout handlers, and the exception unmasking is left as-is. Note that: * The accounting for debug exceptions from userspace now happens in el0_dbg() and ret_to_user(), so this is removed from debug_exception_enter() and debug_exception_exit(). As user_exit_irqoff() wakes RCU, the userspace-specific check is removed. * The accounting for syscalls now happens in el0_svc(), el0_svc_compat(), and ret_to_user(), so this is removed from el0_svc_common(). This does not adversely affect the workaround for erratum 1463225, as this does not depend on any of the state tracking. * In ret_to_user() we mask interrupts with local_daif_mask(), and so we need to inform lockdep and tracing. Here a trace_hardirqs_off() is sufficient and safe as we have not yet exited kernel context and RCU is usable. * As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only needs to check for the latter. * EL0 SError handling will be dealt with in a subsequent patch, as this needs to be treated as an NMI. Prior to this patch, booting an appropriately-configured kernel would result in spats as below: | DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) | WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0 | Modules linked in: | CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3 | Hardware name: linux,dummy-virt (DT) | pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--) | pc : check_flags.part.54+0x1dc/0x1f0 | lr : check_flags.part.54+0x1dc/0x1f0 | sp : ffff80001003bd80 | x29: ffff80001003bd80 x28: ffff66ce801e0000 | x27: 00000000ffffffff x26: 00000000000003c0 | x25: 0000000000000000 x24: ffffc31842527258 | x23: ffffc31842491368 x22: ffffc3184282d000 | x21: 0000000000000000 x20: 0000000000000001 | x19: ffffc318432ce000 x18: 0080000000000000 | x17: 0000000000000000 x16: ffffc31840f18a78 | x15: 0000000000000001 x14: ffffc3184285c810 | x13: 0000000000000001 x12: 0000000000000000 | x11: ffffc318415857a0 x10: ffffc318406614c0 | x9 : ffffc318415857a0 x8 : ffffc31841f1d000 | x7 : 647261685f706564 x6 : ffffc3183ff7c66c | x5 : ffff66ce801e0000 x4 : 0000000000000000 | x3 : ffffc3183fe00000 x2 : ffffc31841500000 | x1 : e956dc24146b3500 x0 : 0000000000000000 | Call trace: | check_flags.part.54+0x1dc/0x1f0 | lock_is_held_type+0x10c/0x188 | rcu_read_lock_sched_held+0x70/0x98 | __context_tracking_enter+0x310/0x350 | context_tracking_enter.part.3+0x5c/0xc8 | context_tracking_user_enter+0x6c/0x80 | finish_ret_to_user+0x2c/0x13cr Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: move el1 irq/nmi logic to CMark Rutland2020-11-304-45/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for reworking the EL1 irq/nmi entry code, move the existing logic to C. We no longer need the asm_nmi_enter() and asm_nmi_exit() wrappers, so these are removed. The new C functions are marked noinstr, which prevents compiler instrumentation and runtime probing. In subsequent patches we'll want the new C helpers to be called in all cases, so we don't bother wrapping the calls with ifdeferry. Even when the new C functions are stubs the trivial calls are unlikely to have a measurable impact on the IRQ or NMI paths anyway. Prototypes are added to <asm/exception.h> as otherwise (in some configurations) GCC will complain about the lack of a forward declaration. We already do this for existing function, e.g. enter_from_user_mode(). The new helpers are marked as noinstr (which prevents all instrumentation, tracing, and kprobes). Otherwise, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: prepare ret_to_user for function callMark Rutland2020-11-301-4/+5
| | | | | | | | | | | | | | | | | | | | | | In a subsequent patch ret_to_user will need to make a C function call (in some configurations) which may clobber x0-x18 at the start of the finish_ret_to_user block, before enable_step_tsk consumes the flags loaded into x1. In preparation for this, let's load the flags into x19, which is preserved across C function calls. This avoids a redundant reload of the flags and ensures we operate on a consistent shapshot regardless. There should be no functional change as a result of this patch. At this point of the entry/exit paths we only need to preserve x28 (tsk) and the sp, and x19 is free for this use. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-6-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: move enter_from_user_mode to entry-common.cMark Rutland2020-11-302-7/+6
| | | | | | | | | | | | | | | | | | | In later patches we'll want to extend enter_from_user_mode() and add a corresponding exit_to_user_mode(). As these will be common for all entries/exits from userspace, it'd be better for these to live in entry-common.c with the rest of the entry logic. This patch moves enter_from_user_mode() into entry-common.c. As with other functions in entry-common.c it is marked as noinstr (which prevents all instrumentation, tracing, and kprobes) but there are no other functional changes. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: entry: mark entry code as noinstrMark Rutland2020-11-301-50/+25
| | | | | | | | | | | | | | | | | | | | Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(), but they're still subject to other instrumentation which may rely on lockdep/rcu/context-tracking being up-to-date, and may cause nested exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt exceptions registers which have not yet been read. Prevent this by marking all functions in entry-common.c as noinstr to prevent compiler instrumentation. This also blacklists the functions for tracing and kprobes, so we don't need to handle that separately. Functions elsewhere will be dealt with in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: mark idle code as noinstrMark Rutland2020-11-301-4/+4
| | | | | | | | | | | | | | | | | | | | | Core code disables RCU when calling arch_cpu_idle(), so it's not safe for arch_cpu_idle() or its calees to be instrumented, as the instrumentation callbacks may attempt to use RCU or other features which are unsafe to use in this context. Mark them noinstr to prevent issues. The use of local_irq_enable() in arch_cpu_idle() is similarly problematic, and the "sched/idle: Fix arch_cpu_idle() vs tracing" patch queued in the tip tree addresses that case. Reported-by: Marco Elver <elver@google.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: syscall: exit userspace before unmasking exceptionsMark Rutland2020-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | In el0_svc_common() we unmask exceptions before we call user_exit(), and so there's a window where an IRQ or debug exception can be taken while RCU is not watching. In do_debug_exception() we account for this in via debug_exception_{enter,exit}(), but in the el1_irq asm we do not and we call trace functions which rely on RCU before we have a guarantee that RCU is watching. Let's avoid this by having el0_svc_common() exit userspace before unmasking exceptions, matching what we do for all other EL0 entry paths. We can use user_exit_irqoff() to avoid the pointless save/restore of IRQ flags while we're sure exceptions are masked in DAIF. The workaround for Cortex-A76 erratum 1463225 may trigger a debug exception before this point, but the debug code invoked in this case is safe even when RCU is not watching. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()Will Deacon2020-11-231-13/+14
| | | | | | | | | | | | | | | With hardware dirty bit management, calling pte_wrprotect() on a writable, dirty PTE will lose the dirty state and return a read-only, clean entry. Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that the dirty bit is preserved for writable entries, as this is required for soft-dirty bit management if we enable it in the future. Cc: <stable@vger.kernel.org> Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: pgtable: Fix pte_accessible()Will Deacon2020-11-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | pte_accessible() is used by ptep_clear_flush() to figure out whether TLB invalidation is necessary when unmapping pages for reclaim. Although our implementation is correct according to the architecture, returning true only for valid, young ptes in the absence of racing page-table modifications, this is in fact flawed due to lazy invalidation of old ptes in ptep_clear_flush_young() where we elide the expensive DSB instruction for completing the TLB invalidation. Rather than penalise the aging path, adjust pte_accessible() to return true for any valid pte, even if the access flag is cleared. Cc: <stable@vger.kernel.org> Fixes: 76c714be0e5e ("arm64: pgtable: implement pte_accessible()") Reported-by: Yu Zhao <yuzhao@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201120143557.6715-2-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd buildRandy Dunlap2020-11-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | Adding <asm/exception.h> brought in <asm/kprobes.h> which uses <asm/probes.h>, which uses 'pstate_check_t' so the latter needs to #include <asm/insn.h> for this typedef. Fixes this build error: In file included from arch/arm64/include/asm/kprobes.h:24, from arch/arm64/include/asm/exception.h:11, from arch/arm64/kernel/fpsimd.c:35: arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t' 16 | pstate_check_t *pstate_cc; Fixes: c6b90d5cf637 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/20201123044510.9942-1-rdunlap@infradead.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX SilverKonrad Dybcio2020-11-131-0/+2
| | | | | | | | | | QCOM KRYO2XX Silver cores are Cortex-A53 based and are susceptible to the 845719 erratum. Add them to the lookup list to apply the erratum. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20201104232218.198800-5-konrad.dybcio@somainline.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-listKonrad Dybcio2020-11-131-0/+1
| | | | | | | | | KRYO2XX silver (LITTLE) CPUs are based on Cortex-A53 and they are not affected by spectre-v2. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20201104232218.198800-4-konrad.dybcio@somainline.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelistKonrad Dybcio2020-11-131-0/+2
| | | | | | | | | | QCOM KRYO2XX gold (big) silver (LITTLE) CPU cores are based on Cortex-A73 and Cortex-A53 respectively and are meltdown safe, hence add them to kpti_safe_list[]. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20201104232218.198800-3-konrad.dybcio@somainline.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: Add MIDR value for KRYO2XX gold/silver CPU coresKonrad Dybcio2020-11-131-0/+4
| | | | | | | | | | | Add MIDR value for KRYO2XX gold (big) and silver (LITTLE) CPU cores which are used in Qualcomm Technologies, Inc. SoCs. This will be used to identify and apply errata which are applicable for these CPU cores. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Link: https://lore.kernel.org/r/20201104232218.198800-2-konrad.dybcio@somainline.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64/mm: Validate hotplug range before creating linear mappingAnshuman Khandual2020-11-131-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During memory hotplug process, the linear mapping should not be created for a given memory range if that would fall outside the maximum allowed linear range. Else it might cause memory corruption in the kernel virtual space. Maximum linear mapping region is [PAGE_OFFSET..(PAGE_END -1)] accommodating both its ends but excluding PAGE_END. Max physical range that can be mapped inside this linear mapping range, must also be derived from its end points. This ensures that arch_add_memory() validates memory hot add range for its potential linear mapping requirements, before creating it with __create_pgd_mapping(). Fixes: 4ab215061554 ("arm64: Add memory hotplug support") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1605252614-761-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: smp: Tell RCU about CPUs that fail to come onlineWill Deacon2020-11-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured that RCU is informed early about incoming CPUs that might end up calling into printk() before they are online. However, if such a CPU fails the early CPU feature compatibility checks in check_local_cpu_capabilities(), then it will be powered off or parked without informing RCU, leading to an endless stream of stalls: | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: | rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593 | (detected by 0, t=5252 jiffies, g=9317, q=136) | Task dump for CPU 2: | task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028 | Call trace: | ret_from_fork+0x0/0x30 Ensure that the dying CPU invokes rcu_report_dead() prior to being powered off or parked. Cc: Qian Cai <cai@redhat.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Qian Cai <cai@redhat.com> Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: psci: Avoid printing in cpu_psci_cpu_die()Will Deacon2020-11-101-4/+1
| | | | | | | | | | | | | cpu_psci_cpu_die() is called in the context of the dying CPU, which will no longer be online or tracked by RCU. It is therefore not generally safe to call printk() if the PSCI "cpu off" request fails, so remove the pr_crit() invocation. Cc: Qian Cai <cai@redhat.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201106103602.9849-2-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: kexec_file: Fix sparse warningWill Deacon2020-11-101-1/+1
| | | | | | | | | | | | | | | | Sparse gets cross about us returning 0 from image_load(), which has a return type of 'void *': >> arch/arm64/kernel/kexec_image.c:130:16: sparse: sparse: Using plain integer as NULL pointer Return NULL instead, as we don't use the return value for anything if it does not indicate an error. Cc: Benjamin Gwin <bgwin@google.com> Reported-by: kernel test robot <lkp@intel.com> Fixes: 108aa503657e ("arm64: kexec_file: try more regions if loading segments fails") Link: https://lore.kernel.org/r/202011091736.T0zH8kaC-lkp@intel.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: errata: Fix handling of 1418040 with late CPU onliningWill Deacon2020-11-102-3/+4
| | | | | | | | | | | | | | | | | | | | | In a surprising turn of events, it transpires that CPU capabilities configured as ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE are never set as the result of late-onlining. Therefore our handling of erratum 1418040 does not get activated if it is not required by any of the boot CPUs, even though we allow late-onlining of an affected CPU. In order to get things working again, replace the cpus_have_const_cap() invocation with an explicit check for the current CPU using this_cpu_has_cap(). Cc: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201106114952.10032-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: kexec_file: try more regions if loading segments failsBenjamin Gwin2020-11-052-11/+39
| | | | | | | | | | | | | | It's possible that the first region picked for the new kernel will make it impossible to fit the other segments in the required 32GB window, especially if we have a very large initrd. Instead of giving up, we can keep testing other regions for the kernel until we find one that works. Suggested-by: Ryan O'Leary <ryanoleary@google.com> Signed-off-by: Benjamin Gwin <bgwin@google.com> Link: https://lore.kernel.org/r/20201103201106.2397844-1-bgwin@google.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: kprobes: Use BRK instead of single-step when executing instructions ↵Jean-Philippe Brucker2020-11-034-47/+27
| | | | | | | | | | | | | | | | | | | | out-of-line Commit 36dadef23fcc ("kprobes: Init kprobes in early_initcall") enabled using kprobes from early_initcall. Unfortunately at this point the hardware debug infrastructure is not operational. The OS lock may still be locked, and the hardware watchpoints may have unknown values when kprobe enables debug monitors to single-step instructions. Rather than using hardware single-step, append a BRK instruction after the instruction to be executed out-of-line. Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20201103134900.337243-1-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4Vanshidhar Konda2020-11-031-1/+1
| | | | | | | | | | | | | The current arm64 default config limits max NUMA nodes available on system to 4 (NODES_SHIFT = 2). Today's arm64 systems can reach or exceed 16 NUMA nodes. To accomodate current hardware and to fit NODES_SHIFT within page flags on arm64, increase NODES_SHIFT to 4. Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201020173409.1266576-1-vanshikonda@os.amperecomputing.com/ Link: https://lore.kernel.org/r/20201030173050.1182876-1-vanshikonda@os.amperecomputing.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.SFangrui Song2020-10-303-6/+3
| | | | | | | | | | | | | | | | | | | | | | Commit 39d114ddc682 ("arm64: add KASAN support") added .weak directives to arch/arm64/lib/mem*.S instead of changing the existing SYM_FUNC_START_PI macros. This can lead to the assembly snippet `.weak memcpy ... .globl memcpy` which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Use the appropriate SYM_FUNC_START_WEAK_PI instead. Fixes: 39d114ddc682 ("arm64: add KASAN support") Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201029181951.1866093-1-maskray@google.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64/smp: Move rcu_cpu_starting() earlierQian Cai2020-10-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The call to rcu_cpu_starting() in secondary_start_kernel() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call trace: dump_backtrace+0x0/0x3c8 show_stack+0x14/0x60 dump_stack+0x14c/0x1c4 lockdep_rcu_suspicious+0x134/0x14c __lock_acquire+0x1c30/0x2600 lock_acquire+0x274/0xc48 _raw_spin_lock+0xc8/0x140 vprintk_emit+0x90/0x3d0 vprintk_default+0x34/0x40 vprintk_func+0x378/0x590 printk+0xa8/0xd4 __cpuinfo_store_cpu+0x71c/0x868 cpuinfo_store_cpu+0x2c/0xc8 secondary_start_kernel+0x244/0x318 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the secondary_start_kernel() function. Signed-off-by: Qian Cai <cai@redhat.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Link: https://lore.kernel.org/r/20201028182614.13655-1-cai@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
* arm64: Add workaround for Arm Cortex-A77 erratum 1508412Rob Herring2020-10-2912-15/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected systems. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: Add part number for Arm Cortex-A77Rob Herring2020-10-291-0/+2
| | | | | | | | | | | Add the MIDR part number info for the Arm Cortex-A77. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201028182839.166037-1-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: efi: increase EFI PE/COFF header padding to 64 KBArd Biesheuvel2020-10-281-1/+1
| | | | | | | | | | | | | | | | | | | Commit 76085aff29f5 ("efi/libstub/arm64: align PE/COFF sections to segment alignment") increased the PE/COFF section alignment to match the minimum segment alignment of the kernel image, which ensures that the kernel does not need to be moved around in memory by the EFI stub if it was built as relocatable. However, the first PE/COFF section starts at _stext, which is only 4 KB aligned, and so the section layout is inconsistent. Existing EFI loaders seem to care little about this, but it is better to clean this up. So let's pad the header to 64 KB to match the PE/COFF section alignment. Fixes: 76085aff29f5 ("efi/libstub/arm64: align PE/COFF sections to segment alignment") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20201027073209.2897-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: vmlinux.lds: account for spurious empty .igot.plt sectionsArd Biesheuvel2020-10-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Now that we started making the linker warn about orphan sections (input sections that are not explicitly consumed by an output section), some configurations produce the following warning: aarch64-linux-gnu-ld: warning: orphan section `.igot.plt' from `arch/arm64/kernel/head.o' being placed in section `.igot.plt' It could be any file that triggers this - head.o is simply the first input file in the link - and the resulting .igot.plt section never actually appears in vmlinux as it turns out to be empty. So let's add .igot.plt to our collection of input sections to disregard unless they are empty. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20201028133332.5571-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: avoid -Woverride-init warningArnd Bergmann2020-10-282-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The icache_policy_str[] definition causes a warning when extra warning flags are enabled: arch/arm64/kernel/cpuinfo.c:38:26: warning: initialized field overwritten [-Woverride-init] 38 | [ICACHE_POLICY_VIPT] = "VIPT", | ^~~~~~ arch/arm64/kernel/cpuinfo.c:38:26: note: (near initialization for 'icache_policy_str[2]') arch/arm64/kernel/cpuinfo.c:39:26: warning: initialized field overwritten [-Woverride-init] 39 | [ICACHE_POLICY_PIPT] = "PIPT", | ^~~~~~ arch/arm64/kernel/cpuinfo.c:39:26: note: (near initialization for 'icache_policy_str[3]') arch/arm64/kernel/cpuinfo.c:40:27: warning: initialized field overwritten [-Woverride-init] 40 | [ICACHE_POLICY_VPIPT] = "VPIPT", | ^~~~~~~ arch/arm64/kernel/cpuinfo.c:40:27: note: (near initialization for 'icache_policy_str[0]') There is no real need for the default initializer here, as printing a NULL string is harmless. Rewrite the logic to have an explicit reserved value for the only one that uses the default value. This partially reverts the commit that removed ICACHE_POLICY_AIVIVT. Fixes: 155433cb365e ("arm64: cache: Remove support for ASID-tagged VIVT I-caches") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20201026193807.3816388-1-arnd@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIREDStephen Boyd2020-10-282-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the SMCCC spec[1](7.5.2 Discovery) the ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and SMCCC_RET_NOT_SUPPORTED. 0 is "workaround required and safe to call this function" 1 is "workaround not required but safe to call this function" SMCCC_RET_NOT_SUPPORTED is "might be vulnerable or might not be, who knows, I give up!" SMCCC_RET_NOT_SUPPORTED might as well mean "workaround required, except calling this function may not work because it isn't implemented in some cases". Wonderful. We map this SMC call to 0 is SPECTRE_MITIGATED 1 is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE For KVM hypercalls (hvc), we've implemented this function id to return SMCCC_RET_NOT_SUPPORTED, 0, and SMCCC_RET_NOT_REQUIRED. One of those isn't supposed to be there. Per the code we call arm64_get_spectre_v2_state() to figure out what to return for this feature discovery call. 0 is SPECTRE_MITIGATED SMCCC_RET_NOT_REQUIRED is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE Let's clean this up so that KVM tells the guest this mapping: 0 is SPECTRE_MITIGATED 1 is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE Note: SMCCC_RET_NOT_AFFECTED is 1 but isn't part of the SMCCC spec Fixes: c118bbb52743 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Steven Price <steven.price@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://developer.arm.com/documentation/den0028/latest [1] Link: https://lore.kernel.org/r/20201023154751.1973872-1-swboyd@chromium.org Signed-off-by: Will Deacon <will@kernel.org>
* arm64: vdso32: Allow ld.lld to properly link the VDSONathan Chancellor2020-10-261-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it stands now, the vdso32 Makefile hardcodes the linker to ld.bfd using -fuse-ld=bfd with $(CC). This was taken from the arm vDSO Makefile, as the comment notes, done in commit d2b30cd4b722 ("ARM: 8384/1: VDSO: force use of BFD linker"). Commit fe00e50b2db8 ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO") changed that Makefile to use $(LD) directly instead of through $(CC), which matches how the rest of the kernel operates. Since then, LD=ld.lld means that the arm vDSO will be linked with ld.lld, which has shown no problems so far. Allow ld.lld to link this vDSO as we do the regular arm vDSO. To do this, we need to do a few things: * Add a LD_COMPAT variable, which defaults to $(CROSS_COMPILE_COMPAT)ld with gcc and $(LD) if LLVM is 1, which will be ld.lld, or $(CROSS_COMPILE_COMPAT)ld if not, which matches the logic of the main Makefile. It is overrideable for further customization and avoiding breakage. * Eliminate cc32-ldoption, which matches commit 055efab3120b ("kbuild: drop support for cc-ldoption"). With those, we can use $(LD_COMPAT) in cmd_ldvdso and change the flags from compiler linker flags to linker flags directly. We eliminate -mfloat-abi=soft because it is not handled by the linker. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1033 Link: https://lore.kernel.org/r/20201020011406.1818918-1-natechancellor@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
* treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches2020-10-2567-104/+104
| | | | | | | | | | | | | | | | | | | | Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'parisc-5.10-2' of ↵Linus Torvalds2020-10-253-9/+87
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc updates from Helge Deller: - During this merge window O_NONBLOCK was changed to become 000200000, but we missed that the syscalls timerfd_create(), signalfd4(), eventfd2(), pipe2(), inotify_init1() and userfaultfd() do a strict bit-wise check of the flags parameter. To provide backward compatibility with existing userspace we introduce parisc specific wrappers for those syscalls which filter out the old O_NONBLOCK value and replaces it with the new one. - Prevent HIL bus driver to get stuck when keyboard or mouse isn't attached - Improve error return codes when setting rtc time - Minor documentation fix in pata_ns87415.c * 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: ata: pata_ns87415.c: Document support on parisc with superio chip parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage hil/parisc: Disable HIL driver when it gets stuck parisc: Improve error return codes when setting rtc time
| * parisc: Add wrapper syscalls to fix O_NONBLOCK flag usageHelge Deller2020-10-232-7/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 75ae04206a4d ("parisc: Define O_NONBLOCK to become 000200000") changed the O_NONBLOCK constant to have only one bit set (like all other architectures). This change broke some existing userspace code (e.g. udevadm, systemd-udevd, elogind) which called specific syscalls which do strict value checking on their flag parameter. This patch adds wrapper functions for the relevant syscalls. The wrappers masks out any old invalid O_NONBLOCK flags, reports in the syslog if the old O_NONBLOCK value was used and then calls the target syscall with the new O_NONBLOCK value. Fixes: 75ae04206a4d ("parisc: Define O_NONBLOCK to become 000200000") Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: Meelis Roos <mroos@linux.ee> Tested-by: Jeroen Roovers <jer@xs4all.nl>
| * parisc: Improve error return codes when setting rtc timeHelge Deller2020-10-221-2/+9
| | | | | | | | | | | | | | | | The HP 730 machine returned strange errors when I tried setting the rtc time. Add some debug code to improve the possibility to trace errors and document that hppa probably has as Y2k38 problem. Signed-off-by: Helge Deller <deller@gmx.de>
* | Merge tag 'for-linus-5.10b-rc1c-tag' of ↵Linus Torvalds2020-10-252-8/+13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - a series for the Xen pv block drivers adding module parameters for better control of resource usge - a cleanup series for the Xen event driver * tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: Documentation: add xen.fifo_events kernel parameter description xen/events: unmask a fifo event channel only if it was masked xen/events: only register debug interrupt for 2-level events xen/events: make struct irq_info private to events_base.c xen: remove no longer used functions xen-blkfront: Apply changed parameter name to the document xen-blkfront: add a parameter for disabling of persistent grants xen-blkback: add a parameter for disabling of persistent grants
| * | xen/events: only register debug interrupt for 2-level eventsJuergen Gross2020-10-232-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xen_debug_interrupt() is specific to 2-level event handling. So don't register it with fifo event handling being active. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Link: https://lore.kernel.org/r/20201022094907.28560-4-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
* | | Merge tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2020-10-241-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull dma-mapping fixes from Christoph Hellwig: - document the new dma_{alloc,free}_pages() API - two fixups for the dma-mapping.h split * tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: document dma_{alloc,free}_pages dma-mapping: move more functions to dma-map-ops.h ARM/sa1111: add a missing include of dma-map-ops.h
| * | | ARM/sa1111: add a missing include of dma-map-ops.hChristoph Hellwig2020-10-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure the dmabounce functions are available for all Kconfig permutations. Fixes: 0a0f0d8be76d ("dma-mapping: split <linux/dma-mapping.h>") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2020-10-245-9/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "Two fixes for this merge window, and an unrelated bugfix for a host hang" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: ioapic: break infinite recursion on lazy EOI KVM: vmx: rename pi_init to avoid conflict with paride KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
| * | | | KVM: ioapic: break infinite recursion on lazy EOIVitaly Kuznetsov2020-10-241-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During shutdown the IOAPIC trigger mode is reset to edge triggered while the vfio-pci INTx is still registered with a resampler. This allows us to get into an infinite loop: ioapic_set_irq -> ioapic_lazy_update_eoi -> kvm_ioapic_update_eoi_one -> kvm_notify_acked_irq -> kvm_notify_acked_gsi -> (via irq_acked fn ptr) irqfd_resampler_ack -> kvm_set_irq -> (via set fn ptr) kvm_set_ioapic_irq -> kvm_ioapic_set_irq -> ioapic_set_irq Commit 8be8f932e3db ("kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts", 2020-05-04) acknowledges that this recursion loop exists and tries to avoid it at the call to ioapic_lazy_update_eoi, but at this point the scenario is already set, we have an edge interrupt with resampler on the same gsi. Fortunately, the only user of irq ack notifiers (in addition to resamplefd) is i8254 timer interrupt reinjection. These are edge-triggered, so in principle they would need the call to kvm_ioapic_update_eoi_one from ioapic_lazy_update_eoi, but they already disable AVIC(*), so they don't need the lazy EOI behavior. Therefore, remove the call to kvm_ioapic_update_eoi_one from ioapic_lazy_update_eoi. This fixes CVE-2020-27152. Note that this issue cannot happen with SR-IOV assigned devices because virtual functions do not have INTx, only MSI. Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: vmx: rename pi_init to avoid conflict with paridePaolo Bonzini2020-10-243-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allyesconfig results in: ld: drivers/block/paride/paride.o: in function `pi_init': (.text+0x1340): multiple definition of `pi_init'; arch/x86/kvm/vmx/posted_intr.o:posted_intr.c:(.init.text+0x0): first defined here make: *** [Makefile:1164: vmlinux] Error 1 because commit: commit 8888cdd0996c2d51cd417f9a60a282c034f3fa28 Author: Xiaoyao Li <xiaoyao.li@intel.com> Date: Wed Sep 23 11:31:11 2020 -0700 KVM: VMX: Extract posted interrupt support to separate files added another pi_init(), though one already existed in the paride code. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 buildSean Christopherson2020-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace a modulo operator with the more common pattern for computing the gfn "offset" of a huge page to fix an i386 build error. arch/x86/kvm/mmu/tdp_mmu.c:212: undefined reference to `__umoddi3' In fact, almost all of tdp_mmu.c can be elided on 32-bit builds, but that is a much larger patch. Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs") Reported-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201024031150.9318-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | Merge tag 'x86_seves_fixes_for_v5.10_rc1' of ↵Linus Torvalds2020-10-245-16/+40
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV-ES fixes from Borislav Petkov: "Three fixes to SEV-ES to correct setting up the new early pagetable on 5-level paging machines, to always map boot_params and the kernel cmdline, and disable stack protector for ../compressed/head{32,64}.c. (Arvind Sankar)" * tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Explicitly map boot_params and command line x86/head/64: Disable stack protection for head$(BITS).o x86/boot/64: Initialize 5-level paging variables earlier
| * | | | | x86/boot/64: Explicitly map boot_params and command lineArvind Sankar2020-10-192-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commits ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table") 8570978ea030 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code") set up a new page table in the decompressor stub, but without explicit mappings for boot_params and the kernel command line, relying on the #PF handler instead. This is fragile, as boot_params and the command line mappings are required for the main kernel. If EARLY_PRINTK and RANDOMIZE_BASE are disabled, a QEMU/OVMF boot never accesses the command line in the decompressor stub, and so it never gets mapped. The main kernel accesses it from the identity mapping if AMD_MEM_ENCRYPT is enabled, and will crash. Fix this by adding back the explicit mapping of boot_params and the command line. Note: the changes also removed the explicit mapping of the main kernel, with the result that .bss and .brk may not be in the identity mapping, but those don't get accessed by the main kernel before it switches to its own page tables. [ bp: Pass boot_params with a MOV %rsp... instead of PUSH/POP. Use block formatting for the comment. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20201016200404.1615994-1-nivedita@alum.mit.edu