summaryrefslogtreecommitdiffstats
path: root/arch/arm64/kernel/entry.S
Commit message (Collapse)AuthorAgeFilesLines
* arm64: ssbd: Introduce thread flag to control userspace mitigationMarc Zyngier2018-07-221-0/+2
| | | | | | | | | | | | | | | | | | | commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream. In order to allow userspace to be mitigated on demand, let's introduce a new thread flag that prevents the mitigation from being turned off when exiting to userspace, and doesn't turn it on on entry into the kernel (with the assumption that the mitigation is always enabled in the kernel itself). This will be used by a prctl interface introduced in a later patch. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: ssbd: Skip apply_ssbd if not using dynamic mitigationMarc Zyngier2018-07-221-0/+3
| | | | | | | | | | | | | | | | | | commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream. In order to avoid checking arm64_ssbd_callback_required on each kernel entry/exit even if no mitigation is required, let's add yet another alternative that by default jumps over the mitigation, and that gets nop'ed out if we're doing dynamic mitigation. Think of it as a poor man's static key... Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2Marc Zyngier2018-07-221-4/+7
| | | | | | | | | | | | | | | commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream. In a heterogeneous system, we can end up with both affected and unaffected CPUs. Let's check their status before calling into the firmware. Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1Marc Zyngier2018-07-221-0/+22
| | | | | | | | | | | | | | | | | | | commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream. In order for the kernel to protect itself, let's call the SSBD mitigation implemented by the higher exception level (either hypervisor or firmware) on each transition between userspace and kernel. We must take the PSCI conduit into account in order to target the right exception level, hence the introduction of a runtime patching callback. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'arm64-upstream' of ↵Linus Torvalds2018-02-081-13/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: "As I mentioned in the last pull request, there's a second batch of security updates for arm64 with mitigations for Spectre/v1 and an improved one for Spectre/v2 (via a newly defined firmware interface API). Spectre v1 mitigation: - back-end version of array_index_mask_nospec() - masking of the syscall number to restrict speculation through the syscall table - masking of __user pointers prior to deference in uaccess routines Spectre v2 mitigation update: - using the new firmware SMC calling convention specification update - removing the current PSCI GET_VERSION firmware call mitigation as vendors are deploying new SMCCC-capable firmware - additional branch predictor hardening for synchronous exceptions and interrupts while in user mode Meltdown v3 mitigation update: - Cavium Thunder X is unaffected but a hardware erratum gets in the way. The kernel now starts with the page tables mapped as global and switches to non-global if kpti needs to be enabled. Other: - Theoretical trylock bug fixed" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits) arm64: Kill PSCI_GET_VERSION as a variant-2 workaround arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: smccc: Implement SMCCC v1.1 inline primitive arm/arm64: smccc: Make function identifiers an unsigned quantity firmware/psci: Expose SMCCC version through psci_ops firmware/psci: Expose PSCI conduit arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: KVM: Turn kvm_psci_version into a static inline arm/arm64: KVM: Advertise SMCCC v1.1 arm/arm64: KVM: Implement PSCI 1.0 support arm/arm64: KVM: Add smccc accessors to PSCI code arm/arm64: KVM: Add PSCI_VERSION helper arm/arm64: KVM: Consolidate the PSCI include files arm64: KVM: Increment PC after handling an SMC trap arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: entry: Apply BP hardening for suspicious interrupts from EL0 arm64: entry: Apply BP hardening for high-priority synchronous exceptions arm64: futex: Mask __user pointers prior to dereference ...
| * arm64: entry: Apply BP hardening for suspicious interrupts from EL0Will Deacon2018-02-061-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to take an IRQ from EL0 following a branch to a kernel address in such a way that the IRQ is prioritised over the instruction abort. Whilst an attacker would need to get the stars to align here, it might be sufficient with enough calibration so perform BP hardening in the rare case that we see a kernel address in the ELR when handling an IRQ from EL0. Reported-by: Dan Hettena <dhettena@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: entry: Apply BP hardening for high-priority synchronous exceptionsWill Deacon2018-02-061-1/+4
| | | | | | | | | | | | | | | | | | | | Software-step and PC alignment fault exceptions have higher priority than instruction abort exceptions, so apply the BP hardening hooks there too if the user PC appears to reside in kernel space. Reported-by: Dan Hettena <dhettena@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: entry: Ensure branch through syscall table is bounded under speculationWill Deacon2018-02-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | In a similar manner to array_index_mask_nospec, this patch introduces an assembly macro (mask_nospec64) which can be used to bound a value under speculation. This macro is then used to ensure that the indirect branch through the syscall table is bounded under speculation, with out-of-range addresses speculating as calls to sys_io_setup (0). Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: Make USER_DS an inclusive limitRobin Murphy2018-02-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, USER_DS represents an exclusive limit while KERNEL_DS is inclusive. In order to do some clever trickery for speculation-safe masking, we need them both to behave equivalently - there aren't enough bits to make KERNEL_DS exclusive, so we have precisely one option. This also happens to correct a longstanding false negative for a range ending on the very top byte of kernel memory. Mark Rutland points out that we've actually got the semantics of addresses vs. segments muddled up in most of the places we need to amend, so shuffle the {USER,KERNEL}_DS definitions around such that we can correct those properly instead of just pasting "-1"s everywhere. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: entry: Reword comment about post_ttbr_update_workaroundWill Deacon2018-02-061-10/+3
| | | | | | | | | | | | | | | | | | | | | | We don't fully understand the Cavium ThunderX erratum, but it appears that mapping the kernel as nG can lead to horrible consequences such as attempting to execute userspace from kernel context. Since kpti isn't enabled for these CPUs anyway, simplify the comment justifying the lack of post_ttbr_update_workaround in the exception trampoline. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | Merge branch 'linus' into sched/urgent, to resolve conflictsIngo Molnar2018-02-061-31/+365
|\| | | | | | | | | | | | | | | | | | | Conflicts: arch/arm64/kernel/entry.S arch/x86/Kconfig include/linux/sched/mm.h kernel/fork.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * arm64: kpti: Fix the interaction between ASID switching and software PANCatalin Marinas2018-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With ARM64_SW_TTBR0_PAN enabled, the exception entry code checks the active ASID to decide whether user access was enabled (non-zero ASID) when the exception was taken. On return from exception, if user access was previously disabled, it re-instates TTBR0_EL1 from the per-thread saved value (updated in switch_mm() or efi_set_pgd()). Commit 7655abb95386 ("arm64: mm: Move ASID from TTBR0 to TTBR1") makes a TTBR0_EL1 + ASID switching non-atomic. Subsequently, commit 27a921e75711 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN") changes the __uaccess_ttbr0_disable() function and asm macro to first write the reserved TTBR0_EL1 followed by the ASID=0 update in TTBR1_EL1. If an exception occurs between these two, the exception return code will re-instate a valid TTBR0_EL1. Similar scenario can happen in cpu_switch_mm() between setting the reserved TTBR0_EL1 and the ASID update in cpu_do_switch_mm(). This patch reverts the entry.S check for ASID == 0 to TTBR0_EL1 and disables the interrupts around the TTBR0_EL1 and ASID switching code in __uaccess_ttbr0_disable(). It also ensures that, when returning from the EFI runtime services, efi_set_pgd() doesn't leave a non-zero ASID in TTBR1_EL1 by using uaccess_ttbr0_{enable,disable}. The accesses to current_thread_info()->ttbr0 are updated to use READ_ONCE/WRITE_ONCE. As a safety measure, __uaccess_ttbr0_enable() always masks out any existing non-zero ASID TTBR1_EL1 before writing in the new ASID. Fixes: 27a921e75711 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN") Acked-by: Will Deacon <will.deacon@arm.com> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Co-developed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: entry: Move the trampoline to be before PANSteve Capper2018-01-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trampoline page tables are positioned after the early page tables in the kernel linker script. As we are about to change the early page table logic to resolve the swapper size at link time as opposed to compile time, the SWAPPER_DIR_SIZE variable (currently used to locate the trampline) will be rendered unsuitable for low level assembler. This patch solves this issue by moving the trampoline before the PAN page tables. The offset to the trampoline from ttbr1 can then be expressed by: PAGE_SIZE + RESERVED_TTBR0_SIZE, which is available to the entry assembler. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: sdei: Add trampoline code for remapping the kernelJames Morse2018-01-141-11/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_UNMAP_KERNEL_AT_EL0 is set the SDEI entry point and the rest of the kernel may be unmapped when we take an event. If this may be the case, use an entry trampoline that can switch to the kernel page tables. We can't use the provided PSTATE to determine whether to switch page tables as we may have interrupted the kernel's entry trampoline, (or a normal-priority event that interrupted the kernel's entry trampoline). Instead test for a user ASID in ttbr1_el1. Save a value in regs->addr_limit to indicate whether we need to restore the original ASID when returning from this event. This value is only used by do_page_fault(), which we don't call with the SDEI regs. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: kernel: Add arch-specific SDEI entry code and CPU maskingJames Morse2018-01-131-0/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Software Delegated Exception Interface (SDEI) is an ARM standard for registering callbacks from the platform firmware into the OS. This is typically used to implement RAS notifications. Such notifications enter the kernel at the registered entry-point with the register values of the interrupted CPU context. Because this is not a CPU exception, it cannot reuse the existing entry code. (crucially we don't implicitly know which exception level we interrupted), Add the entry point to entry.S to set us up for calling into C code. If the event interrupted code that had interrupts masked, we always return to that location. Otherwise we pretend this was an IRQ, and use SDEI's complete_and_resume call to return to vbar_el1 + offset. This allows the kernel to deliver signals to user space processes. For KVM this triggers the world switch, a quick spin round vcpu_run, then back into the guest, unless there are pending signals. Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers the panic() code-path, which doesn't invoke cpuhotplug notifiers. Because we can interrupt entry-from/exit-to another EL, we can't trust the value in sp_el0 or x29, even if we interrupted the kernel, in this case the code in entry.S will save/restore sp_el0 and use the value in __entry_task. When we have VMAP stacks we can interrupt the stack-overflow test, which stirs x0 into sp, meaning we have to have our own VMAP stacks. For now these are allocated when we probe the interface. Future patches will add refcounting hooks to allow the arch code to allocate them lazily. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: Add skeleton to harden the branch predictor against aliasing attacksWill Deacon2018-01-081-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Aliasing attacks against CPU branch predictors can allow an attacker to redirect speculative control flow on some CPUs and potentially divulge information from one context to another. This patch adds initial skeleton code behind a new Kconfig option to enable implementation-specific mitigations against these attacks for CPUs that are affected. Co-developed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: Move post_ttbr_update_workaround to C codeMarc Zyngier2018-01-081-1/+1
| | | | | | | | | | | | | | | | | | | | We will soon need to invoke a CPU-specific function pointer after changing page tables, so move post_ttbr_update_workaround out into C code to make this possible. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: use RET instruction for exiting the trampolineWill Deacon2018-01-081-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Speculation attacks against the entry trampoline can potentially resteer the speculative instruction stream through the indirect branch and into arbitrary gadgets within the kernel. This patch defends against these attacks by forcing a misprediction through the return stack: a dummy BL instruction loads an entry into the stack, so that the predicted program flow of the subsequent RET instruction is to a branch-to-self instruction which is finally resolved as a branch to the kernel vectors with speculation suppressed. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: kaslr: Put kernel vectors address in separate data pageWill Deacon2017-12-111-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The literal pool entry for identifying the vectors base is the only piece of information in the trampoline page that identifies the true location of the kernel. This patch moves it into a page-aligned region of the .rodata section and maps this adjacent to the trampoline text via an additional fixmap entry, which protects against any accidental leakage of the trampoline contents. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: mm: Introduce TTBR_ASID_MASK for getting at the ASID in the TTBRWill Deacon2017-12-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | There are now a handful of open-coded masks to extract the ASID from a TTBR value, so introduce a TTBR_ASID_MASK and use that instead. Suggested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: entry: Add fake CPU feature for unmapping the kernel at EL0Will Deacon2017-12-111-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Allow explicit disabling of the entry trampoline on the kernel command line (kpti=off) by adding a fake CPU feature (ARM64_UNMAP_KERNEL_AT_EL0) that can be used to toggle the alternative sequences in our entry code and avoid use of the trampoline altogether if desired. This also allows us to make use of a static key in arm64_kernel_unmapped_at_el0(). Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: erratum: Work around Falkor erratum #E1003 in trampoline codeWill Deacon2017-12-111-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We rely on an atomic swizzling of TTBR1 when transitioning from the entry trampoline to the kernel proper on an exception. We can't rely on this atomicity in the face of Falkor erratum #E1003, so on affected cores we can issue a TLB invalidation to invalidate the walk cache prior to jumping into the kernel. There is still the possibility of a TLB conflict here due to conflicting walk cache entries prior to the invalidation, but this doesn't appear to be the case on these CPUs in practice. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: entry: Hook up entry trampoline to exception vectorsWill Deacon2017-12-111-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | Hook up the entry trampoline to our exception vectors so that all exceptions from and returns to EL0 go via the trampoline, which swizzles the vector base register accordingly. Transitioning to and from the kernel clobbers x30, so we use tpidrro_el0 and far_el1 as scratch registers for native tasks. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: entry: Explicitly pass exception level to kernel_ventry macroWill Deacon2017-12-111-23/+23
| | | | | | | | | | | | | | | | | | | | | | We will need to treat exceptions from EL0 differently in kernel_ventry, so rework the macro to take the exception level as an argument and construct the branch target using that. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: entry: Add exception trampoline page for exceptions from EL0Will Deacon2017-12-111-0/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To allow unmapping of the kernel whilst running at EL0, we need to point the exception vectors at an entry trampoline that can map/unmap the kernel on entry/exit respectively. This patch adds the trampoline page, although it is not yet plugged into the vector table and is therefore unused. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PANWill Deacon2017-12-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | With the ASID now installed in TTBR1, we can re-enable ARM64_SW_TTBR0_PAN by ensuring that we switch to a reserved ASID of zero when disabling user access and restore the active user ASID on the uaccess enable path. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: mm: Rename post_ttbr0_update_workaroundWill Deacon2017-12-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | The post_ttbr0_update_workaround hook applies to any change to TTBRx_EL1. Since we're using TTBR1 for the ASID, rename the hook to make it clearer as to what it's doing. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* | membarrier/arm64: Provide core serializing commandMathieu Desnoyers2018-02-051-0/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-11-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* arm64/sve: Detect SVE and activate runtime supportDave Martin2017-11-031-3/+4
| | | | | | | | | | | | | | | | | | | | This patch enables detection of hardware SVE support via the cpufeatures framework, and reports its presence to the kernel and userspace via the new ARM64_SVE cpucap and HWCAP_SVE hwcap respectively. Userspace can also detect SVE using ID_AA64PFR0_EL1, using the cpufeatures MRS emulation. When running on hardware that supports SVE, this enables runtime kernel support for SVE, and allows user tasks to execute SVE instructions and make of the of the SVE-specific user/kernel interface extensions implemented by this series. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64/sve: Core task context handlingDave Martin2017-11-031-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the core support for switching and managing the SVE architectural state of user tasks. Calls to the existing FPSIMD low-level save/restore functions are factored out as new functions task_fpsimd_{save,load}(), since SVE now dynamically may or may not need to be handled at these points depending on the kernel configuration, hardware features discovered at boot, and the runtime state of the task. To make these decisions as fast as possible, const cpucaps are used where feasible, via the system_supports_sve() helper. The SVE registers are only tracked for threads that have explicitly used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the FPSIMD view of the architectural state is stored in thread.fpsimd_state as usual. When in use, the SVE registers are not stored directly in thread_struct due to their potentially large and variable size. Because the task_struct slab allocator must be configured very early during kernel boot, it is also tricky to configure it correctly to match the maximum vector length provided by the hardware, since this depends on examining secondary CPUs as well as the primary. Instead, a pointer sve_state in thread_struct points to a dynamically allocated buffer containing the SVE register data, and code is added to allocate and free this buffer at appropriate times. TIF_SVE is set when taking an SVE access trap from userspace, if suitable hardware support has been detected. This enables SVE for the thread: a subsequent return to userspace will disable the trap accordingly. If such a trap is taken without sufficient system- wide hardware support, SIGILL is sent to the thread instead as if an undefined instruction had been executed: this may happen if userspace tries to use SVE in a system where not all CPUs support it for example. The kernel will clear TIF_SVE and disable SVE for the thread whenever an explicit syscall is made by userspace. For backwards compatibility reasons and conformance with the spirit of the base AArch64 procedure call standard, the subset of the SVE register state that aliases the FPSIMD registers is still preserved across a syscall even if this happens. The remainder of the SVE register state logically becomes zero at syscall entry, though the actual zeroing work is currently deferred until the thread next tries to use SVE, causing another trap to the kernel. This implementation is suboptimal: in the future, the fastpath case may be optimised to zero the registers in-place and leave SVE enabled for the task, where beneficial. TIF_SVE is also cleared in the following slowpath cases, which are taken as reasonable hints that the task may no longer use SVE: * exec * fork and clone Code is added to sync data between thread.fpsimd_state and thread.sve_state whenever enabling/disabling SVE, in a manner consistent with the SVE architectural programmer's model. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Alex Bennée <alex.bennee@linaro.org> [will: added #include to fix allnoconfig build] [will: use enable_daif in do_sve_acc] Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: entry.S: move SError handling into a C function for future expansionXie XiuQi2017-11-021-7/+29
| | | | | | | | | | | | | | | | | | | | | | Today SError is taken using the inv_entry macro that ends up in bad_mode. SError can be used by the RAS Extensions to notify either the OS or firmware of CPU problems, some of which may have been corrected. To allow this handling to be added, add a do_serror() C function that just panic()s. Add the entry.S boiler plate to save/restore the CPU registers and unmask debug exceptions. Future patches may change do_serror() to return if the SError Interrupt was notification of a corrected error. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Wang Xiongfeng <wangxiongfengi2@huawei.com> [Split out of a bigger patch, added compat path, renamed, enabled debug exceptions] Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: entry.S: convert elX_irqJames Morse2017-11-021-2/+2
| | | | | | | | | | | | Following our 'dai' order, irqs should be processed with debug and serror exceptions unmasked. Add a helper to unmask these two, (and fiq for good measure). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: entry.S convert el0_syncJames Morse2017-11-021-14/+10
| | | | | | | | | | | | | | | | | | el0_sync also unmasks exceptions on a case-by-case basis, debug exceptions are enabled, unless this was a debug exception. Irqs are unmasked for some exception types but not for others. el0_dbg should run with everything masked to prevent us taking a debug exception from do_debug_exception. For the other cases we can unmask everything. This changes the behaviour of fpsimd_{acc,exc} and el0_inv which previously ran with irqs masked. This patch removed the last user of enable_dbg_and_irq, remove it. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: entry.S: convert el1_syncJames Morse2017-11-021-8/+4
| | | | | | | | | | | | | | | | | el1_sync unmasks exceptions on a case-by-case basis, debug exceptions are unmasked, unless this was a debug exception. IRQs are unmasked for instruction and data aborts only if the interupted context had irqs unmasked. Following our 'dai' order, el1_dbg should run with everything masked. For the other cases we can inherit whatever we interrupted. Add a macro inherit_daif to set daif based on the interrupted pstate. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: Mask all exceptions during kernel_exitJames Morse2017-11-021-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | To take RAS Exceptions as quickly as possible we need to keep SError unmasked as much as possible. We need to mask it during kernel_exit as taking an error from this code will overwrite the exception-registers. Adding a naked 'disable_daif' to kernel_exit causes a performance problem for micro-benchmarks that do no real work, (e.g. calling getpid() in a loop). This is because the ret_to_user loop has already masked IRQs so that the TIF_WORK_MASK thread flags can't change underneath it, adding disable_daif is an additional self-synchronising operation. In the future, the RAS APEI code may need to modify the TIF_WORK_MASK flags from an SError, in which case the ret_to_user loop must mask SError while it examines the flags. Disable all exceptions for return to EL1. For return to EL0 get the ret_to_user loop to leave all exceptions masked once it has done its work, this avoids an extra pstate-write. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: move TASK_* definitions to <asm/processor.h>Yury Norov2017-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | ILP32 series [1] introduces the dependency on <asm/is_compat.h> for TASK_SIZE macro. Which in turn requires <asm/thread_info.h>, and <asm/thread_info.h> include <asm/memory.h>, giving a circular dependency, because TASK_SIZE is currently located in <asm/memory.h>. In other architectures, TASK_SIZE is defined in <asm/processor.h>, and moving TASK_SIZE there fixes the problem. Discussion: https://patchwork.kernel.org/patch/9929107/ [1] https://github.com/norov/linux/tree/ilp32-next CC: Will Deacon <will.deacon@arm.com> CC: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* Merge branch 'arm64/vmap-stack' of ↵Catalin Marinas2017-08-151-23/+98
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into for-next/core * 'arm64/vmap-stack' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux: arm64: add VMAP_STACK overflow detection arm64: add on_accessible_stack() arm64: add basic VMAP_STACK support arm64: use an irq stack pointer arm64: assembler: allow adr_this_cpu to use the stack pointer arm64: factor out entry stack manipulation efi/arm64: add EFI_KIMG_ALIGN arm64: move SEGMENT_ALIGN to <asm/memory.h> arm64: clean up irq stack definitions arm64: clean up THREAD_* definitions arm64: factor out PAGE_* and CONT_* definitions arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP fork: allow arch-override of VMAP stack alignment arm64: remove __die()'s stack dump
| * arm64: add VMAP_STACK overflow detectionMark Rutland2017-08-151-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds stack overflow detection to arm64, usable when vmap'd stacks are in use. Overflow is detected in a small preamble executed for each exception entry, which checks whether there is enough space on the current stack for the general purpose registers to be saved. If there is not enough space, the overflow handler is invoked on a per-cpu overflow stack. This approach preserves the original exception information in ESR_EL1 (and where appropriate, FAR_EL1). Task and IRQ stacks are aligned to double their size, enabling overflow to be detected with a single bit test. For example, a 16K stack is aligned to 32K, ensuring that bit 14 of the SP must be zero. On an overflow (or underflow), this bit is flipped. Thus, overflow (of less than the size of the stack) can be detected by testing whether this bit is set. The overflow check is performed before any attempt is made to access the stack, avoiding recursive faults (and the loss of exception information these would entail). As logical operations cannot be performed on the SP directly, the SP is temporarily swapped with a general purpose register using arithmetic operations to enable the test to be performed. This gives us a useful error message on stack overflow, as can be trigger with the LKDTM overflow test: [ 305.388749] lkdtm: Performing direct entry OVERFLOW [ 305.395444] Insufficient stack space to handle exception! [ 305.395482] ESR: 0x96000047 -- DABT (current EL) [ 305.399890] FAR: 0xffff00000a5e7f30 [ 305.401315] Task stack: [0xffff00000a5e8000..0xffff00000a5ec000] [ 305.403815] IRQ stack: [0xffff000008000000..0xffff000008004000] [ 305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0] [ 305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.412785] Hardware name: linux,dummy-virt (DT) [ 305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000 [ 305.419221] PC is at recursive_loop+0x10/0x48 [ 305.421637] LR is at recursive_loop+0x38/0x48 [ 305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145 [ 305.428020] sp : ffff00000a5e7f50 [ 305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00 [ 305.433191] x27: ffff000008981000 x26: ffff000008f80400 [ 305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8 [ 305.440369] x23: ffff000008f80138 x22: 0000000000000009 [ 305.442241] x21: ffff80003ce65000 x20: ffff000008f80188 [ 305.444552] x19: 0000000000000013 x18: 0000000000000006 [ 305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8 [ 305.448252] x15: ffff000008ff546d x14: 000000000047a4c8 [ 305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff [ 305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d [ 305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770 [ 305.457105] x7 : 1313131313131313 x6 : 00000000000000e1 [ 305.459285] x5 : 0000000000000000 x4 : 0000000000000000 [ 305.461781] x3 : 0000000000000000 x2 : 0000000000000400 [ 305.465119] x1 : 0000000000000013 x0 : 0000000000000012 [ 305.467724] Kernel panic - not syncing: kernel stack overflow [ 305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.473325] Hardware name: linux,dummy-virt (DT) [ 305.475070] Call trace: [ 305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378 [ 305.478991] [<ffff000008088e64>] show_stack+0x14/0x20 [ 305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8 [ 305.483294] [<ffff0000080c3288>] panic+0x118/0x280 [ 305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70 [ 305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128 [ 305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0) [ 305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000 [ 305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313 [ 305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548 [ 305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d [ 305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013 [ 305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138 [ 305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000 [ 305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50 [ 305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000 [ 305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330 [ 305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c [ 305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48 [ 305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20 [ 305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28 [ 305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170 [ 305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8 [ 305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128 [ 305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0 [ 305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0 [ 305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000) [ 305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080 [ 305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f [ 305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038 [ 305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000 [ 305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458 [ 305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0 [ 305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040 [ 305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28 [ 305.504720] Kernel Offset: disabled [ 305.505189] CPU features: 0x002082 [ 305.505473] Memory Limit: none [ 305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow This patch was co-authored by Ard Biesheuvel and Mark Rutland. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
| * arm64: use an irq stack pointerMark Rutland2017-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We allocate our IRQ stacks using a percpu array. This allows us to generate our IRQ stack pointers with adr_this_cpu, but bloats the kernel Image with the boot CPU's IRQ stack. Additionally, these are packed with other percpu variables, and aren't guaranteed to have guard pages. When we enable VMAP_STACK we'll want to vmap our IRQ stacks also, in order to provide guard pages and to permit more stringent alignment requirements. Doing so will require that we use a percpu pointer to each IRQ stack, rather than allocating a percpu IRQ stack in the kernel image. This patch updates our IRQ stack code to use a percpu pointer to the base of each IRQ stack. This will allow us to change the way the stack is allocated with minimal changes elsewhere. In some cases we may try to backtrace before the IRQ stack pointers are initialised, so on_irq_stack() is updated to account for this. In testing with cyclictest, there was no measureable difference between using adr_this_cpu (for irq_stack) and ldr_this_cpu (for irq_stack_ptr) in the IRQ entry path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
| * arm64: factor out entry stack manipulationMark Rutland2017-08-151-21/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In subsequent patches, we will detect stack overflow in our exception entry code, by verifying the SP after it has been decremented to make space for the exception regs. This verification code is small, and we can minimize its impact by placing it directly in the vectors. To avoid redundant modification of the SP, we also need to move the initial decrement of the SP into the vectors. As a preparatory step, this patch introduces kernel_ventry, which performs this decrement, and updates the entry code accordingly. Subsequent patches will fold SP verification into kernel_ventry. There should be no functional change as a result of this patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: turn into prep patch, expand commit msg] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
| * arm64: kernel: remove {THREAD,IRQ_STACK}_START_SPArd Biesheuvel2017-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For historical reasons, we leave the top 16 bytes of our task and IRQ stacks unused, a practice used to ensure that the SP can always be masked to find the base of the current stack (historically, where thread_info could be found). However, this is not necessary, as: * When an exception is taken from a task stack, we decrement the SP by S_FRAME_SIZE and stash the exception registers before we compare the SP against the task stack. In such cases, the SP must be at least S_FRAME_SIZE below the limit, and can be safely masked to determine whether the task stack is in use. * When transitioning to an IRQ stack, we'll place a dummy frame onto the IRQ stack before enabling asynchronous exceptions, or executing code we expect to trigger faults. Thus, if an exception is taken from the IRQ stack, the SP must be at least 16 bytes below the limit. * We no longer mask the SP to find the thread_info, which is now found via sp_el0. Note that historically, the offset was critical to ensure that cpu_switch_to() found the correct stack for new threads that hadn't yet executed ret_from_fork(). Given that, this initial offset serves no purpose, and can be removed. This brings us in-line with other architectures (e.g. x86) which do not rely on this masking. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: rebase, kill THREAD_START_SP, commit msg additions] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
* | Merge branch 'arm64/exception-stack' of ↵Catalin Marinas2017-08-091-56/+66
|\| | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into for-next/core * 'arm64/exception-stack' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux: arm64: unwind: remove sp from struct stackframe arm64: unwind: reference pt_regs via embedded stack frame arm64: unwind: disregard frame.sp when validating frame pointer arm64: unwind: avoid percpu indirection for irq stack arm64: move non-entry code out of .entry.text arm64: consistently use bl for C exception entry arm64: Add ASM_BUG()
| * arm64: unwind: reference pt_regs via embedded stack frameArd Biesheuvel2017-08-091-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it turns out, the unwind code is slightly broken, and probably has been for a while. The problem is in the dumping of the exception stack, which is intended to dump the contents of the pt_regs struct at each level in the call stack where an exception was taken and routed to a routine marked as __exception (which means its stack frame is right below the pt_regs struct on the stack). 'Right below the pt_regs struct' is ill defined, though: the unwind code assigns 'frame pointer + 0x10' to the .sp member of the stackframe struct at each level, and dump_backtrace() happily dereferences that as the pt_regs pointer when encountering an __exception routine. However, the actual size of the stack frame created by this routine (which could be one of many __exception routines we have in the kernel) is not known, and so frame.sp is pretty useless to figure out where struct pt_regs really is. So it seems the only way to ensure that we can find our struct pt_regs when walking the stack frames is to put it at a known fixed offset of the stack frame pointer that is passed to such __exception routines. The simplest way to do that is to put it inside pt_regs itself, which is the main change implemented by this patch. As a bonus, doing this allows us to get rid of a fair amount of cruft related to walking from one stack to the other, which is especially nice since we intend to introduce yet another stack for overflow handling once we add support for vmapped stacks. It also fixes an inconsistency where we only add a stack frame pointing to ELR_EL1 if we are executing from the IRQ stack but not when we are executing from the task stack. To consistly identify exceptions regs even in the presence of exceptions taken from entry code, we must check whether the next frame was created by entry text, rather than whether the current frame was crated by exception text. To avoid backtracing using PCs that fall in the idmap, or are controlled by userspace, we must explcitly zero the FP and LR in startup paths, and must ensure that the frame embedded in pt_regs is zeroed upon entry from EL0. To avoid these NULL entries showin in the backtrace, unwind_frame() is updated to avoid them. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: compare current frame against .entry.text, avoid bogus PCs] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
| * arm64: move non-entry code out of .entry.textMark Rutland2017-08-081-44/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cpu_switch_to and ret_from_fork both live in .entry.text, though neither form the critical path for an exception entry. In subsequent patches, we will require that code in .entry.text is part of the critical path for exception entry, for which we can assume certain properties (e.g. the presence of exception regs on the stack). Neither cpu_switch_to nor ret_from_fork will meet these requirements, so we must move them out of .entry.text. To ensure that neither are kprobed after being moved out of .entry.text, we must explicitly blacklist them, requiring a new NOKPROBE() asm helper. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
| * arm64: consistently use bl for C exception entryMark Rutland2017-08-081-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In most cases, our exception entry assembly branches to C handlers with a BL instruction, but in cases where we do not expect to return, we use B instead. While this is correct today, it means that backtraces for fatal exceptions miss the entry assembly (as the LR is stale at the point we call C code), while non-fatal exceptions have the entry assembly in the LR. In subsequent patches, we will need the LR to be set in these cases in order to backtrace reliably. This patch updates these sites to use a BL, ensuring consistency, and preparing for backtrace rework. An ASM_BUG() is added after each of these new BLs, which both catches unexpected returns, and ensures that the LR value doesn't point to another function label. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
* | arm64: Abstract syscallno manipulationDave Martin2017-08-071-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -1 "no syscall" value is written in various ways, shared with the user ABI in some places, and generally obscure. This patch attempts to make things a little more consistent and readable by replacing all these uses with a single #define. A couple of symbolic helpers are provided to clarify the intent further. Because the in-syscall check in do_signal() is changed from >= 0 to != NO_SYSCALL by this patch, different behaviour may be observable if syscallno is set to values less than -1 by a tracer. However, this is not different from the behaviour that is already observable if a tracer sets syscallno to a value >= __NR_(compat_)syscalls. It appears that this can cause spurious syscall restarting, but that is not a new behaviour either, and does not appear harmful. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | arm64: syscallno is secretly an int, make it officialDave Martin2017-08-071-17/+17
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The upper 32 bits of the syscallno field in thread_struct are handled inconsistently, being sometimes zero extended and sometimes sign-extended. In fact, only the lower 32 bits seem to have any real significance for the behaviour of the code: it's been OK to handle the upper bits inconsistently because they don't matter. Currently, the only place I can find where those bits are significant is in calling trace_sys_enter(), which may be unintentional: for example, if a compat tracer attempts to cancel a syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the syscall-enter-stop, it will be traced as syscall 4294967295 rather than -1 as might be expected (and as occurs for a native tracer doing the same thing). Elsewhere, reads of syscallno cast it to an int or truncate it. There's also a conspicuous amount of code and casting to bodge around the fact that although semantically an int, syscallno is stored as a u64. Let's not pretend any more. In order to preserve the stp x instruction that stores the syscall number in entry.S, this patch special-cases the layout of struct pt_regs for big endian so that the newly 32-bit syscallno field maps onto the low bits of the stored value. This is not beautiful, but benchmarking of the getpid syscall on Juno suggests indicates a minor slowdown if the stp is split into an stp x and stp w. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: entry: improve data abort handling of tagged pointersKristina Martsenko2017-05-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | When handling a data abort from EL0, we currently zero the top byte of the faulting address, as we assume the address is a TTBR0 address, which may contain a non-zero address tag. However, the address may be a TTBR1 address, in which case we should not zero the top byte. This patch fixes that. The effect is that the full TTBR1 address is passed to the task's signal handler (or printed out in the kernel log). When handling a data abort from EL1, we leave the faulting address intact, as we assume it's either a TTBR1 address or a TTBR0 address with tag 0x00. This is true as far as I'm aware, we don't seem to access a tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to forget about address tags, and code added in the future may not always remember to remove tags from addresses before accessing them. So add tag handling to the EL1 data abort handler as well. This also makes it consistent with the EL0 data abort handler. Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0") Cc: <stable@vger.kernel.org> # 3.12.x- Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: avoid returning from bad_modeMark Rutland2017-01-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generally, taking an unexpected exception should be a fatal event, and bad_mode is intended to cater for this. However, it should be possible to contain unexpected synchronous exceptions from EL0 without bringing the kernel down, by sending a SIGILL to the task. We tried to apply this approach in commit 9955ac47f4ba1c95 ("arm64: don't kill the kernel on a bad esr from el0"), by sending a signal for any bad_mode call resulting from an EL0 exception. However, this also applies to other unexpected exceptions, such as SError and FIQ. The entry paths for these exceptions branch to bad_mode without configuring the link register, and have no kernel_exit. Thus, if we take one of these exceptions from EL0, bad_mode will eventually return to the original user link register value. This patch fixes this by introducing a new bad_el0_sync handler to cater for the recoverable case, and restoring bad_mode to its original state, whereby it calls panic() and never returns. The recoverable case branches to bad_el0_sync with a bl, and returns to userspace via the usual ret_to_user mechanism. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: 9955ac47f4ba1c95 ("arm64: don't kill the kernel on a bad esr from el0") Reported-by: Mark Salter <msalter@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: don't pull uaccess.h into *.SAl Viro2016-12-261-1/+1
| | | | | | | Split asm-only parts of arm64 uaccess.h into a new header and use that from *.S. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>