summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_tYang Shi2019-04-201-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 143c2a89e0e5fda6c6fd08d7bc1126438c19ae90 ] When running kprobe on -rt kernel, the below bug is caught: |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931 |in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0 |Preemption disabled at:[<802f2b98>] cpu_stopper_thread+0xc0/0x140 |CPU: 0 PID: 14 Comm: migration/0 Tainted: G O 4.8.3-rt2 #1 |Hardware name: Freescale LS1021A |[<8025a43c>] (___might_sleep) |[<80b5b324>] (rt_spin_lock) |[<80b5c31c>] (__patch_text_real) |[<80b5c3ac>] (patch_text_stop_machine) |[<802f2920>] (multi_cpu_stop) Since patch_text_stop_machine() is called in stop_machine() which disables IRQ, sleepable lock should be not used in this atomic context, so replace patch_lock to raw lock. Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* crypto: sha512/arm - fix crash bug in Thumb2 buildArd Biesheuvel2019-04-202-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c64316502008064c158fa40cc250665e461b0f2a ] The SHA512 code we adopted from the OpenSSL project uses a rather peculiar way to take the address of the round constant table: it takes the address of the sha256_block_data_order() routine, and substracts a constant known quantity to arrive at the base of the table, which is emitted by the same assembler code right before the routine's entry point. However, recent versions of binutils have helpfully changed the behavior of references emitted via an ADR instruction when running in Thumb2 mode: it now takes the Thumb execution mode bit into account, which is bit 0 af the address. This means the produced table address also has bit 0 set, and so we end up with an address value pointing 1 byte past the start of the table, which results in crashes such as Unable to handle kernel paging request at virtual address bf825000 pgd = 42f44b11 [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000 Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2 Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ... CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm] LR is at __this_module+0x17fd/0xffffe800 [sha256_arm] pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033 sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433 r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23) Stack: (0xebc8bbe8 to 0xebc8c000) ... unwind: Unknown symbol address bf820bca unwind: Index not found bf820bca Code: 441a ea80 40f9 440a (f85e) 3b04 ---[ end trace e560cce92700ef8a ]--- Given that this affects older kernels as well, in case they are built with a recent toolchain, apply a minimal backportable fix, which is to emit another non-code label at the start of the routine, and reference that instead. (This is similar to the current upstream state of this file in OpenSSL) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* crypto: sha256/arm - fix crash bug in Thumb2 buildArd Biesheuvel2019-04-202-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 69216a545cf81b2b32d01948f7039315abaf75a0 ] The SHA256 code we adopted from the OpenSSL project uses a rather peculiar way to take the address of the round constant table: it takes the address of the sha256_block_data_order() routine, and substracts a constant known quantity to arrive at the base of the table, which is emitted by the same assembler code right before the routine's entry point. However, recent versions of binutils have helpfully changed the behavior of references emitted via an ADR instruction when running in Thumb2 mode: it now takes the Thumb execution mode bit into account, which is bit 0 af the address. This means the produced table address also has bit 0 set, and so we end up with an address value pointing 1 byte past the start of the table, which results in crashes such as Unable to handle kernel paging request at virtual address bf825000 pgd = 42f44b11 [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000 Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2 Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ... CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm] LR is at __this_module+0x17fd/0xffffe800 [sha256_arm] pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033 sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433 r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23) Stack: (0xebc8bbe8 to 0xebc8c000) ... unwind: Unknown symbol address bf820bca unwind: Index not found bf820bca Code: 441a ea80 40f9 440a (f85e) 3b04 ---[ end trace e560cce92700ef8a ]--- Given that this affects older kernels as well, in case they are built with a recent toolchain, apply a minimal backportable fix, which is to emit another non-code label at the start of the routine, and reference that instead. (This is similar to the current upstream state of this file in OpenSSL) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KVM: nVMX: restore host state in nested_vmx_vmexit for VMFailSean Christopherson2019-04-201-20/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit bd18bffca35397214ae68d85cf7203aca25c3c1d ] A VMEnter that VMFails (as opposed to VMExits) does not touch host state beyond registers that are explicitly noted in the VMFail path, e.g. EFLAGS. Host state does not need to be loaded because VMFail is only signaled for consistency checks that occur before the CPU starts to load guest state, i.e. there is no need to restore any state as nothing has been modified. But in the case where a VMFail is detected by hardware and not by KVM (due to deferring consistency checks to hardware), KVM has already loaded some amount of guest state. Luckily, "loaded" only means loaded to KVM's software model, i.e. vmcs01 has not been modified. So, unwind our software model to the pre-VMEntry host state. Not restoring host state in this VMFail path leads to a variety of failures because we end up with stale data in vcpu->arch, e.g. CR0, CR4, EFER, etc... will all be out of sync relative to vmcs01. Any significant delta in the stale data is all but guaranteed to crash L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong. An alternative to this "soft" reload would be to load host state from vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is wildly inconsistent with respect to the VMX architecture, e.g. an L1 VMM with separate VMExit and VMFail paths would explode. Note that this approach does not mean KVM is 100% accurate with respect to VMX hardware behavior, even at an architectural level (the exact order of consistency checks is microarchitecture specific). But 100% emulation accuracy isn't the goal (with this patch), rather the goal is to be consistent in the information delivered to L1, e.g. a VMExit should not fall-through VMENTER, and a VMFail should not jump to HOST_RIP. This technically reverts commit "5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)", but retains the core aspects of that patch, just in an open coded form due to the need to pull state from vmcs01 instead of vmcs12. Restoring host state resolves a variety of issues introduced by commit "4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)", which remedied the incorrect behavior of treating VMFail like VMExit but in doing so neglected to restore arch state that had been modified prior to attempting nested VMEnter. A sample failure that occurs due to stale vcpu.arch state is a fault of some form while emulating an LGDT (due to emulated UMIP) from L1 after a failed VMEntry to L3, in this case when running the KVM unit test test_tpr_threshold_values in L1. L0 also hits a WARN in this case due to a stale arch.cr4.UMIP. L1: BUG: unable to handle kernel paging request at ffffc90000663b9e PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163 Oops: 0009 [#1] SMP CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G W 4.18.0-rc2+ #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:native_load_gdt+0x0/0x10 ... Call Trace: load_fixmap_gdt+0x22/0x30 __vmx_load_host_state+0x10e/0x1c0 [kvm_intel] vmx_switch_vmcs+0x2d/0x50 [kvm_intel] nested_vmx_vmexit+0x222/0x9c0 [kvm_intel] vmx_handle_exit+0x246/0x15a0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm] kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm] do_vfs_ioctl+0x9f/0x600 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4f/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 L0: WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel] ... CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76 Hardware name: Intel Corporation Kabylake Client platform/KBL S RIP: 0010:handle_desc+0x28/0x30 [kvm_intel] ... Call Trace: kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm] kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm] do_vfs_ioctl+0x9f/0x5e0 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x49/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure) Fixes: 4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly) Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platformsBartlomiej Zolnierkiewicz2019-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6862fdf2201ab67cd962dbf0643d37db909f4860 ] "S3C2410 PM Suspend Memory CRC" feature (controlled by SAMSUNG_PM_CHECK config option) is incompatible with highmem (uses phys_to_virt() instead of proper mapping) which is used by the majority of Exynos boards. The issue manifests itself in OOPS on affected boards, i.e. on Odroid-U3 I got the following one: Unable to handle kernel paging request at virtual address f0000000 pgd = 1c0f9bb4 [f0000000] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP ARM [<c0458034>] (crc32_le) from [<c0121f8c>] (s3c_pm_makecheck+0x34/0x54) [<c0121f8c>] (s3c_pm_makecheck) from [<c0121efc>] (s3c_pm_run_res+0x74/0x8c) [<c0121efc>] (s3c_pm_run_res) from [<c0121ecc>] (s3c_pm_run_res+0x44/0x8c) [<c0121ecc>] (s3c_pm_run_res) from [<c01210b8>] (exynos_suspend_enter+0x64/0x148) [<c01210b8>] (exynos_suspend_enter) from [<c018893c>] (suspend_devices_and_enter+0x9ec/0xe74) [<c018893c>] (suspend_devices_and_enter) from [<c0189534>] (pm_suspend+0x770/0xc04) [<c0189534>] (pm_suspend) from [<c0186ce8>] (state_store+0x6c/0xcc) [<c0186ce8>] (state_store) from [<c09db434>] (kobj_attr_store+0x14/0x20) [<c09db434>] (kobj_attr_store) from [<c02fa63c>] (sysfs_kf_write+0x4c/0x50) [<c02fa63c>] (sysfs_kf_write) from [<c02f97a4>] (kernfs_fop_write+0xfc/0x1e4) [<c02f97a4>] (kernfs_fop_write) from [<c027b198>] (__vfs_write+0x2c/0x140) [<c027b198>] (__vfs_write) from [<c027b418>] (vfs_write+0xa4/0x160) [<c027b418>] (vfs_write) from [<c027b5d8>] (ksys_write+0x40/0x8c) [<c027b5d8>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28) Add PLAT_S3C24XX, ARCH_S3C64XX and ARCH_S5PV210 dependencies to SAMSUNG_PM_CHECK config option to hide it on Exynos platforms. Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/pseries: Remove prrn_work workqueueNathan Fontenot2019-04-201-14/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cd24e457fd8b2d087d9236700c8d2957054598bf ] When a PRRN event is received we are already running in a worker thread. Instead of spawning off another worker thread on the prrn_work workqueue to handle the PRRN event we can just call the PRRN handler routine directly. With this update we can also pass the scope variable for the PRRN event directly to the handler instead of it being a global variable. This patch fixes the following oops mnessage we are seeing in PRRN testing: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External 54 CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G X 4.4.126-94.22-default #1 Workqueue: pseries hotplug workque pseries_hp_work_fn task: c000000775367790 ti: c00000001ebd4000 task.ti: c00000070d140000 NIP: 0000000000000000 LR: 000000001fb3d050 CTR: 0000000000000000 REGS: c00000001ebd7d40 TRAP: 0700 Tainted: G X (4.4.126-94.22-default) MSR: 8000000102081000 <41,VEC,ME5 CR: 28000002 XER: 20040018 4 CFAR: 000000001fb3d084 40 419 1 3 GPR00: 000000000000000040000000000010007 000000001ffff400 000000041fffe200 GPR04: 000000000000008050000000000000000 000000001fb15fa8 0000000500000500 GPR08: 000000000001f40040000000000000001 0000000000000000 000005:5200040002 GPR12: 00000000000000005c000000007a05400 c0000000000e89f8 000000001ed9f668 GPR16: 000000001fbeff944000000001fbeff94 000000001fb545e4 0000006000000060 GPR20: ffffffffffffffff4ffffffffffffffff 0000000000000000 0000000000000000 GPR24: 00000000000000005400000001fb3c000 0000000000000000 000000001fb1b040 GPR28: 000000001fb240004000000001fb440d8 0000000000000008 0000000000000000 NIP [0000000000000000] 5 (null) LR [000000001fb3d050] 031fb3d050 Call Trace: 4 Instruction dump: 4 5:47 12 2 XXXXXXXX XXXXXXXX XXXXX4XX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXX5XX XXXXXXXX 60000000 60000000 60000000 60000000 ---[ end trace aa5627b04a7d9d6b ]--- 3NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [kworker/27:0:13903] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External CPU: 27 PID: 13903 Comm: kworker/27:0 Tainted: G D X 4.4.126-94.22-default #1 Workqueue: events prrn_work_fn task: c000000747cfa390 ti: c00000074712c000 task.ti: c00000074712c000 NIP: c0000000008002a8 LR: c000000000090770 CTR: 000000000032e088 REGS: c00000074712f7b0 TRAP: 0901 Tainted: G D X (4.4.126-94.22-default) MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22482044 XER: 20040000 CFAR: c0000000008002c4 SOFTE: 1 GPR00: c000000000090770 c00000074712fa30 c000000000f09800 c000000000fa1928 6:02 GPR04: c000000775f5e000 fffffffffffffffe 0000000000000001 c000000000f42db8 GPR08: 0000000000000001 0000000080000007 0000000000000000 0000000000000000 GPR12: 8006210083180000 c000000007a14400 NIP [c0000000008002a8] _raw_spin_lock+0x68/0xd0 LR [c000000000090770] mobility_rtas_call+0x50/0x100 Call Trace: 59 5 [c00000074712fa60] [c000000000090770] mobility_rtas_call+0x50/0x100 [c00000074712faf0] [c000000000090b08] pseries_devicetree_update+0xf8/0x530 [c00000074712fc20] [c000000000031ba4] prrn_work_fn+0x34/0x50 [c00000074712fc40] [c0000000000e0390] process_one_work+0x1a0/0x4e0 [c00000074712fcd0] [c0000000000e0870] worker_thread+0x1a0/0x6105:57 2 [c00000074712fd80] [c0000000000e8b18] kthread+0x128/0x150 [c00000074712fe30] [c0000000000096f8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 2c090000 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 40de0018 5:540030 3 e8010010 ebe1fff8 7c0803a6 4e800020 <7c210b78> e92d0000 89290009 792affe3 Signed-off-by: John Allen <jallen@linux.ibm.com> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/gart: Exclude GART aperture from kcoreKairui Song2019-04-201-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ffc8599aa9763f39f6736a79da4d1575e7006f9a ] On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return ↵Nathan Chancellor2019-04-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | an error [ Upstream commit e898e69d6b9475bf123f99b3c5d1a67bb7cb2361 ] When building with -Wsometimes-uninitialized, Clang warns: arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] The default cannot be reached because arch_build_bp_info() initializes hw->len to one of the specified cases. Nevertheless the warning is valid and returning -EINVAL makes sure that this cannot be broken by future modifications. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/392 Link: https://lkml.kernel.org/r/20190307212756.4648-1-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processorsMatthew Whitehead2019-04-201-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 18fb053f9b827bd98cfc64f2a35df8ab19745a1d ] There are comments in processor-cyrix.h advising you to _not_ make calls using the deprecated macros in this style: setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) | 0x80); This is because it expands the macro into a non-functioning calling sequence. The calling order must be: outb(CX86_CCR2, 0x22); inb(0x23); From the comments: * When using the old macros a line like * setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88); * gets expanded to: * do { * outb((CX86_CCR2), 0x22); * outb((({ * outb((CX86_CCR2), 0x22); * inb(0x23); * }) | 0x88), 0x23); * } while (0); The new macros fix this problem, so use them instead. Tested on an actual Geode processor. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1552596361-8967-2-git-send-email-tedheadster@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/hyperv: Prevent potential NULL pointer dereferenceKangjie Lu2019-04-201-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 534c89c22e26b183d838294f0937ee092c82ad3a ] The page allocation in hv_cpu_init() can fail, but the code does not have a check for that. Add a check and return -ENOMEM when the allocation fails. [ tglx: Massaged changelog ] Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Acked-by: "K. Y. Srinivasan" <kys@microsoft.com> Cc: pakki001@umn.edu Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: linux-hyperv@vger.kernel.org Link: https://lkml.kernel.org/r/20190314054651.1315-1-kjlu@umn.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/hpet: Prevent potential NULL pointer dereferenceAditya Pakki2019-04-201-0/+2
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2e84f116afca3719c9d0a1a78b47b48f75fd5724 ] hpet_virt_address may be NULL when ioremap_nocache fail, but the code lacks a check. Add a check to prevent NULL pointer dereference. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kjlu@umn.edu Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Roland Dreier <roland@purestorage.com> Link: https://lkml.kernel.org/r/20190319021958.17275-1-pakki001@umn.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/mm: Don't leak kernel addressesMatteo Croce2019-04-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a3151724437f54076cc10bc02b1c4f0003ae36cd ] Since commit: ad67b74d2469d9b8 ("printk: hash addresses printed with %p") at boot "____ptrval____" is printed instead of actual addresses: found SMP MP-table at [mem 0x000f5cc0-0x000f5ccf] mapped at [(____ptrval____)] Instead of changing the print to "%px", and leaking a kernel addresses, just remove the print completely, like in: 071929dbdd865f77 ("arm64: Stop printing the virtual memory layout"). Signed-off-by: Matteo Croce <mcroce@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAMCorentin Labbe2019-04-201-0/+1
| | | | | | | | | | | | | | | [ Upstream commit 0728aeb7ead99a9b0dac2f3c92b3752b4e02ff97 ] We have now a HSDK device in our kernelci lab, but kernel builded via the hsdk_defconfig lacks ramfs supports, so it cannot boot kernelci jobs yet. So this patch enable CONFIG_BLK_DEV_RAM in hsdk_defconfig. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARC: u-boot args: check that magic number is correctEugeniy Paltsev2019-04-202-0/+9
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit edb64bca50cd736c6894cc6081d5263c007ce005 ] In case of devboards we really often disable bootloader and load Linux image in memory via JTAG. Even if kernel tries to verify uboot_tag and uboot_arg there is sill a chance that we treat some garbage in registers as valid u-boot arguments in JTAG case. E.g. it is enough to have '1' in r0 to treat any value in r2 as a boot command line. So check that magic number passed from u-boot is correct and drop u-boot arguments otherwise. That helps to reduce the possibility of using garbage as u-boot arguments in JTAG case. We can safely check U-boot magic value (0x0) in linux passed via r1 register as U-boot pass it from the beginning. So there is no backward-compatibility issues. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KVM: x86: nVMX: fix x2APIC VTPR read interceptMarc Orr2019-04-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c73f4c998e1fd4249b9edfa39e23f4fda2b9b041 upstream. Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the SDM, when "virtualize x2APIC mode" is 1 and "APIC-register virtualization" is 0, a RDMSR of 808H should return the VTPR from the virtual APIC page. However, for nested, KVM currently fails to disable the read intercept for this MSR. This means that a RDMSR exit takes precedence over "virtualize x2APIC mode", and KVM passes through L1's TPR to L2, instead of sourcing the value from L2's virtual APIC page. This patch fixes the issue by disabling the read intercept, in VMCS02, for the VTPR when "APIC-register virtualization" is 0. The issue described above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC mode w/ nested". Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)Marc Orr2019-04-171-28/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit acff78477b9b4f26ecdf65733a4ed77fe837e9dc upstream. The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a result, we discovered the potential for a buggy or malicious L1 to get access to L0's x2APIC MSRs, via an L2, as follows. 1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl variable, in nested_vmx_prepare_msr_bitmap() to become true. 2. L1 disables "virtualize x2APIC mode" in VMCS12. 3. L1 enables "APIC-register virtualization" in VMCS12. Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then set "virtualize x2APIC mode" to 0 in VMCS02. Oops. This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR intercepts with VMCS12's "virtualize x2APIC mode" control. The scenario outlined above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Add leak scenario to virt_x2apic_mode_test". Note, it looks like this issue may have been introduced inadvertently during a merge---see 15303ba5d1cd. Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64Tomohiro Mayama2019-04-171-2/+1
| | | | | | | | | | | | | | | commit a8772e5d826d0f61f8aa9c284b3ab49035d5273d upstream. This patch makes USB ports functioning again. Fixes: 955bebde057e ("arm64: dts: rockchip: add rk3328-rock64 board") Cc: stable@vger.kernel.org Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Tomohiro Mayama <parly-gh@iris.mystia.org> Tested-by: Katsuhiro Suzuki <katsuhiro@katsuster.net> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64Katsuhiro Suzuki2019-04-171-2/+2
| | | | | | | | | | | | | | | commit ef05bcb60c1a8841e38c91923ba998181117a87c upstream. This patch fixes pin assign of vcc_host1_5v. This regulator is controlled by USB20_HOST_DRV signal. ROCK64 schematic says that GPIO0_A2 pin is used as USB20_HOST_DRV. GPIO0_D3 pin is for SPDIF_TX_M0. Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/perf/amd: Remove need to check "running" bit in NMI handlerLendacky, Thomas2019-04-172-12/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream. Spurious interrupt support was added to perf in the following commit, almost a decade ago: 63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters") The two previous patches (resolving the race condition when disabling a PMC and NMI latency mitigation) allow for the removal of this older spurious interrupt support. Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap is cleared before disabling the PMC, which sets up a race condition. This race condition was mitigated by introducing the running bitmap. That race condition can be eliminated by first disabling the PMC, waiting for PMC reset on overflow and then clearing the bit for the PMC in the active_mask bitmap. The NMI handler will not re-enable a disabled counter. If x86_pmu_stop() is called from the perf NMI handler, the NMI latency mitigation support will guard against any unhandled NMI messages. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/perf/amd: Resolve NMI latency issues for active PMCsLendacky, Thomas2019-04-171-1/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream. On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). When the perf NMI handler executes it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI and then a subsequent NMI, that does not appear to be a back-to-back NMI, not finding any PMC counters that have overflowed. This may appear to be an unhandled NMI resulting in either a panic or a series of messages, depending on how the kernel was configured. To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq() function and upon return perform some additional processing that will indicate if the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a minimum value of the number of active PMCs or 2 will be set whenever a PMC is active. This is used to indicate the possible number of NMIs that can still occur. The value of 2 is used for when an NMI does not arrive at the LAPIC in time to be collapsed into an already pending NMI. Each time the function is called without having handled an overflowed counter, the per-CPU value is checked. If the value is non-zero, it is decremented and the NMI indicates that it handled the NMI. If the value is zero, then the NMI indicates that it did not handle the NMI. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/perf/amd: Resolve race condition when disabling PMCLendacky, Thomas2019-04-171-3/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream. On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured. To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/asm: Use stricter assembly constraints in bitopsAlexander Potapenko2019-04-171-23/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 upstream. There's a number of problems with how arch/x86/include/asm/bitops.h is currently using assembly constraints for the memory region bitops are modifying: 1) Use memory clobber in bitops that touch arbitrary memory Certain bit operations that read/write bits take a base pointer and an arbitrarily large offset to address the bit relative to that base. Inline assembly constraints aren't expressive enough to tell the compiler that the assembly directive is going to touch a specific memory location of unknown size, therefore we have to use the "memory" clobber to indicate that the assembly is going to access memory locations other than those listed in the inputs/outputs. To indicate that BTR/BTS instructions don't necessarily touch the first sizeof(long) bytes of the argument, we also move the address to assembly inputs. This particular change leads to size increase of 124 kernel functions in a defconfig build. For some of them the diff is in NOP operations, other end up re-reading values from memory and may potentially slow down the execution. But without these clobbers the compiler is free to cache the contents of the bitmaps and use them as if they weren't changed by the inline assembly. 2) Use byte-sized arguments for operations touching single bytes. Passing a long value to ANDB/ORB/XORB instructions makes the compiler treat sizeof(long) bytes as being clobbered, which isn't the case. This may theoretically lead to worse code in the case of heavy optimization. Practical impact: I've built a defconfig kernel and looked through some of the functions generated by GCC 7.3.0 with and without this clobber, and didn't spot any miscompilations. However there is a (trivial) theoretical case where this code leads to miscompilation: https://lkml.org/lkml/2019/3/28/393 using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes such a function in the kernel someday. So the primary motivation is to fix an existing misuse of the asm directive, which happens to work in certain configurations now, but isn't guaranteed to work under different circumstances. [ --mingo: Added -stable tag because defconfig only builds a fraction of the kernel and the trivial testcase looks normal enough to be used in existing or in-development code. ] Signed-off-by: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Y Knight <jyknight@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com [ Edited the changelog, tidied up one of the defines. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/asm: Remove dead __GNUC__ conditionalsRasmus Villemoes2019-04-173-41/+0
| | | | | | | | | | | | | | | | | | | | commit 88ca66d8540ca26119b1428cddb96b37925bdf01 upstream. The minimum supported gcc version is >= 4.6, so these can be removed. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xtensa: fix return_addressMax Filippov2019-04-171-1/+5
| | | | | | | | | | | | | | | | commit ada770b1e74a77fff2d5f539bf6c42c25f4784db upstream. return_address returns the address that is one level higher in the call stack than requested in its argument, because level 0 corresponds to its caller's return address. Use requested level as the number of stack frames to skip. This fixes the address reported by might_sleep and friends. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xen: Prevent buffer overflow in privcmd ioctlDan Carpenter2019-04-171-0/+3
| | | | | | | | | | | | | | | | | commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream. The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access. Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: backtrace: Don't bother trying to unwind the userspace stackWill Deacon2019-04-171-6/+9
| | | | | | | | | | | | | | | | | | commit 1e6f5440a6814d28c32d347f338bfef68bc3e69d upstream. Calling dump_backtrace() with a pt_regs argument corresponding to userspace doesn't make any sense and our unwinder will simply print "Call trace:" before unwinding the stack looking for user frames. Rather than go through this song and dance, just return early if we're passed a user register state. Cc: <stable@vger.kernel.org> Fixes: 1149aad10b1e ("arm64: Add dump_backtrace() in show_regs") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: dts: rockchip: fix rk3328 rgmii high tx error ratePeter Geis2019-04-171-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6fd8b9780ec1a49ac46e0aaf8775247205e66231 upstream. Several rk3328 based boards experience high rgmii tx error rates. This is due to several pins in the rk3328.dtsi rgmii pinmux that are missing a defined pull strength setting. This causes the pinmux driver to default to 2ma (bit mask 00). These pins are only defined in the rk3328.dtsi, and are not listed in the rk3328 specification. The TRM only lists them as "Reserved" (RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX, GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX). However, removal of these pins from the rgmii pinmux definition causes the interface to fail to transmit. Also, the rgmii tx and rx pins defined in the dtsi are not consistent with the rk3328 specification, with tx pins currently set to 12ma and rx pins set to 2ma. Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent with the specification. Defining the drive strength for the undefined pins eliminated the high tx packet error rate observed under heavy data transfers. Aligning the drive strength to the TRM values eliminated the occasional packet retry errors under iperf3 testing. This allows much higher data rates with no recorded tx errors. Tested on the rk3328-roc-cc board. Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result valueWill Deacon2019-04-171-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 045afc24124d80c6998d9c770844c67912083506 upstream. Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't explicitly set the return value on the non-faulting path and instead leaves it holding the result of the underlying atomic operation. This means that any FUTEX_WAKE_OP atomic operation which computes a non-zero value will be reported as having failed. Regrettably, I wrote the buggy code back in 2011 and it was upstreamed as part of the initial arm64 support in 2012. The reasons we appear to get away with this are: 1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get exercised by futex() test applications 2. If the result of the atomic operation is zero, the system call behaves correctly 3. Prior to version 2.25, the only operation used by GLIBC set the futex to zero, and therefore worked as expected. From 2.25 onwards, FUTEX_WAKE_OP is not used by GLIBC at all. Fix the implementation by ensuring that the return value is either 0 to indicate that the atomic operation completed successfully, or -EFAULT if we encountered a fault when accessing the user mapping. Cc: <stable@kernel.org> Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: dts: at91: Fix typo in ISC_D0 on PC9David Engraf2019-04-171-1/+1
| | | | | | | | | | | | | | | commit e7dfb6d04e4715be1f3eb2c60d97b753fd2e4516 upstream. The function argument for the ISC_D0 on PC9 was incorrect. According to the documentation it should be 'C' aka 3. Signed-off-by: David Engraf <david.engraf@sysgo.com> Reviewed-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Fixes: 7f16cb676c00 ("ARM: at91/dt: add sama5d2 pinmux") Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: dts: am335x-evm: Correct the regulators for the audio codecPeter Ujfalusi2019-04-171-4/+22
| | | | | | | | | | | | | commit 4f96dc0a3e79ec257a2b082dab3ee694ff88c317 upstream. Correctly map the regulators used by tlv320aic3106. Both 1.8V and 3.3V for the codec is derived from VBAT via fixed regulators. Cc: <Stable@vger.kernel.org> # v4.14+ Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: dts: am335x-evmsk: Correct the regulators for the audio codecPeter Ujfalusi2019-04-171-4/+22
| | | | | | | | | | | | | commit 6691370646e844be98bb6558c024269791d20bd7 upstream. Correctly map the regulators used by tlv320aic3106. Both 1.8V and 3.3V for the codec is derived from VBAT via fixed regulators. Cc: <Stable@vger.kernel.org> # v4.14+ Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: dts: rockchip: fix rk3288 cpu opp node referenceJonas Karlman2019-04-171-3/+3
| | | | | | | | | | | | | | | | | commit 6b2fde3dbfab6ebc45b0cd605e17ca5057ff9a3b upstream. The following error can be seen during boot: of: /cpus/cpu@501: Couldn't find opp node Change cpu nodes to use operating-points-v2 in order to fix this. Fixes: ce76de984649 ("ARM: dts: rockchip: convert rk3288 to operating-points-v2") Cc: stable@vger.kernel.org Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: Fix syscall_get_arguments() and syscall_set_arguments()Dmitry V. Levin2019-04-171-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 10a16997db3d99fc02c026cf2c6e6c670acafab0 upstream. RISC-V syscall arguments are located in orig_a0,a1..a5 fields of struct pt_regs. Due to an off-by-one bug and a bug in pointer arithmetic syscall_get_arguments() was reading s3..s7 fields instead of a1..a5. Likewise, syscall_set_arguments() was writing s3..s7 fields instead of a1..a5. Link: http://lkml.kernel.org/r/20190329171221.GA32456@altlinux.org Fixes: e2c0cdfba7f69 ("RISC-V: User-facing API") Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org # v4.15+ Acked-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kvm: svm: fix potential get_num_contig_pages overflowDavid Rientjes2019-04-171-5/+5
| | | | | | | | | | | | | | commit ede885ecb2cdf8a8dd5367702e3d964ec846a2d5 upstream. get_num_contig_pages() could potentially overflow int so make its type consistent with its usage. Reported-by: Cfir Cohen <cfir@google.com> Cc: stable@vger.kernel.org Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* parisc: also set iaoq_b in instruction_pointer_set()Sven Schnelle2019-04-171-1/+2
| | | | | | | | | | | | | | | | commit f324fa58327791b2696628b31480e7e21c745706 upstream. When setting the instruction pointer on PA-RISC we also need to set the back of the instruction queue to the new offset, otherwise we will execute on instruction from the new location, and jumping back to the old location stored in iaoq_b. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 75ebedf1d263 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* parisc: regs_return_value() should return gpr28Sven Schnelle2019-04-171-1/+1
| | | | | | | | | | | | | | commit 45efd871bf0a47648f119d1b41467f70484de5bc upstream. While working on kretprobes for PA-RISC I was wondering while the kprobes sanity test always fails on kretprobes. This is caused by returning gpr20 instead of gpr28. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* parisc: Detect QEMU earlier in boot processHelge Deller2019-04-172-6/+3
| | | | | | | | | | | | | | | | | | | commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream. While adding LASI support to QEMU, I noticed that the QEMU detection in the kernel happens much too late. For example, when a LASI chip is found by the kernel, it registers the LASI LED driver as well. But when we run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so we need to access the running_on_QEMU flag earlier than before. This patch now makes the QEMU detection the fist task of the Linux kernel by moving it to where the kernel enters the C-coding. Fixes: 310d82784fb4 ("parisc: qemu idle sleep support") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: dts: rockchip: fix rk3328 sdmmc0 write errorsPeter Geis2019-04-171-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | commit 09f91381fa5de1d44bc323d8bf345f5d57b3d9b5 upstream. Various rk3328 based boards experience occasional sdmmc0 write errors. This is due to the rk3328.dtsi tx drive levels being set to 4ma, vs 8ma per the rk3328 datasheet default settings. Fix this by setting the tx signal pins to 8ma. Inspiration from tonymac32's patch, https://github.com/ayufan-rock64/linux-kernel/commit/dc1212b347e0da17c5460bcc0a56b07d02bac3f8 Fixes issues on the rk3328-roc-cc and the rk3328-rock64 (as per the above commit message). Tested on the rk3328-roc-cc board. Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear regionYueyi Li2019-04-171-1/+1
| | | | | | | | | | | | | | | | | [ Upstream commit c8a43c18a97845e7f94ed7d181c11f41964976a2 ] When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), the top 4K of kernel virtual address space may be mapped to physical addresses despite being reserved for ERR_PTR values. Fix the randomization of the linear region so that we avoid mapping the last page of the virtual address space. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: liyueyi <liyueyi@live.com> [will: rewrote commit message; merged in suggestion from Ard] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
* x86/vdso: Drop implicit common-page-size linker flagNick Desaulniers2019-04-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ac3e233d29f7f77f28243af0132057d378d3ea58 upstream. GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/entry/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it. Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Reported-by: Dmitry Golovin <dima@golovin.in> Reported-by: Bill Wendling <morbo@google.com> Suggested-by: Dmitry Golovin <dima@golovin.in> Suggested-by: Rui Ueyama <ruiu@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Fangrui Song <maskray@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com Link: https://bugs.llvm.org/show_bug.cgi?id=38774 Link: https://github.com/ClangBuiltLinux/linux/issues/31 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEMBreno Leitao2019-04-171-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream. Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around and this block uses a 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM, however the 'msr' variable is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when CONFIG_PPC_TRANSACTION_MEM is not defined. error: 'msr' undeclared (first use in this function) This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set: #define MSR_TM_ACTIVE(x) 0 This patch just fixes this issue avoiding the 'msr' variable usage outside the CONFIG_PPC_TRANSACTIONAL_MEM block, avoiding trusting in the MSR_TM_ACTIVE() definition. Cc: stable@vger.kernel.org Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLTJim Mattson2019-04-171-3/+7
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 9ebdfe5230f2e50e3ba05c57723a06e90946815a ] According to the SDM, "NMI-window exiting" VM-exits wake a logical processor from the same inactive states as would an NMI and "interrupt-window exiting" VM-exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions. Fixes: 6dfacadd5858 ("KVM: nVMX: Add support for activity state HLT") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> [Squashed comments of two Jim's patches and used the simplified code hunk provided by Sean. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* kprobes/x86: Blacklist non-attachable interrupt functionsAndrea Righi2019-04-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 ] These interrupt functions are already non-attachable by kprobes. Blacklist them explicitly so that they can show up in /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this additional information. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/build: Mark per-CPU symbols as absolute explicitly for LLDRafael Ávila de Espíndola2019-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d071ae09a4a1414c1433d5ae9908959a7325b0ad ] Accessing per-CPU variables is done by finding the offset of the variable in the per-CPU block and adding it to the address of the respective CPU's block. Section 3.10.8 of ld.bfd's documentation states: For expressions involving numbers, relative addresses and absolute addresses, ld follows these rules to evaluate terms: Other binary operations, that is, between two relative addresses not in the same section, or between a relative address and an absolute address, first convert any non-absolute term to an absolute address before applying the operator." Note that LLVM's linker does not adhere to the GNU ld's implementation and as such requires implicitly-absolute terms to be explicitly marked as absolute in the linker script. If not, it fails currently with: ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute Makefile:1040: recipe for target 'vmlinux' failed This is not a functional change for ld.bfd which converts the term to an absolute symbol anyways as specified above. Based on a previous submission by Tri Vo <trong@android.com>. Reported-by: Dmitry Golovin <dima@golovin.in> Signed-off-by: Rafael Ávila de Espíndola <rafael@espindo.la> [ Update commit message per Boris' and Michael's suggestions. ] Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> [ Massage commit message more, fix typos. ] Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Dmitry Golovin <dima@golovin.in> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Cao Jin <caoj.fnst@cn.fujitsu.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tri Vo <trong@android.com> Cc: dima@golovin.in Cc: morbo@google.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/build: Specify elf_i386 linker emulation explicitly for i386 objectsGeorge Rimar2019-04-052-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 927185c124d62a9a4d35878d7f6d432a166b74e3 ] The kernel uses the OUTPUT_FORMAT linker script command in it's linker scripts. Most of the time, the -m option is passed to the linker with correct architecture, but sometimes (at least for x86_64) the -m option contradicts the OUTPUT_FORMAT directive. Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object files, but are linked with the -m elf_x86_64 linker flag when building for x86_64. The GNU linker manpage doesn't explicitly state any tie-breakers between -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT overrides the emulation value specified with the -m option. LLVM lld has a different behavior, however. When supplied with contradicting -m and OUTPUT_FORMAT values it fails with the following error message: ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64 Therefore, just add the correct -m after the incorrect one (it overrides it), so the linker invocation looks like this: ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \ realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf This is not a functional change for GNU ld, because (although not explicitly documented) OUTPUT_FORMAT overrides -m EMULATION. Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting it in QEMU. [ bp: massage and clarify text. ] Suggested-by: Dmitry Golovin <dima@golovin.in> Signed-off-by: George Rimar <grimar@accesssoftek.com> Signed-off-by: Tri Vo <trong@android.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tri Vo <trong@android.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Matz <matz@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: morbo@google.com Cc: ndesaulniers@google.com Cc: ruiu@google.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/pseries: Perform full re-add of CPU for topology update post-migrationNathan Fontenot2019-04-053-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 81b61324922c67f73813d8a9c175f3c153f6a1c6 ] On pseries systems, performing a partition migration can result in altering the nodes a CPU is assigned to on the destination system. For exampl, pre-migration on the source system CPUs are in node 1 and 3, post-migration on the destination system CPUs are in nodes 2 and 3. Handling the node change for a CPU can cause corruption in the slab cache if we hit a timing where a CPUs node is changed while cache_reap() is invoked. The corruption occurs because the slab cache code appears to rely on the CPU and slab cache pages being on the same node. The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c does not prevent us from hitting this scenario. Changing the device tree property update notification handler that recognizes an affinity change for a CPU to do a full DLPAR remove and add of the CPU instead of dynamically changing its node resolves this issue. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64s: Clear on-stack exception marker upon exception returnNicolai Stange2019-04-051-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit eddd0b332304d554ad6243942f87c2fcea98c56b ] The ppc64 specific implementation of the reliable stacktracer, save_stack_trace_tsk_reliable(), bails out and reports an "unreliable trace" whenever it finds an exception frame on the stack. Stack frames are classified as exception frames if the STACK_FRAME_REGS_MARKER magic, as written by exception prologues, is found at a particular location. However, as observed by Joe Lawrence, it is possible in practice that non-exception stack frames can alias with prior exception frames and thus, that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on the stack. It in turn falsely reports an unreliable stacktrace and blocks any live patching transition to finish. Said condition lasts until the stack frame is overwritten/initialized by function call or other means. In principle, we could mitigate this by making the exception frame classification condition in save_stack_trace_tsk_reliable() stronger: in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into account that for all exceptions executing on the kernel stack - their stack frames's backlink pointers always match what is saved in their pt_regs instance's ->gpr[1] slot and that - their exception frame size equals STACK_INT_FRAME_SIZE, a value uncommonly large for non-exception frames. However, while these are currently true, relying on them would make the reliable stacktrace implementation more sensitive towards future changes in the exception entry code. Note that false negatives, i.e. not detecting exception frames, would silently break the live patching consistency model. Furthermore, certain other places (diagnostic stacktraces, perf, xmon) rely on STACK_FRAME_REGS_MARKER as well. Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER for those exceptions running on the "normal" kernel stack and returning to kernelspace: because the topmost frame is ignored by the reliable stack tracer anyway, returns to userspace don't need to take care of clearing the marker. Furthermore, as I don't have the ability to test this on Book 3E or 32 bits, limit the change to Book 3S and 64 bits. Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model") Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: avoid Cortex-A9 livelock on tight dmb loopsRussell King2019-04-055-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5388a5b82199facacd3d7ac0d05aca6e8f902fed ] machine_crash_nonpanic_core() does this: while (1) cpu_relax(); because the kernel has crashed, and we have no known safe way to deal with the CPU. So, we place the CPU into an infinite loop which we expect it to never exit - at least not until the system as a whole is reset by some method. In the absence of erratum 754327, this code assembles to: b . In other words, an infinite loop. When erratum 754327 is enabled, this becomes: 1: dmb b 1b It has been observed that on some systems (eg, OMAP4) where, if a crash is triggered, the system tries to kexec into the panic kernel, but fails after taking the secondary CPU down - placing it into one of these loops. This causes the system to livelock, and the most noticable effect is the system stops after issuing: Loading crashdump kernel... to the system console. The tested as working solution I came up with was to add wfe() to these infinite loops thusly: while (1) { cpu_relax(); wfe(); } which, without 754327 builds to: 1: wfe b 1b or with 754327 is enabled: 1: dmb wfe b 1b Adding "wfe" does two things depending on the environment we're running under: - where we're running on bare metal, and the processor implements "wfe", it stops us spinning endlessly in a loop where we're never going to do any useful work. - if we're running in a VM, it allows the CPU to be given back to the hypervisor and rescheduled for other purposes (maybe a different VM) rather than wasting CPU cycles inside a crashed VM. However, in light of erratum 794072, Will Deacon wanted to see 10 nops as well - which is reasonable to cover the case where we have erratum 754327 enabled _and_ we have a processor that doesn't implement the wfe hint. So, we now end up with: 1: wfe b 1b when erratum 754327 is disabled, or: 1: dmb nop nop nop nop nop nop nop nop nop nop wfe b 1b when erratum 754327 is enabled. We also get the dmb + 10 nop sequence elsewhere in the kernel, in terminating loops. This is reasonable - it means we get the workaround for erratum 794072 when erratum 754327 is enabled, but still relinquish the dead processor - either by placing it in a lower power mode when wfe is implemented as such or by returning it to the hypervisior, or in the case where wfe is a no-op, we use the workaround specified in erratum 794072 to avoid the problem. These as two entirely orthogonal problems - the 10 nops addresses erratum 794072, and the wfe is an optimisation that makes the system more efficient when crashed either in terms of power consumption or by allowing the host/other VMs to make use of the CPU. I don't see any reason not to use kexec() inside a VM - it has the potential to provide automated recovery from a failure of the VMs kernel with the opportunity for saving a crashdump of the failure. A panic() with a reboot timeout won't do that, and reading the libvirt documentation, setting on_reboot to "preserve" won't either (the documentation states "The preserve action for an on_reboot event is treated as a destroy".) Surely it has to be a good thing to avoiding having CPUs spinning inside a VM that is doing no useful work. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care ofVladimir Murzin2019-04-054-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 72cd4064fccaae15ab84d40d4be23667402df4ed ] ARMv8M introduces support for Security extension to M class, among other things it affects exception handling, especially, encoding of EXC_RETURN. The new bits have been added: Bit [6] Secure or Non-secure stack Bit [5] Default callee register stacking Bit [0] Exception Secure which conflicts with hard-coded value of EXC_RETURN: In fact, we only care of few bits: Bit [3] Mode (0 - Handler, 1 - Thread) Bit [2] Stack pointer selection (0 - Main, 1 - Process) We can toggle only those bits and left other bits as they were on exception entry. It is basically, what patch does - saves EXC_RETURN when we do transition form Thread to Handler mode (it is first svc), so later saved value is used instead of EXC_RET_THREADMODE_PROCESSSTACK. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notationMathieu Malaterre2019-04-051-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3e3380d0675d5e20b0af067d60cb947a4348bf9b ] Improve the DTS files by removing all the leading "0x" and zeros to fix the following dtc warnings: Warning (unit_address_format): Node /XXX unit name should not have leading "0x" and Warning (unit_address_format): Node /XXX unit name should not have leading 0s Converted using the following command: find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -i -e "s/@\([0-9a-fA-FxX\.;:#]+\)\s*{/@\L\1 {/g" -e "s/@0x\(.*\) {/@\1 {/g" -e "s/@0+\(.*\) {/@\1 {/g" {} + For simplicity, two sed expressions were used to solve each warnings separately. To make the regex expression more robust a few other issues were resolved, namely setting unit-address to lower case, and adding a whitespace before the opening curly brace: https://elinux.org/Device_Tree_Linux#Linux_conventions This will solve as a side effect warning: Warning (simple_bus_reg): Node /XXX@<UPPER> simple-bus unit address format error, expected "<lower>" This is a follow up to commit 4c9847b7375a ("dt-bindings: Remove leading 0x from bindings notation") Reported-by: David Daney <ddaney@caviumnetworks.com> Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> [vzapolskiy: fixed commit message to pass checkpatch.pl test] Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>