summaryrefslogtreecommitdiffstats
path: root/arch/s390/include
Commit message (Collapse)AuthorAgeFilesLines
* s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()Heiko Carstens2023-01-181-1/+1
| | | | | | | | | | | | | commit e3f360db08d55a14112bd27454e616a24296a8b0 upstream. Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only dereferenced once by using READ_ONCE(). Otherwise the compiler could generate incorrect code. Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/futex: add missing EX_TABLE entry to __futex_atomic_op()Heiko Carstens2022-11-031-1/+2
| | | | | | | | | | | | | | commit a262d3ad6a433e4080cecd0a8841104a5906355e upstream. For some exception types the instruction address points behind the instruction that caused the exception. Take that into account and add the missing exception table entry. Cc: <stable@vger.kernel.org> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepagesGerald Schaefer2022-09-151-2/+4
| | | | | | | | | | | | | | | | | | | | | commit 7c8d42fdf1a84b1a0dd60d6528309c8ec127e87c upstream. The alignment check in prepare_hugepage_range() is wrong for 2 GB hugepages, it only checks for 1 MB hugepage alignment. This can result in kernel crash in __unmap_hugepage_range() at the BUG_ON(start & ~huge_page_mask(h)) alignment check, for mappings created with MAP_FIXED at unaligned address. Fix this by correctly handling multiple hugepage sizes, similar to the generic version of prepare_hugepage_range(). Fixes: d08de8e2d867 ("s390/mm: add support for 2GB hugepages") Cc: <stable@vger.kernel.org> # 4.8+ Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/archrandom: prevent CPACF trng invocations in interrupt contextHarald Freudenberger2022-08-111-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 918e75f77af7d2e049bb70469ec0a2c12782d96a upstream. This patch slightly reworks the s390 arch_get_random_seed_{int,long} implementation: Make sure the CPACF trng instruction is never called in any interrupt context. This is done by adding an additional condition in_task(). Justification: There are some constrains to satisfy for the invocation of the arch_get_random_seed_{int,long}() functions: - They should provide good random data during kernel initialization. - They should not be called in interrupt context as the TRNG instruction is relatively heavy weight and may for example make some network loads cause to timeout and buck. However, it was not clear what kind of interrupt context is exactly encountered during kernel init or network traffic eventually calling arch_get_random_seed_long(). After some days of investigations it is clear that the s390 start_kernel function is not running in any interrupt context and so the trng is called: Jul 11 18:33:39 t35lp54 kernel: [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70 Jul 11 18:33:39 t35lp54 kernel: [<000000010715f246>] random_init+0xf6/0x238 Jul 11 18:33:39 t35lp54 kernel: [<000000010712545c>] start_kernel+0x4a4/0x628 Jul 11 18:33:39 t35lp54 kernel: [<000000010590402a>] startup_continue+0x2a/0x40 The condition in_task() is true and the CPACF trng provides random data during kernel startup. The network traffic however, is more difficult. A typical call stack looks like this: Jul 06 17:37:07 t35lp54 kernel: [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240 Jul 06 17:37:07 t35lp54 kernel: [<000000008b560136>] crng_reseed+0x36/0xd8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b5604b8>] crng_make_state+0x78/0x340 Jul 06 17:37:07 t35lp54 kernel: [<000000008b5607e0>] _get_random_bytes+0x60/0xf8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b56108a>] get_random_u32+0xda/0x248 Jul 06 17:37:07 t35lp54 kernel: [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8 Jul 06 17:37:07 t35lp54 kernel: [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8 Jul 06 17:37:07 t35lp54 kernel: [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b611eac>] kmalloc_reserve+0x44/0xa0 Jul 06 17:37:07 t35lp54 kernel: [<000000008b611f98>] __alloc_skb+0x90/0x178 Jul 06 17:37:07 t35lp54 kernel: [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118 Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680 Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f6526>] qeth_poll+0x256/0x3f8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b63dbec>] net_rx_action+0x1cc/0x408 Jul 06 17:37:07 t35lp54 kernel: [<000000008b937302>] __do_softirq+0x132/0x6b0 Jul 06 17:37:07 t35lp54 kernel: [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170 Jul 06 17:37:07 t35lp54 kernel: [<000000008abf531a>] irq_exit_rcu+0x22/0x50 Jul 06 17:37:07 t35lp54 kernel: [<000000008b922506>] do_io_irq+0xe6/0x198 Jul 06 17:37:07 t35lp54 kernel: [<000000008b935826>] io_int_handler+0xd6/0x110 Jul 06 17:37:07 t35lp54 kernel: [<000000008b9358a6>] psw_idle_exit+0x0/0xa Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0) Jul 06 17:37:07 t35lp54 kernel: [<000000008b933cfe>] default_idle_call+0x6e/0xd0 Jul 06 17:37:07 t35lp54 kernel: [<000000008ac59f4e>] do_idle+0xf6/0x1b0 Jul 06 17:37:07 t35lp54 kernel: [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40 Jul 06 17:37:07 t35lp54 kernel: [<000000008abb0d90>] smp_start_secondary+0x148/0x158 Jul 06 17:37:07 t35lp54 kernel: [<000000008b935b9e>] restart_int_handler+0x6e/0x90 which confirms that the call is in softirq context. So in_task() covers exactly the cases where we want to have CPACF trng called: not in nmi, not in hard irq, not in soft irq but in normal task context and during kernel init. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com Fixes: e4f74400308c ("s390/archrandom: simplify back to earlier design and initialize earlier") [agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/archrandom: simplify back to earlier design and initialize earlierJason A. Donenfeld2022-07-071-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e4f74400308cb8abde5fdc9cad609c2aba32110c upstream. s390x appears to present two RNG interfaces: - a "TRNG" that gathers entropy using some hardware function; and - a "DRBG" that takes in a seed and expands it. Previously, the TRNG was wired up to arch_get_random_{long,int}(), but it was observed that this was being called really frequently, resulting in high overhead. So it was changed to be wired up to arch_get_random_ seed_{long,int}(), which was a reasonable decision. Later on, the DRBG was then wired up to arch_get_random_{long,int}(), with a complicated buffer filling thread, to control overhead and rate. Fortunately, none of the performance issues matter much now. The RNG always attempts to use arch_get_random_seed_{long,int}() first, which means a complicated implementation of arch_get_random_{long,int}() isn't really valuable or useful to have around. And it's only used when reseeding, which means it won't hit the high throughput complications that were faced before. So this commit returns to an earlier design of just calling the TRNG in arch_get_random_seed_{long,int}(), and returning false in arch_get_ random_{long,int}(). Part of what makes the simplification possible is that the RNG now seeds itself using the TRNG at bootup. But this only works if the TRNG is detected early in boot, before random_init() is called. So this commit also causes that check to happen in setup_arch(). Cc: stable@vger.kernel.org Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Ingo Franzki <ifranzki@linux.ibm.com> Cc: Juergen Christ <jchrist@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao2022-07-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc upstream. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390: define get_cycles macro for arch-overrideJason A. Donenfeld2022-06-251-0/+1
| | | | | | | | | | | | | | | | | | | | | commit 2e3df523256cb9836de8441e9c791a796759bb3c upstream. S390x defines a get_cycles() function, but it does not do the usual `#define get_cycles get_cycles` dance, making it impossible for generic code to see if an arch-specific function was defined. While the get_cycles() ifdef is not currently used, the following timekeeping patch in this series will depend on the macro existing (or not existing) when defining random_get_entropy(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390: Remove arch_has_random, arch_has_random_seedRichard Henderson2022-06-251-12/+0
| | | | | | | | | | | | | | commit 5e054c820f59bbb9714d5767f5f476581c309ca8 upstream. These symbols are currently part of the generic archrandom.h interface, but are currently unused and can be removed. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200110145422.49141-4-broonie@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/preempt: disable __preempt_count_add() optimization for ↵Heiko Carstens2022-06-141-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | PROFILE_ALL_BRANCHES [ Upstream commit 63678eecec57fc51b778be3da35a397931287170 ] gcc 12 does not (always) optimize away code that should only be generated if parameters are constant and within in a certain range. This depends on various obscure kernel config options, however in particular PROFILE_ALL_BRANCHES can trigger this compile error: In function ‘__atomic_add_const’, inlined from ‘__preempt_count_add.part.0’ at ./arch/s390/include/asm/preempt.h:50:3: ./arch/s390/include/asm/atomic_ops.h:80:9: error: impossible constraint in ‘asm’ 80 | asm volatile( \ | ^~~ Workaround this by simply disabling the optimization for PROFILE_ALL_BRANCHES, since the kernel will be so slow, that this optimization won't matter at all. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* hugetlbfs: flush TLBs correctly after huge_pmd_unshareNadav Amit2021-12-011-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a4a118f2eead1d6c49e00765de89878288d4b890 upstream. When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB flush is missing. This TLB flush must be performed before releasing the i_mmap_rwsem, in order to prevent an unshared PMDs page from being released and reused before the TLB flush took place. Arguably, a comprehensive solution would use mmu_gather interface to batch the TLB flushes and the PMDs page release, however it is not an easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2) deferring the release of the page reference for the PMDs page until after i_mmap_rwsem is dropeed can confuse huge_pmd_unshare() into thinking PMDs are shared when they are not. Fix __unmap_hugepage_range() by adding the missing TLB flush, and forcing a flush when unshare is successful. Fixes: 24669e58477e ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6 Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/ftrace: fix ftrace_update_ftrace_func implementationVasily Gorbik2021-07-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f8c2602733c953ed7a16e060640b8e96f9d94b9b upstream. s390 enforces DYNAMIC_FTRACE if FUNCTION_TRACER is selected. At the same time implementation of ftrace_caller is not compliant with HAVE_DYNAMIC_FTRACE since it doesn't provide implementation of ftrace_update_ftrace_func() and calls ftrace_trace_function() directly. The subtle difference is that during ftrace code patching ftrace replaces function tracer via ftrace_update_ftrace_func() and activates it back afterwards. Unexpected direct calls to ftrace_trace_function() during ftrace code patching leads to nullptr-dereferences when tracing is activated for one of functions which are used during code patching. Those function currently are: copy_from_kernel_nofault() copy_from_kernel_nofault_allowed() preempt_count_sub() [with debug_defconfig] preempt_count_add() [with debug_defconfig] Corresponding KASAN report: BUG: KASAN: nullptr-dereference in function_trace_call+0x316/0x3b0 Read of size 4 at addr 0000000000001e08 by task migration/0/15 CPU: 0 PID: 15 Comm: migration/0 Tainted: G B 5.13.0-41423-g08316af3644d Hardware name: IBM 3906 M04 704 (LPAR) Stopper: multi_cpu_stop+0x0/0x3e0 <- stop_machine_cpuslocked+0x1e4/0x218 Call Trace: [<0000000001f77caa>] show_stack+0x16a/0x1d0 [<0000000001f8de42>] dump_stack+0x15a/0x1b0 [<0000000001f81d56>] print_address_description.constprop.0+0x66/0x2e0 [<000000000082b0ca>] kasan_report+0x152/0x1c0 [<00000000004cfd8e>] function_trace_call+0x316/0x3b0 [<0000000001fb7082>] ftrace_caller+0x7a/0x7e [<00000000006bb3e6>] copy_from_kernel_nofault_allowed+0x6/0x10 [<00000000006bb42e>] copy_from_kernel_nofault+0x3e/0xd0 [<000000000014605c>] ftrace_make_call+0xb4/0x1f8 [<000000000047a1b4>] ftrace_replace_code+0x134/0x1d8 [<000000000047a6e0>] ftrace_modify_all_code+0x120/0x1d0 [<000000000047a7ec>] __ftrace_modify_code+0x5c/0x78 [<000000000042395c>] multi_cpu_stop+0x224/0x3e0 [<0000000000423212>] cpu_stopper_thread+0x33a/0x5a0 [<0000000000243ff2>] smpboot_thread_fn+0x302/0x708 [<00000000002329ea>] kthread+0x342/0x408 [<00000000001066b2>] __ret_from_fork+0x92/0xf0 [<0000000001fb57fa>] ret_from_fork+0xa/0x30 The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1 flags: 0x1ffff00000001000(reserved|node=0|zone=0|lastcpupid=0x1ffff) raw: 1ffff00000001000 0000040000000048 0000040000000048 0000000000000000 raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 0000000000001d00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0000000000001d80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 >0000000000001e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 ^ 0000000000001e80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0000000000001f00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 ================================================================== To fix that introduce ftrace_func callback to be called from ftrace_caller and update it in ftrace_update_ftrace_func(). Fixes: 4cc9bed034d1 ("[S390] cleanup ftrace backend functions") Cc: stable@vger.kernel.org Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390: don't trace preemption in percpu macrosSven Schnelle2020-09-091-14/+14
| | | | | | | | | | | | | | [ Upstream commit 1196f12a2c960951d02262af25af0bb1775ebcc2 ] Since commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") the lockdep code itself uses percpu variables. This leads to recursions because the percpu macros are calling preempt_enable() which might call trace_preempt_on(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KVM: s390: reduce number of IO pins to 1Christian Borntraeger2020-07-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 774911290c589e98e3638e73b24b0a4d4530e97c ] The current number of KVM_IRQCHIP_NUM_PINS results in an order 3 allocation (32kb) for each guest start/restart. This can result in OOM killer activity even with free swap when the memory is fragmented enough: kernel: qemu-system-s39 invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=3, oom_score_adj=0 kernel: CPU: 1 PID: 357274 Comm: qemu-system-s39 Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu kernel: Hardware name: IBM 8562 T02 Z06 (LPAR) kernel: Call Trace: kernel: ([<00000001f848fe2a>] show_stack+0x7a/0xc0) kernel: [<00000001f8d3437a>] dump_stack+0x8a/0xc0 kernel: [<00000001f8687032>] dump_header+0x62/0x258 kernel: [<00000001f8686122>] oom_kill_process+0x172/0x180 kernel: [<00000001f8686abe>] out_of_memory+0xee/0x580 kernel: [<00000001f86e66b8>] __alloc_pages_slowpath+0xd18/0xe90 kernel: [<00000001f86e6ad4>] __alloc_pages_nodemask+0x2a4/0x320 kernel: [<00000001f86b1ab4>] kmalloc_order+0x34/0xb0 kernel: [<00000001f86b1b62>] kmalloc_order_trace+0x32/0xe0 kernel: [<00000001f84bb806>] kvm_set_irq_routing+0xa6/0x2e0 kernel: [<00000001f84c99a4>] kvm_arch_vm_ioctl+0x544/0x9e0 kernel: [<00000001f84b8936>] kvm_vm_ioctl+0x396/0x760 kernel: [<00000001f875df66>] do_vfs_ioctl+0x376/0x690 kernel: [<00000001f875e304>] ksys_ioctl+0x84/0xb0 kernel: [<00000001f875e39a>] __s390x_sys_ioctl+0x2a/0x40 kernel: [<00000001f8d55424>] system_call+0xd8/0x2c8 As far as I can tell s390x does not use the iopins as we bail our for anything other than KVM_IRQ_ROUTING_S390_ADAPTER and the chip/pin is only used for KVM_IRQ_ROUTING_IRQCHIP. So let us use a small number to reduce the memory footprint. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200617083620.5409-1-borntraeger@de.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/vdso: fix vDSO clock_getres()Vincenzo Frascino2020-06-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 478237a595120a18e9b52fd2c57a6e8b7a01e411 ] clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the s390 vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Link: https://lkml.kernel.org/r/20200324121027.21665-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [heiko.carstens@de.ibm.com: use llgf for proper zero extension] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390: fix syscall_get_error for compat processesDmitry V. Levin2020-06-251-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit b3583fca5fb654af2cfc1c08259abb9728272538 upstream. If both the tracer and the tracee are compat processes, and gprs[2] is assigned a value by __poke_user_compat, then the higher 32 bits of gprs[2] are cleared, IS_ERR_VALUE() always returns false, and syscall_get_error() always returns 0. Fix the implementation by sign-extending the value for compat processes the same way as x86 implementation does. The bug was exposed to user space by commit 201766a20e30f ("ptrace: add PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite. This change fixes strace syscall tampering on s390. Link: https://lkml.kernel.org/r/20200602180051.GA2427@altlinux.org Fixes: 753c4dd6a2fa2 ("[S390] ptrace changes") Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: stable@vger.kernel.org # v2.6.28+ Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/qdio: fill SL with absolute addressesJulian Wiedmann2020-03-111-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit e9091ffd6a0aaced111b5d6ead5eaab5cd7101bc ] As the comment says, sl->sbal holds an absolute address. qeth currently solves this through wild casting, while zfcp doesn't care. Handle this properly in the code that actually builds the SL. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> [for qdio] Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in ↵Nathan Chancellor2020-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | storage_key_init_range commit 380324734956c64cd060e1db4304f3117ac15809 upstream. Clang warns: In file included from ../arch/s390/purgatory/purgatory.c:10: In file included from ../include/linux/kexec.h:18: In file included from ../include/linux/crash_core.h:6: In file included from ../include/linux/elfcore.h:5: In file included from ../include/linux/user.h:1: In file included from ../arch/s390/include/asm/user.h:11: ../arch/s390/include/asm/page.h:45:6: warning: converting the result of '<<' to a boolean always evaluates to false [-Wtautological-constant-compare] if (PAGE_DEFAULT_KEY) ^ ../arch/s390/include/asm/page.h:23:44: note: expanded from macro 'PAGE_DEFAULT_KEY' #define PAGE_DEFAULT_KEY (PAGE_DEFAULT_ACC << 4) ^ 1 warning generated. Explicitly compare this against zero to silence the warning as it is intended to be used in a boolean context. Fixes: de3fa841e429 ("s390/mm: fix compile for PAGE_DEFAULT_KEY != 0") Link: https://github.com/ClangBuiltLinux/linux/issues/860 Link: https://lkml.kernel.org/r/20200214064207.10381-1-natechancellor@gmail.com Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/time: Fix clk type in get_tod_clockNathan Chancellor2020-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0f8a206df7c920150d2aa45574fba0ab7ff6be4f upstream. Clang warns: In file included from ../arch/s390/boot/startup.c:3: In file included from ../include/linux/elf.h:5: In file included from ../arch/s390/include/asm/elf.h:132: In file included from ../include/linux/compat.h:10: In file included from ../include/linux/time.h:74: In file included from ../include/linux/time32.h:13: In file included from ../include/linux/timex.h:65: ../arch/s390/include/asm/timex.h:160:20: warning: passing 'unsigned char [16]' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] get_tod_clock_ext(clk); ^~~ ../arch/s390/include/asm/timex.h:149:44: note: passing argument to parameter 'clk' here static inline void get_tod_clock_ext(char *clk) ^ Change clk's type to just be char so that it matches what happens in get_tod_clock_ext. Fixes: 57b28f66316d ("[S390] s390_hypfs: Add new attributes") Link: https://github.com/ClangBuiltLinux/linux/issues/861 Link: http://lkml.kernel.org/r/20200208140858.47970-1-natechancellor@gmail.com Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/mm: fix dynamic pagetable upgrade for hugetlbfsGerald Schaefer2020-02-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5f490a520bcb393389a4d44bec90afcb332eb112 upstream. Commit ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") changed the logic of TASK_SIZE and also removed the arch_mmap_check() implementation for s390. This combination has a subtle effect on how get_unmapped_area() for hugetlbfs pages works. It is now possible that a user process establishes a hugetlbfs mapping at an address above 4 TB, without triggering a dynamic pagetable upgrade from 3 to 4 levels. This is because hugetlbfs mappings will not use mm->get_unmapped_area, but rather file->f_op->get_unmapped_area, which currently is the generic implementation of hugetlb_get_unmapped_area() that does not know about s390 dynamic pagetable upgrades, but with the new definition of TASK_SIZE, it will now allow mappings above 4 TB. Subsequent access to such a mapped address above 4 TB will result in a page fault loop, because the CPU cannot translate such a large address with 3 pagetable levels. The fault handler will try to map in a hugepage at the address, but due to the folded pagetable logic it will end up with creating entries in the 3 level pagetable, possibly overwriting existing mappings, and then it all repeats when the access is retried. Apart from the page fault loop, this can have various nasty effects, e.g. kernel panic from one of the BUG_ON() checks in memory management code, or even data loss if an existing mapping gets overwritten. Fix this by implementing HAVE_ARCH_HUGETLB_UNMAPPED_AREA support for s390, providing an s390 version for hugetlb_get_unmapped_area() with pagetable upgrade support similar to arch_get_unmapped_area(), which will then be used instead of the generic version. Fixes: ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/ftrace: fix endless recursion in function_graph tracerSven Schnelle2019-12-311-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6feeee8efc53035c3195b02068b58ae947538aa4 ] The following sequence triggers a kernel stack overflow on s390x: mount -t tracefs tracefs /sys/kernel/tracing cd /sys/kernel/tracing echo function_graph > current_tracer [crash] This is because preempt_count_{add,sub} are in the list of traced functions, which can be demonstrated by: echo preempt_count_add >set_ftrace_filter echo function_graph > current_tracer [crash] The stack overflow happens because get_tod_clock_monotonic() gets called by ftrace but itself calls preempt_{disable,enable}(), which leads to a endless recursion. Fix this by using preempt_{disable,enable}_notrace(). Fixes: 011620688a71 ("s390/time: ensure get_clock_monotonic() returns monotonic values") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/mm: add mm_pxd_folded() checks to pxd_free()Gerald Schaefer2019-12-311-2/+14
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 2416cefc504ba8ae9b17e3e6b40afc72708f96be ] Unlike pxd_free_tlb(), the pxd_free() functions do not check for folded page tables. This is not an issue so far, as those functions will actually never be called, since no code will reach them when page tables are folded. In order to avoid future issues, and to make the s390 code more similar to other architectures, add mm_pxd_folded() checks, similar to how it is done in pxd_free_tlb(). This was found by testing a patch from from Anshuman Khandual, which is currently discussed on LKML ("mm/debug: Add tests validating architecture page table helpers"). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/time: ensure get_clock_monotonic() returns monotonic valuesHeiko Carstens2019-12-311-6/+10
| | | | | | | | | | | | | | | [ Upstream commit 011620688a71f2f1fe9901dbc2479a7c01053196 ] The current implementation of get_clock_monotonic() leaves it up to the caller to call the function with preemption disabled. The only core kernel caller (sched_clock) however does not disable preemption. In order to make sure that all callers of this function see monotonic values handle disabling preemption within the function itself. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/mm: properly clear _PAGE_NOEXEC bit when it is not supportedGerald Schaefer2019-12-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ab874f22d35a8058d8fdee5f13eb69d8867efeae upstream. On older HW or under a hypervisor, w/o the instruction-execution- protection (IEP) facility, and also w/o EDAT-1, a translation-specification exception may be recognized when bit 55 of a pte is one (_PAGE_NOEXEC). The current code tries to prevent setting _PAGE_NOEXEC in such cases, by removing it within set_pte_at(). However, ptep_set_access_flags() will modify a pte directly, w/o using set_pte_at(). There is at least one scenario where this can result in an active pte with _PAGE_NOEXEC set, which would then lead to a panic due to a translation-specification exception (write to swapped out page): do_swap_page pte = mk_pte (with _PAGE_NOEXEC bit) set_pte_at (will remove _PAGE_NOEXEC bit in page table, but keep it in local variable pte) vmf->orig_pte = pte (pte still contains _PAGE_NOEXEC bit) do_wp_page wp_page_reuse entry = vmf->orig_pte (still with _PAGE_NOEXEC bit) ptep_set_access_flags (writes entry with _PAGE_NOEXEC bit) Fix this by clearing _PAGE_NOEXEC already in mk_pte_phys(), where the pgprot value is applied, so that no pte with _PAGE_NOEXEC will ever be visible, if it is not supported. The check in set_pte_at() can then also be removed. Cc: <stable@vger.kernel.org> # 4.11+ Fixes: 57d7f939e7bd ("s390: add no-execute support") Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/vdso: correct vdso mapping for compat tasksVasily Gorbik2019-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 190f056fba230abee80712eb810939ef9a8c462f ] While "s390/vdso: avoid 64-bit vdso mapping for compat tasks" fixed 64-bit vdso mapping for compat tasks under gdb it introduced another problem. "compat_mm" flag is not inherited during fork and when 31-bit process forks a child (but does not perform exec) it ends up with 64-bit vdso. To address that, init_new_context (which is called during fork and exec) now initialize compat_mm based on thread TIF_31BIT flag. Later compat_mm is adjusted in arch_setup_additional_pages, which is called during exec. Fixes: d1befa65823e ("s390/vdso: avoid 64-bit vdso mapping for compat tasks") Reported-by: Stefan Liebler <stli@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@vger.kernel.org> # v4.20+ Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/vdso: avoid 64-bit vdso mapping for compat tasksVasily Gorbik2019-11-202-0/+3
| | | | | | | | | | | | | | | | | | | | [ Upstream commit d1befa65823e9c6d013883b8a41d081ec338c489 ] vdso_fault used is_compat_task function (on s390 it tests "current" thread_info flags) to distinguish compat tasks and map 31-bit vdso pages. But "current" task might not correspond to mm context. When 31-bit compat inferior is executed under gdb, gdb does PTRACE_PEEKTEXT on vdso page, causing vdso_fault with "current" being 64-bit gdb process. So, 31-bit inferior ends up with 64-bit vdso mapped. To avoid this problem a new compat_mm flag has been introduced into mm context. This flag is used in vdso_fault and vdso_mremap instead of is_compat_task. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/uaccess: avoid (false positive) compiler warningsChristian Borntraeger2019-11-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 062795fcdcb2d22822fb42644b1d76a8ad8439b3 ] Depending on inlining decisions by the compiler, __get/put_user_fn might become out of line. Then the compiler is no longer able to tell that size can only be 1,2,4 or 8 due to the check in __get/put_user resulting in false positives like ./arch/s390/include/asm/uaccess.h: In function ‘__put_user_fn’: ./arch/s390/include/asm/uaccess.h:113:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized] 113 | return rc; | ^~ ./arch/s390/include/asm/uaccess.h: In function ‘__get_user_fn’: ./arch/s390/include/asm/uaccess.h:143:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized] 143 | return rc; | ^~ These functions are supposed to be always inlined. Mark it as such. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/dma: provide proper ARCH_ZONE_DMA_BITS valueHalil Pasic2019-08-161-0/+2
| | | | | | | | | | | | | | | | [ Upstream commit 1a2dcff881059dedc14fafc8a442664c8dbd60f1 ] On s390 ZONE_DMA is up to 2G, i.e. ARCH_ZONE_DMA_BITS should be 31 bits. The current value is 24 and makes __dma_direct_alloc_pages() take a wrong turn first (but __dma_direct_alloc_pages() recovers then). Let's correct ARCH_ZONE_DMA_BITS value and avoid wrong turns. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Petr Tesarik <ptesarik@suse.cz> Fixes: c61e9637340e ("dma-direct: add support for allocation from ZONE_DMA and ZONE_DMA32") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390: fix stfle zero paddingHeiko Carstens2019-07-211-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4f18d869ffd056c7858f3d617c71345cf19be008 upstream. The stfle inline assembly returns the number of double words written (condition code 0) or the double words it would have written (condition code 3), if the memory array it got as parameter would have been large enough. The current stfle implementation assumes that the array is always large enough and clears those parts of the array that have not been written to with a subsequent memset call. If however the array is not large enough memset will get a negative length parameter, which means that memset clears memory until it gets an exception and the kernel crashes. To fix this simply limit the maximum length. Move also the inline assembly to an extra function to avoid clobbering of register 0, which might happen because of the added min_t invocation together with code instrumentation. The bug was introduced with commit 14375bc4eb8d ("[S390] cleanup facility list handling") but was rather harmless, since it would only write to a rather large array. It became a potential problem with commit 3ab121ab1866 ("[S390] kernel: Add z/VM LGR detection"). Since then it writes to an array with only four double words, while some machines already deliver three double words. As soon as machines have a facility bit within the fifth double a crash on IPL would happen. Fixes: 14375bc4eb8d ("[S390] cleanup facility list handling") Cc: <stable@vger.kernel.org> # v2.6.37+ Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/ap: rework assembler functions to use unions for in/out register variablesHarald Freudenberger2019-06-251-9/+19
| | | | | | | | | | | | | | | | | | [ Upstream commit 159491f3b509bd8101199944dc7b0673b881c734 ] The inline assembler functions ap_aqic() and ap_qact() used two variables declared on the very same register. One variable was for input only, the other for output. Looks like newer versions of the gcc don't like this. Anyway it is a better coding to use one variable (which may have a union data type) on one register for input and output. So this patch introduces unions and uses only one variable now for input and output for GR1 for the PQAP(QACT) and PQAP(QIC) invocation. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/jump_label: Use "jdd" constraint on gcc9Ilya Leoshkevich2019-06-251-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 146448524bddbf6dfc62de31957e428de001cbda ] [heiko.carstens@de.ibm.com]: ----- Laura Abbott reported that the kernel doesn't build anymore with gcc 9, due to the "X" constraint. Ilya provided the gcc 9 patch "S/390: Introduce jdd constraint" which introduces the new "jdd" constraint which fixes this. ----- The support for section anchors on S/390 introduced in gcc9 has changed the behavior of "X" constraint, which can now produce register references. Since existing constraints, in particular, "i", do not fit the intended use case on S/390, the new machine-specific "jdd" constraint was introduced. This patch makes jump labels use "jdd" constraint when building with gcc9. Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/kasan: fix strncpy_from_user kasan checksVasily Gorbik2019-06-191-0/+2
| | | | | | | | | | | | | | | | | [ Upstream commit 01eb42afb45719cb41bb32c278e068073738899d ] arch/s390/lib/uaccess.c is built without kasan instrumentation. Kasan checks are performed explicitly in copy_from_user/copy_to_user functions. But since those functions could be inlined, calls from files like uaccess.c with instrumentation disabled won't generate kasan reports. This is currently the case with strncpy_from_user function which was revealed by newly added kasan test. Avoid inlining of copy_from_user/copy_to_user when the kernel is built with kasan support to make sure kasan checks are fully functional. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390: limit brk randomization to 32MBMartin Schwidefsky2019-05-041-4/+7
| | | | | | | | | | | | | | | [ Upstream commit cd479eccd2e057116d504852814402a1e68ead80 ] For a 64-bit process the randomization of the program break is quite large with 1GB. That is as big as the randomization of the anonymous mapping base, for a test case started with '/lib/ld64.so.1 <exec>' it can happen that the heap is placed after the stack. To avoid this limit the program break randomization to 32MB for 64-bit and keep 8MB for 31-bit. Reported-by: Stefan Liebler <stli@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
* KVM: Call kvm_arch_memslots_updated() before updating memslotsSean Christopherson2019-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream. kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0. Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0. Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/zcrypt: improve special ap message cmd handlingHarald Freudenberger2019-02-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit be534791011100d204602e2e0496e9e6ce8edf63 ] There exist very few ap messages which need to have the 'special' flag enabled. This flag tells the firmware layer to do some pre- and maybe postprocessing. However, it may happen that this special flag is enabled but the firmware is unable to deal with this kind of message and thus returns with reply code 0x41. For example older firmware may not know the newest messages triggered by the zcrypt device driver and thus react with reject and the named reply code. Unfortunately this reply code is not known to the zcrypt error routines and thus default behavior is to switch the ap queue offline. This patch now makes the ap error routine aware of the reply code and so userspace is informed about the bad processing result but the queue is not switched to offline state any more. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/mm: always force a load of the primary ASCE on context switchMartin Schwidefsky2019-01-311-3/+2
| | | | | | | | | | | | | | | | | | | | | | commit a38662084c8bdb829ff486468c7ea801c13fcc34 upstream. The ASCE of an mm_struct can be modified after a task has been created, e.g. via crst_table_downgrade for a compat process. The active_mm logic to avoid the switch_mm call if the next task is a kernel thread can lead to a situation where switch_mm is called where 'prev == next' is true but 'prev->context.asce == next->context.asce' is not. This can lead to a situation where a CPU uses the outdated ASCE to run a task. The result can be a crash, endless loops and really subtle problem due to TLBs being created with an invalid ASCE. Cc: stable@kernel.org # v3.15+ Fixes: 53e857f30867 ("s390/mm,tlb: race of lazy TLB flush vs. recreation") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/mm: fix mis-accounting of pgtable_bytesMartin Schwidefsky2018-11-274-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e12e4044aede97974f2222eb7f0ed726a5179a32 ] In case a fork or a clone system fails in copy_process and the error handling does the mmput() at the bad_fork_cleanup_mm label, the following warning messages will appear on the console: BUG: non-zero pgtables_bytes on freeing mm: 16384 The reason for that is the tricks we play with mm_inc_nr_puds() and mm_inc_nr_pmds() in init_new_context(). A normal 64-bit process has 3 levels of page table, the p4d level and the pud level are folded. On process termination the free_pud_range() function in mm/memory.c will subtract 16KB from pgtable_bytes with a mm_dec_nr_puds() call, but there actually is not really a pud table. One issue with this is the fact that pgtable_bytes is usually off by a few kilobytes, but the more severe problem is that for a failed fork or clone the free_pgtables() function is not called. In this case there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context(). The pgtable_bytes will be off by 16384 or 32768 bytes and we get the BUG message. The message itself is purely cosmetic, but annoying. To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded function to check for the true size of the address space. Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/hibernate: fix error handling when suspend cpu != resume cpuGerald Schaefer2018-09-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The resume code checks if the resume cpu is the same as the suspend cpu. If not, and if it is also not possible to switch to the suspend cpu, an error message should be printed and the resume process should be stopped by loading a disabled wait psw. The current logic is broken in multiple ways, the message is never printed, and the disabled wait psw never loaded because the kernel panics before that: - sam31 and SIGP_SET_ARCHITECTURE to ESA mode is wrong, this will break on the first 64bit instruction in sclp_early_printk(). - The init stack should be used, but the stack pointer is not set up correctly (missing aghi %r15,-STACK_FRAME_OVERHEAD). - __sclp_early_printk() checks the sclp_init_state. If it is not sclp_init_state_uninitialized, it simply returns w/o printing anything. In the resumed kernel however, sclp_init_state will never be uninitialized. This patch fixes those issues by removing the sam31/ESA logic, adding a correct init stack pointer, and also introducing sclp_early_printk_force() to allow using sclp_early_printk() even when sclp_init_state is not uninitialized. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* KVM: s390: Properly lock mm context allow_gmap_hpage_1m settingJanosch Frank2018-09-041-1/+7
| | | | | | | | | | We have to do down_write on the mm semaphore to set a bitfield in the mm context. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Fixes: a4499382 ("KVM: s390: Add huge page enablement control") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2018-08-242-43/+43
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - A couple of patches for the zcrypt driver: + Add two masks to determine which AP cards and queues are host devices, this will be useful for KVM AP device passthrough + Add-on patch to improve the parsing of the new apmask and aqmask + Some code beautification - Second try to reenable the GCC plugins, the first patch set had a patch to do this but the merge somehow missed this - Remove the s390 specific GCC version check and use the generic one - Three patches for kdump, two bug fixes and one cleanup - Three patches for the PCI layer, one bug fix and two cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: remove gcc version check (4.3 or newer) s390/zcrypt: hex string mask improvements for apmask and aqmask. s390/zcrypt: AP bus support for alternate driver(s) s390/zcrypt: code beautify s390/zcrypt: switch return type to bool for ap_instructions_available() s390/kdump: Remove kzalloc_panic s390/kdump: Fix memleak in nt_vmcoreinfo s390/kdump: Make elfcorehdr size calculation ABI compliant s390/pci: remove fmb address from debug output s390/pci: remove stale rc s390/pci: fix out of bounds access during irq setup s390/zcrypt: fix ap_instructions_available() returncodes s390: reenable gcc plugins for real
| * s390/zcrypt: code beautifyHarald Freudenberger2018-08-201-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code beautify by following most of the checkpatch suggestions: - SPDX license identifier line complains by checkpatch - missing space or newline complains by checkpatch - octal numbers for permssions complains by checkpatch - renaming of static sysfs functions complains by checkpatch - fix of block comment complains by checkpatch - fix printf like calls where function name instead of %s __func__ was used - __packed instead of __attribute__((packed)) - init to zero for static variables removed - use of DEVICE_ATTR_RO and DEVICE_ATTR_RW macros No functional code changes or API changes! Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/zcrypt: switch return type to bool for ap_instructions_available()Harald Freudenberger2018-08-201-3/+3
| | | | | | | | | | | | | | | | | | Function ap_instructions_available() had returntype int but in fact returned 1 for true and 0 for false. Changed returntype to bool. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/zcrypt: fix ap_instructions_available() returncodesHarald Freudenberger2018-08-161-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During review of KVM patches it was complained that the ap_instructions_available() function returns 0 if AP instructions are available and -ENODEV if not. The function acts like a boolean function to check for AP instructions available and thus should return 0 on failure and != 0 on success. Changed to the suggested behaviour and adapted the one and only caller of this function which is the ap bus core code. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* | Merge tag 'trace-v4.19' of ↵Linus Torvalds2018-08-201-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockdep and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) tracing: Fix SPDX format headers to use C++ style comments tracing: Add SPDX License format tags to tracing files tracing: Add SPDX License format to bpf_trace.c blktrace: Add SPDX License format header s390/ftrace: Add -mfentry and -mnop-mcount support tracing: Add -mcount-nop option support tracing: Avoid calling cc-option -mrecord-mcount for every Makefile tracing: Handle CC_FLAGS_FTRACE more accurately Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Simplify uprobe_register() body tracepoints: Free early tracepoints after RCU is initialized uprobes: Use synchronize_rcu() not synchronize_sched() tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister() ftrace: Remove unused pointer ftrace_swapper_pid tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing/irqsoff: Handle preempt_count for different configs tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing: irqsoff: Account for additional preempt_disable trace: Use rcu_dereference_raw for hooks from trace-event subsystem tracing/kprobes: Fix within_notrace_func() to check only notrace functions ...
| * | s390/ftrace: Add -mfentry and -mnop-mcount supportVasily Gorbik2018-08-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Utilize -mfentry and -mnop-mcount gcc options together with -mrecord-mcount to get compiler generated calls to the profiling functions as nops which are compatible with current -mhotpatch=0,3 approach. At the same time -mrecord-mcount enables __mcount_loc section generation by the compiler which allows to avoid using scripts/recordmcount.pl script. Link: http://lkml.kernel.org/r/patch-4.thread-aa7b8d.git-aa7b8dbf236f.your-ad-here.call-01533557518-ext-9465@work.hours Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2018-08-192-8/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull first set of KVM updates from Paolo Bonzini: "PPC: - minor code cleanups x86: - PCID emulation and CR3 caching for shadow page tables - nested VMX live migration - nested VMCS shadowing - optimized IPI hypercall - some optimizations ARM will come next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits) kvm: x86: Set highest physical address bits in non-present/reserved SPTEs KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c KVM: X86: Implement PV IPIs in linux guest KVM: X86: Add kvm hypervisor init time platform setup callback KVM: X86: Implement "send IPI" hypercall KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs() KVM: x86: Skip pae_root shadow allocation if tdp enabled KVM/MMU: Combine flushing remote tlb in mmu_set_spte() KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup KVM: vmx: move struct host_state usage to struct loaded_vmcs KVM: vmx: compute need to reload FS/GS/LDT on demand KVM: nVMX: remove a misleading comment regarding vmcs02 fields KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state() KVM: vmx: add dedicated utility to access guest's kernel_gs_base KVM: vmx: track host_state.loaded using a loaded_vmcs pointer KVM: vmx: refactor segmentation code in vmx_save_host_state() kvm: nVMX: Fix fault priority for VMX operations kvm: nVMX: Fix fault vector for VMX operation at CPL > 0 ...
| * \ \ Merge tag 'hlp_stage1' of ↵Janosch Frank2018-07-305-4/+27
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next KVM: s390: initial host large page support - must be enabled via module parameter hpage=1 - cannot be used together with nested - does support migration - does support hugetlbfs - no THP yet
| * | | | KVM: s390: Beautify skey enable checkJanosch Frank2018-07-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's introduce an explicit check if skeys have already been enabled for the vcpu, so we don't have to check the mm context if we don't have the storage key facility. This lets us check for enablement without having to take the mm semaphore and thus speedup skey emulation. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | | | KVM: s390: add etoken support for guestsChristian Borntraeger2018-07-192-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to provide facility 156 (etoken facility) to our guests. This includes migration support (via sync regs) and VSIE changes. The tokens are being reset on clear reset. This has to be implemented by userspace (via sync regs). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
| * | | | KVM: s390: Fix storage attributes migration with memory slotsClaudio Imbrenda2018-07-131-7/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a fix for several issues that were found in the original code for storage attributes migration. Now no bitmap is allocated to keep track of dirty storage attributes; the extra bits of the per-memslot bitmap that are always present anyway are now used for this purpose. The code has also been refactored a little to improve readability. Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode") Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes") Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Message-Id: <1525106005-13931-3-git-send-email-imbrenda@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-08-151-0/+3
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...