summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'pm-5.2-rc7' of ↵Linus Torvalds2019-06-293-6/+31
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Avoid skipping bus-level PCI power management during system resume for PCIe ports left in D0 during the preceding suspend transition on platforms where the power states of those ports can change out of the PCI layer's control" * tag 'pm-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI: PM: Avoid skipping bus-level PM on platforms without ACPI
| * PCI: PM: Avoid skipping bus-level PM on platforms without ACPIRafael J. Wysocki2019-06-263-6/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are platforms that do not call pm_set_suspend_via_firmware(), so pm_suspend_via_firmware() returns 'false' on them, but the power states of PCI devices (PCIe ports in particular) are changed as a result of powering down core platform components during system-wide suspend. Thus the pm_suspend_via_firmware() checks in pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by commit 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to- idle") are not sufficient to determine that devices left in D0 during suspend will remain in D0 during resume and so the bus-level power management can be skipped for them. For this reason, introduce a new global suspend flag, PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only and replace the pm_suspend_via_firmware() checks mentioned above with checks against this flag. Fixes: 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle") Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | Merge tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds2019-06-296-5/+108
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull XArray fixes from Matthew Wilcox: - Account XArray nodes for the page cache to the appropriate cgroup (Johannes Weiner) - Fix idr_get_next() when called under the RCU lock (Matthew Wilcox) - Add a test for xa_insert() (Matthew Wilcox) * tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax: XArray tests: Add check_insert idr: Fix idr_get_next race with idr_remove mm: fix page cache convergence regression
| * | XArray tests: Add check_insertMatthew Wilcox2019-06-021-0/+38
| | | | | | | | | | | | | | | | | | | | | A simple test which just checks that inserting an entry into an empty array succeeds. Try various different interesting indices. Signed-off-by: Matthew Wilcox <willy@infradead.org>
| * | idr: Fix idr_get_next race with idr_removeMatthew Wilcox (Oracle)2019-06-022-2/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the entry is deleted from the IDR between the call to radix_tree_iter_find() and rcu_dereference_raw(), idr_get_next() will return NULL, which will end the iteration prematurely. We should instead continue to the next entry in the IDR. This only happens if the iteration is protected by the RCU lock. Most IDR users use a spinlock or semaphore to exclude simultaneous modifications. It was noticed once the PID allocator was converted to use the IDR, as it uses the RCU lock, but there may be other users elsewhere in the kernel. We can't use the normal pattern of calling radix_tree_deref_retry() (which catches both a retry entry in a leaf node and a node entry in the root) as the IDR supports storing entries which are unaligned, which will trigger an infinite loop if they are encountered. Instead, we have to explicitly check whether the entry is a retry entry. Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree") Reported-by: Brendan Gregg <bgregg@netflix.com> Tested-by: Brendan Gregg <bgregg@netflix.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
| * | mm: fix page cache convergence regressionJohannes Weiner2019-05-313-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since a28334862993 ("page cache: Finish XArray conversion"), on most major Linux distributions, the page cache doesn't correctly transition when the hot data set is changing, and leaves the new pages thrashing indefinitely instead of kicking out the cold ones. On a freshly booted, freshly ssh'd into virtual machine with 1G RAM running stock Arch Linux: [root@ham ~]# ./reclaimtest.sh + dd of=workingset-a bs=1M count=0 seek=600 + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + ./mincore workingset-a 153600/153600 workingset-a + dd of=workingset-b bs=1M count=0 seek=600 + cat workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 104029/153600 workingset-a 120086/153600 workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 104029/153600 workingset-a 120268/153600 workingset-b workingset-b is a 600M file on a 1G host that is otherwise entirely idle. No matter how often it's being accessed, it won't get cached. While investigating, I noticed that the non-resident information gets aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is a problem because a workingset transition like this relies on the non-resident information tracked in the page cache tree of evicted file ranges: when the cache faults are refaults of recently evicted cache, we challenge the existing active set, and that allows a new workingset to establish itself. Tracing the shrinker that maintains this memory revealed that all page cache tree nodes were allocated to the root cgroup. This is a problem, because 1) the shrinker sizes the amount of non-resident information it keeps to the size of the cgroup's other memory and 2) on most major Linux distributions, only kernel threads live in the root cgroup and everything else gets put into services or session groups: [root@ham ~]# cat /proc/self/cgroup 0::/user.slice/user-0.slice/session-c1.scope As a result, we basically maintain no non-resident information for the workloads running on the system, thus breaking the caching algorithm. Looking through the code, I found the culprit in the above-mentioned patch: when switching from the radix tree to xarray, it dropped the __GFP_ACCOUNT flag from the tree node allocations - the flag that makes sure the allocated memory gets charged to and tracked by the cgroup of the calling process - in this case, the one doing the fault. To fix this, allow xarray users to specify per-tree flag that makes xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache tree annotation to request such cgroup tracking for the cache nodes. With this patch applied, the page cache correctly converges on new workingsets again after just a few iterations: [root@ham ~]# ./reclaimtest.sh + dd of=workingset-a bs=1M count=0 seek=600 + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + ./mincore workingset-a 153600/153600 workingset-a + dd of=workingset-b bs=1M count=0 seek=600 + cat workingset-b + ./mincore workingset-a workingset-b 124607/153600 workingset-a 87876/153600 workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 81313/153600 workingset-a 133321/153600 workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 63036/153600 workingset-a 153600/153600 workingset-b Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
* | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2019-06-2920-80/+97
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL mm, swap: fix THP swap out fork,memcg: alloc_thread_stack_node needs to set tsk->stack MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning mm/page_idle.c: fix oops because end_pfn is larger than max_pfn initramfs: fix populate_initrd_image() section mismatch mm/oom_kill.c: fix uninitialized oc->constraint mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails signal: remove the wrong signal_pending() check in restore_user_sigmask() fs/binfmt_flat.c: make load_flat_shared_library() work mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask fs/proc/array.c: allow reporting eip/esp for all coredumping threads mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual address
| * | | linux/kernel.h: fix overflow for DIV_ROUND_UP_ULLVinod Koul2019-06-291-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DIV_ROUND_UP_ULL adds the two arguments and then invokes DIV_ROUND_DOWN_ULL. But on a 32bit system the addition of two 32 bit values can overflow. DIV_ROUND_DOWN_ULL does it correctly and stashes the addition into a unsigned long long so cast the result to unsigned long long here to avoid the overflow condition. [akpm@linux-foundation.org: DIV_ROUND_UP_ULL must be an rval] Link: http://lkml.kernel.org/r/20190625100518.30753-1-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm, swap: fix THP swap outHuang Ying2019-06-291-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0-Day test system reported some OOM regressions for several THP (Transparent Huge Page) swap test cases. These regressions are bisected to 6861428921b5 ("block: always define BIO_MAX_PAGES as 256"). In the commit, BIO_MAX_PAGES is set to 256 even when THP swap is enabled. So the bio_alloc(gfp_flags, 512) in get_swap_bio() may fail when swapping out THP. That causes the OOM. As in the patch description of 6861428921b5 ("block: always define BIO_MAX_PAGES as 256"), THP swap should use multi-page bvec to write THP to swap space. So the issue is fixed via doing that in get_swap_bio(). BTW: I remember I have checked the THP swap code when 6861428921b5 ("block: always define BIO_MAX_PAGES as 256") was merged, and thought the THP swap code needn't to be changed. But apparently, I was wrong. I should have done this at that time. Link: http://lkml.kernel.org/r/20190624075515.31040-1-ying.huang@intel.com Fixes: 6861428921b5 ("block: always define BIO_MAX_PAGES as 256") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | fork,memcg: alloc_thread_stack_node needs to set tsk->stackAndrea Arcangeli2019-06-291-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail") corrected two instances, but there was a third instance of this bug. Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll execute free_thread_stack() on a dangling pointer. Enterprise kernels are compiled with VMAP_STACK=y so this isn't critical, but custom VMAP_STACK=n builds should have some performance advantage, with the drawback of risking to fail fork because compaction didn't succeed. So as long as VMAP_STACK=n is a supported option it's worth fixing it upstream. Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | MAINTAINERS: add CLANG/LLVM BUILD SUPPORT infoNick Desaulniers2019-06-291-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add keyword support so that our mailing list gets cc'ed for clang/llvm patches. We're pretty active on our mailing list so far as code review. There are numerous Googlers like myself that are paid to support building the Linux kernel with Clang and LLVM. Link: http://lkml.kernel.org/r/20190620001907.255803-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warningArnd Bergmann2019-06-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc gets confused in pcpu_get_vm_areas() because there are too many branches that affect whether 'lva' was initialized before it gets used: mm/vmalloc.c: In function 'pcpu_get_vm_areas': mm/vmalloc.c:991:4: error: 'lva' may be used uninitialized in this function [-Werror=maybe-uninitialized] insert_vmap_area_augment(lva, &va->rb_node, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ &free_vmap_area_root, &free_vmap_area_list); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/vmalloc.c:916:20: note: 'lva' was declared here struct vmap_area *lva; ^~~ Add an intialization to NULL, and check whether this has changed before the first use. [akpm@linux-foundation.org: tweak comments] Link: http://lkml.kernel.org/r/20190618092650.2943749-1-arnd@arndb.de Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Joel Fernandes <joelaf@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/page_idle.c: fix oops because end_pfn is larger than max_pfnColin Ian King2019-06-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the calcuation of end_pfn can round up the pfn number to more than the actual maximum number of pfns, causing an Oops. Fix this by ensuring end_pfn is never more than max_pfn. This can be easily triggered when on systems where the end_pfn gets rounded up to more than max_pfn using the idle-page stress-ng stress test: sudo stress-ng --idle-page 0 BUG: unable to handle kernel paging request at 00000000000020d8 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:page_idle_get_page+0xc8/0x1a0 Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48 RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202 RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700 RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276 R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080 R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400 FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0 Call Trace: page_idle_bitmap_write+0x8c/0x140 sysfs_kf_bin_write+0x5c/0x70 kernfs_fop_write+0x12e/0x1b0 __vfs_write+0x1b/0x40 vfs_write+0xab/0x1b0 ksys_write+0x55/0xc0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com Fixes: 33c3fc71c8cf ("mm: introduce idle page tracking") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | initramfs: fix populate_initrd_image() section mismatchGeert Uytterhoeven2019-06-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With gcc-4.6.3: WARNING: vmlinux.o(.text.unlikely+0x140): Section mismatch in reference from the function populate_initrd_image() to the variable .init.ramfs.info:__initramfs_size The function populate_initrd_image() references the variable __init __initramfs_size. This is often because populate_initrd_image lacks a __init annotation or the annotation of __initramfs_size is wrong. WARNING: vmlinux.o(.text.unlikely+0x14c): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:unpack_to_rootfs() The function populate_initrd_image() references the function __init unpack_to_rootfs(). This is often because populate_initrd_image lacks a __init annotation or the annotation of unpack_to_rootfs is wrong. WARNING: vmlinux.o(.text.unlikely+0x198): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:xwrite() The function populate_initrd_image() references the function __init xwrite(). This is often because populate_initrd_image lacks a __init annotation or the annotation of xwrite is wrong. Indeed, if the compiler decides not to inline populate_initrd_image(), a warning is generated. Fix this by adding the missing __init annotations. Link: http://lkml.kernel.org/r/20190617074340.12779-1-geert@linux-m68k.org Fixes: 7c184ecd262fe64f ("initramfs: factor out a helper to populate the initrd image") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/oom_kill.c: fix uninitialized oc->constraintYafang Shao2019-06-291-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In dump_oom_summary() oc->constraint is used to show oom_constraint_text, but it hasn't been set before. So the value of it is always the default value 0. We should inititialize it before. Bellow is the output when memcg oom occurs, before this patch: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null), cpuset=/,mems_allowed=0,oom_memcg=/foo,task_memcg=/foo,task=bash,pid=7997,uid=0 after this patch: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null), cpuset=/,mems_allowed=0,oom_memcg=/foo,task_memcg=/foo,task=bash,pid=13681,uid=0 Link: http://lkml.kernel.org/r/1560522038-15879-1-git-send-email-laoar.shao@gmail.com Fixes: ef8444ea01d7 ("mm, oom: reorganize the oom report in dump_header") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Wind Yu <yuzhoujian@didichuxing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHugeNaoya Horiguchi2019-06-292-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline for hugepages with overcommitting enabled. That was caused by the suboptimal code in current soft-offline code. See the following part: ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { ... } else { /* * We set PG_hwpoison only when the migration source hugepage * was successfully dissolved, because otherwise hwpoisoned * hugepage remains on free hugepage list, then userspace will * find it as SIGBUS by allocation failure. That's not expected * in soft-offlining. */ ret = dissolve_free_huge_page(page); if (!ret) { if (set_hwpoison_free_buddy_page(page)) num_poisoned_pages_inc(); } } return ret; Here dissolve_free_huge_page() returns -EBUSY if the migration source page was freed into buddy in migrate_pages(), but even in that case we actually has a chance that set_hwpoison_free_buddy_page() succeeds. So that means current code gives up offlining too early now. dissolve_free_huge_page() checks that a given hugepage is suitable for dissolving, where we should return success for !PageHuge() case because the given hugepage is considered as already dissolved. This change also affects other callers of dissolve_free_huge_page(), which are cleaned up together. [n-horiguchi@ah.jp.nec.com: v3] Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp.nec.comLink: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.jp.nec.com Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Chen, Jerry T <jerry.t.chen@intel.com> Tested-by: Chen, Jerry T <jerry.t.chen@intel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Cc: "Chen, Jerry T" <jerry.t.chen@intel.com> Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() failsNaoya Horiguchi2019-06-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pass/fail of soft offline should be judged by checking whether the raw error page was finally contained or not (i.e. the result of set_hwpoison_free_buddy_page()), but current code do not work like that. It might lead us to misjudge the test result when set_hwpoison_free_buddy_page() fails. Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the original page and will not return an error. Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining") Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Cc: "Chen, Jerry T" <jerry.t.chen@intel.com> Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | signal: remove the wrong signal_pending() check in restore_user_sigmask()Oleg Nesterov2019-06-296-28/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the minimal fix for stable, I'll send cleanups later. Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced the visible change which breaks user-space: a signal temporary unblocked by set_user_sigmask() can be delivered even if the caller returns success or timeout. Change restore_user_sigmask() to accept the additional "interrupted" argument which should be used instead of signal_pending() check, and update the callers. Eric said: : For clarity. I don't think this is required by posix, or fundamentally to : remove the races in select. It is what linux has always done and we have : applications who care so I agree this fix is needed. : : Further in any case where the semantic change that this patch rolls back : (aka where allowing a signal to be delivered and the select like call to : complete) would be advantage we can do as well if not better by using : signalfd. : : Michael is there any chance we can get this guarantee of the linux : implementation of pselect and friends clearly documented. The guarantee : that if the system call completes successfully we are guaranteed that no : signal that is unblocked by using sigmask will be delivered? Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Eric Wong <e@80x24.org> Tested-by: Eric Wong <e@80x24.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Laight <David.Laight@ACULAB.COM> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | fs/binfmt_flat.c: make load_flat_shared_library() workJann Horn2019-06-291-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty. Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see nullbytes and return a nice -ENOEXEC.) In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly. Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses") Signed-off-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemaskzhong jiang2019-06-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE mempoclicies when the tasks's cpuset's mems_allowed changes. For policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES, it works by remapping the policy's allowed nodes (stored in v.nodes) using the previous value of mems_allowed (stored in w.cpuset_mems_allowed) as the domain of map and the new mems_allowed (passed as nodes) as the range of the map (see the comment of bitmap_remap() for details). The result of remapping is stored back as policy's nodemask in v.nodes, and the new value of mems_allowed should be stored in w.cpuset_mems_allowed to facilitate the next rebind, if it happens. However, 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") introduced a bug where the result of remapping is stored in w.cpuset_mems_allowed instead. Thus, a mempolicy's allowed nodes can evolve in an unexpected way after a series of rebinding due to cpuset mems_allowed changes, possibly binding to a wrong node or a smaller number of nodes which may e.g. overload them. This patch fixes the bug so rebinding again works as intended. [vbabka@suse.cz: new changlog] Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawei.com Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | fs/proc/array.c: allow reporting eip/esp for all coredumping threadsJohn Ogness2019-06-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper). Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper. Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: Jan Luebbe <jlu@pengutronix.de> Tested-by: Jan Luebbe <jlu@pengutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual addressAnshuman Khandual2019-06-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The presence of struct page does not guarantee linear mapping for the pfn physical range. Device private memory which is non-coherent is excluded from linear mapping during devm_memremap_pages() though they will still have struct page coverage. Change pfn_t_to_virt() to just check for device private memory before giving out virtual address for a given pfn. pfn_t_to_virt() actually has no callers. Let's fix it for the 5.2 kernel and remove it in 5.3. Link: http://lkml.kernel.org/r/1558089514-25067-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'arc-5.2-rc7' of ↵Linus Torvalds2019-06-292-8/+157
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - hsdk platform unifying apertures - build system CROSS_COMPILE prefix * tag 'arc-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: [plat-hsdk]: unify memory apertures configuration ARC: build: Try to guess CROSS_COMPILE with cc-cross-prefix
| * | | | ARC: [plat-hsdk]: unify memory apertures configurationEugeniy Paltsev2019-06-111-8/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HSDK SoC has memory bridge which allows to configure memory map for different AXI masters in runtime. As of today we adjust memory apertures configuration in U-boot so we have different configuration in case of loading kernel via U-boot and JTAG. It isn't really critical in case of existing platform configuration as configuration differs for <currently> unused address space regions or unused AXI masters. However we may face with this issue when we'll bringup new peripherals or touch their address space. Fix that by perform full configuration of memory bridge in HSDK platform code. Basically we simply copy memory bridge configuration code from U-boot. Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: build: Try to guess CROSS_COMPILE with cc-cross-prefixAlexey Brodkin2019-06-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a long time we used to hard-code CROSS_COMPILE prefix for ARC until it started to cause problems, so we decided to solely rely on CROSS_COMPILE externally set by a user: commit 40660f1fcee8 ("ARC: build: Don't set CROSS_COMPILE in arch's Makefile"). While it works perfectly fine for build-systems where the prefix gets defined anyways for us human beings it's quite an annoying requirement especially given most of time the same one prefix "arc-linux-" is all what we need. It looks like finally we're getting the best of both worlds: 1. W/o cross-toolchain we still may install headers, build .dtb etc 2. W/ cross-toolchain get the kerne built with only ARCH=arc Inspired by [1] & [2]. [1] http://lists.infradead.org/pipermail/linux-snps-arc/2019-May/005788.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc2b47b55f17 A side note: even though "cc-cross-prefix" does its job it pollutes console with output of "which" for all the prefixes it didn't manage to find a matching cross-compiler for like that: | # ARCH=arc make defconfig | which: no arceb-linux-gcc in (~/.local/bin:~/bin:/usr/bin:/usr/sbin) | *** Default configuration is based on 'nsim_hs_defconfig' Suggested-by: Vineet Gupta <vgupta@synopsys.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* | | | | Merge tag 'riscv-for-v5.2/fixes-rc7' of ↵Linus Torvalds2019-06-296-16/+39
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Minor RISC-V fixes and one defconfig update. The fixes have no functional impact: - Fix some comment text in the memory management vmalloc_fault path. - Fix some warnings from the DT compiler in our newly-added DT files. - Change the newly-added DT bindings such that SoC IP blocks with external I/O are marked as "disabled" by default, then enable them explicitly in board DT files when the devices are used on the board. This aligns the bindings with existing upstream practice. - Add the MIT license as an option for a minor header file, at the request of one of the U-Boot maintainers. The RISC-V defconfig update builds the SiFive SPI driver and the MMC-SPI driver by default. The intention here is to make v5.2 more usable for testers and users with RISC-V hardware" * tag 'riscv-for-v5.2/fixes-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: mm: Fix code comment dt-bindings: clock: sifive: add MIT license as an option for the header file dt-bindings: riscv: resolve 'make dt_binding_check' warnings riscv: dts: Re-organize the DT nodes RISC-V: defconfig: enable MMC & SPI for RISC-V
| * | | | | riscv: mm: Fix code commentShihPo Hung2019-06-261-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the comment since vmalloc_fault doesn't reach flush_tlb_fix_spurious_fault. Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-riscv@lists.infradead.org Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
| * | | | | dt-bindings: clock: sifive: add MIT license as an option for the header filePaul Walmsley2019-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At Bin Meng's request, add the MIT license as an option for the SiFive FU540 PRCI header file. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Bin Meng <bmeng.cn@gmail.com>
| * | | | | dt-bindings: riscv: resolve 'make dt_binding_check' warningsPaul Walmsley2019-06-261-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rob pointed out that one of the examples in the RISC-V 'cpus' YAML schema results in warnings from 'make dt_binding_check'. Fix these. While here, make the whitespace in the second example consistent with the first example. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> # for fixing the dtc warnings
| * | | | | riscv: dts: Re-organize the DT nodesYash Shah2019-06-262-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per the convention for any SOC device with external connection, define only device DT node in SOC DTSi file with status = "disabled" and enable device in Board DTS file with status = "okay" Reported-by: Anup Patel <anup@brainfault.org> Signed-off-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
| * | | | | RISC-V: defconfig: enable MMC & SPI for RISC-VAtish Patra2019-06-261-0/+5
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, riscv upstream defconfig doesn't let you boot through userspace if rootfs is on the SD card. Let's enable MMC & SPI drivers as well so that one can boot to the user space using default config in upstream kernel. While here, enable automatic mounting of devtmpfs to simplify kernel testing with minimal root filesystems. (pjw) Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> [paul.walmsley@sifive.com: mention the DEVTMPFS_MOUNT change in the patch description] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
* | | | | Merge tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2019-06-292-9/+9
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull two more NFS client fixes from Anna Schumaker: "These are both stable fixes. One to calculate the correct client message length in the case of partial transmissions. And the other to set the proper TCP timeout for flexfiles" * tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O SUNRPC: Fix up calculation of client message length
| * | | | | NFS/flexfiles: Use the correct TCP timeout for flexfiles I/OTrond Myklebust2019-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a typo where we're confusing the default TCP retrans value (NFS_DEF_TCP_RETRANS) for the default TCP timeout value. Fixes: 15d03055cf39f ("pNFS/flexfiles: Set reasonable default ...") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | SUNRPC: Fix up calculation of client message lengthTrond Myklebust2019-06-281-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case where a record marker was used, xs_sendpages() needs to return the length of the payload + record marker so that we operate correctly in the case of a partial transmission. When the callers check return value, they therefore need to take into account the record marker length. Fixes: 06b5fc3ad94e ("Merge tag 'nfs-rdma-for-5.1-1'...") Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | | | | Merge tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-clientLinus Torvalds2019-06-291-1/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fix from Ilya Dryomov: "A small fix for a potential -rc1 regression from Jeff" * tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client: ceph: fix ceph_mdsc_build_path to not stop on first component
| * | | | | | ceph: fix ceph_mdsc_build_path to not stop on first componentJeff Layton2019-06-271-1/+2
| | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ceph_mdsc_build_path is handed a positive dentry, it will return a zero-length path string with the base set to that dentry. This is not what we want. Always include at least one path component in the string. ceph_mdsc_build_path has behaved this way for a long time but it didn't matter until recent d_name handling rework. Fixes: 964fff7491e4 ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | | | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2019-06-291-2/+4
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "One simple fix for a driver use after free" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
| * | | | | | scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()Jan Kara2019-06-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once we unlock adapter->hw_lock in pvscsi_queue_lck() nothing prevents just queued scsi_cmnd from completing and freeing the request. Thus cmd->cmnd[0] dereference can dereference already freed request leading to kernel crashes or other issues (which one of our customers observed). Store cmd->cmnd[0] in a local variable before unlocking adapter->hw_lock to fix the issue. CC: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | | | | Merge tag 'for-linus-20190628' of git://git.kernel.dk/linux-blockLinus Torvalds2019-06-292-4/+3
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Just two small fixes. One from Paolo, fixing a silly mistake in BFQ. The other one is from me, ensuring that we have ->file cleared in the io_uring request a bit earlier. That avoids a use-before-free, if we encounter an error before ->file is assigned" * tag 'for-linus-20190628' of git://git.kernel.dk/linux-block: block, bfq: fix operator in BFQQ_TOTALLY_SEEKY io_uring: ensure req->file is cleared on allocation
| * | | | | | | block, bfq: fix operator in BFQQ_TOTALLY_SEEKYPaolo Valente2019-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By mistake, there is a '&' instead of a '==' in the definition of the macro BFQQ_TOTALLY_SEEKY. This commit replaces the wrong operator with the correct one. Fixes: 7074f076ff15 ("block, bfq: do not tag totally seeky queues as soft rt") Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | io_uring: ensure req->file is cleared on allocationJens Axboe2019-06-211-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen reports: I hit the following General Protection Fault when testing io_uring via the io_uring engine in fio. This was on a VM running 5.2-rc5 and the latest version of fio. The issue occurs for both null_blk and fake NVMe drives. I have not tested bare metal or real NVMe SSDs. The fio script used is given below. [io_uring] time_based=1 runtime=60 filename=/dev/nvme2n1 (note /dev/nullb0 also fails) ioengine=io_uring bs=4k rw=readwrite direct=1 fixedbufs=1 sqthread_poll=1 sqthread_poll_cpu=0 general protection fault: 0000 [#1] SMP PTI CPU: 0 PID: 872 Comm: io_uring-sq Not tainted 5.2.0-rc5-cpacket-io-uring #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 RIP: 0010:fput_many+0x7/0x90 Code: 01 48 85 ff 74 17 55 48 89 e5 53 48 8b 1f e8 a0 f9 ff ff 48 85 db 48 89 df 75 f0 5b 5d f3 c3 0f 1f 40 00 0f 1f 44 00 00 89 f6 <f0> 48 29 77 38 74 01 c3 55 48 89 e5 53 48 89 fb 65 48 \ RSP: 0018:ffffadeb817ebc50 EFLAGS: 00010246 RAX: 0000000000000004 RBX: ffff8f46ad477480 RCX: 0000000000001805 RDX: 0000000000000000 RSI: 0000000000000001 RDI: f18b51b9a39552b5 RBP: ffffadeb817ebc58 R08: ffff8f46b7a318c0 R09: 000000000000015d R10: ffffadeb817ebce8 R11: 0000000000000020 R12: ffff8f46ad4cd000 R13: 00000000fffffff7 R14: ffffadeb817ebe30 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffff8f46b7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055828f0bbbf0 CR3: 0000000232176004 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? fput+0x13/0x20 io_free_req+0x20/0x40 io_put_req+0x1b/0x20 io_submit_sqe+0x40a/0x680 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 io_submit_sqes+0xb9/0x160 ? io_submit_sqes+0xb9/0x160 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __schedule+0x3f2/0x6a0 ? __switch_to_asm+0x34/0x70 io_sq_thread+0x1af/0x470 ? __switch_to_asm+0x34/0x70 ? wait_woken+0x80/0x80 ? __switch_to+0x85/0x410 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __schedule+0x3f2/0x6a0 kthread+0x105/0x140 ? io_submit_sqes+0x160/0x160 ? kthread+0x105/0x140 ? io_submit_sqes+0x160/0x160 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x35/0x40 which occurs because using a kernel side submission thread isn't valid without using fixed files (registered through io_uring_register()). This causes io_uring to put the request after logging an error, but before the file field is set in the request. If it happens to be non-zero, we attempt to fput() garbage. Fix this by ensuring that req->file is initialized when the request is allocated. Cc: stable@vger.kernel.org # 5.1+ Reported-by: Stephen Bates <sbates@raithlin.com> Tested-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | Merge tag 'pinctrl-v5.2-3' of ↵Linus Torvalds2019-06-293-27/+33
|\ \ \ \ \ \ \ \ | |_|_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Sorry to bomb in fixes this late. Maybe I can comfort you by saying it is only driver fixes, and mostly IRQ handling which is something GPIO and pin control drivers never get right. You think it works and then it doesn't. Summary: - Fix IRQ setup in the MCP23s08. - Fix pin setup on pins > 31 in the Ocelot driver. - Fix IRQs in the Mediatek driver" * tag 'pinctrl-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: mediatek: Update cur_mask in mask/mask ops pinctrl: mediatek: Ignore interrupts that are wake only during resume pinctrl: ocelot: fix pinmuxing for pins after 31 pinctrl: ocelot: fix gpio direction for pins after 31 pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call order
| * | | | | | | pinctrl: mediatek: Update cur_mask in mask/mask opsNicolas Boichat2019-06-271-14/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During suspend/resume, mtk_eint_mask may be called while wake_mask is active. For example, this happens if a wake-source with an active interrupt handler wakes the system: irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so that it can be handled later on in the resume flow. However, this may happen before mtk_eint_do_resume is called: in this case, wake_mask is loaded, and cur_mask is restored from an older copy, re-enabling the interrupt, and causing an interrupt storm (especially for level interrupts). Step by step, for a line that has both wake and interrupt enabled: 1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt enabled at hardware level) 2. System suspends, resumes due to that line (at this stage EINT_EN == wake_mask) 3. irq_pm_check_wakeup is called, and disables the interrupt => EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1 4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so it reenables EINT_EN[irq] = 1 => interrupt storm as the driver is not yet ready to handle the interrupt. This patch fixes the issue in step 3, by recording all mask/unmask changes in cur_mask. This also avoids the need to read the current mask in eint_do_suspend, and we can remove mtk_eint_chip_read_mask function. The interrupt will be re-enabled properly later on, sometimes after mtk_eint_do_resume, when the driver is ready to handle it. Fixes: 58a5e1b64bb0 ("pinctrl: mediatek: Implement wake handler and suspend resume") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | | | | pinctrl: mediatek: Ignore interrupts that are wake only during resumeNicolas Boichat2019-06-261-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before suspending, mtk-eint would set the interrupt mask to the one in wake_mask. However, some of these interrupts may not have a corresponding interrupt handler, or the interrupt may be disabled. On resume, the eint irq handler would trigger nevertheless, and irq/pm.c:irq_pm_check_wakeup would be called, which would try to call irq_disable. However, if the interrupt is not enabled (irqd_irq_disabled(&desc->irq_data) is true), the call does nothing, and the interrupt is left enabled in the eint driver. Especially for level-sensitive interrupts, this will lead to an interrupt storm on resume. If we detect that an interrupt is only in wake_mask, but not in cur_mask, we can just mask it out immediately (as mtk_eint_resume would do anyway at a later stage in the resume sequence, when restoring cur_mask). Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | | | | pinctrl: ocelot: fix pinmuxing for pins after 31Alexandre Belloni2019-06-251-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The actual layout for OCELOT_GPIO_ALT[01] when there are more than 32 pins is interleaved, i.e. OCELOT_GPIO_ALT0[0], OCELOT_GPIO_ALT1[0], OCELOT_GPIO_ALT0[1], OCELOT_GPIO_ALT1[1]. Introduce a new REG_ALT macro to facilitate the register offset calculation and use it where necessary. Fixes: da801ab56ad8 pinctrl: ocelot: add MSCC Jaguar2 support Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | | | | pinctrl: ocelot: fix gpio direction for pins after 31Alexandre Belloni2019-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The third argument passed to REG is not the correct one and ocelot_gpio_set_direction is not working for pins after 31. Fix that by passing the pin number instead of the modulo 32 value. Fixes: da801ab56ad8 pinctrl: ocelot: add MSCC Jaguar2 support Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * | | | | | | pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call orderPhil Reid2019-06-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently probing of the mcp23s08 results in an error message "detected irqchip that is shared with multiple gpiochips: please fix the driver" This is due to the following: Call to mcp23s08_irqchip_setup() with call hierarchy: mcp23s08_irqchip_setup() gpiochip_irqchip_add_nested() gpiochip_irqchip_add_key() gpiochip_set_irq_hooks() Call to devm_gpiochip_add_data() with call hierarchy: devm_gpiochip_add_data() gpiochip_add_data_with_key() gpiochip_add_irqchip() gpiochip_set_irq_hooks() The gpiochip_add_irqchip() returns immediately if there isn't a irqchip but we added a irqchip due to the previous mcp23s08_irqchip_setup() call. So it calls gpiochip_set_irq_hooks() a second time. Fix this by moving the call to devm_gpiochip_add_data before the call to mcp23s08_irqchip_setup Fixes: 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order") Suggested-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
* | | | | | | | Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds2019-06-288-14/+19
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of clk driver fixes and one core framework fix - Do a DT/firmware lookup in clk_core_get() even when the DT index is a nonsensical value - Fix some clk data typos in the Amlogic DT headers/code - Avoid returning junk in the TI clk driver when an invalid clk is looked for - Fix dividers for the emac clks on Stratix10 SoCs - Fix default HDA rates on Tegra210 to correct distorted audio" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: socfpga: stratix10: fix divider entry for the emac clocks clk: Do a DT parent lookup even when index < 0 clk: tegra210: Fix default rates for HDA clocks clk: ti: clkctrl: Fix returning uninitialized data clk: meson: meson8b: fix a typo in the VPU parent names array variable clk: meson: fix MPLL 50M binding id typo
| * | | | | | | | clk: socfpga: stratix10: fix divider entry for the emac clocksDinh Nguyen2019-06-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fixed dividers for the emac clocks should be 2 not 4. Cc: stable@vger.kernel.org Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
| * | | | | | | | clk: Do a DT parent lookup even when index < 0Stephen Boyd2019-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to allow the parent lookup to happen even if the index is some value less than 0. This may be the case if a clk provider only specifies the .name member to match a string in the "clock-names" DT property. We shouldn't require that the index be >= 0 to make this use case work. Fixes: 601b6e93304a ("clk: Allow parents to be specified via clkspec index") Reported-by: Alexandre Mergnat <amergnat@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>