summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* fs/libfs.c: fix kernel-doc warningRandy Dunlap2019-10-141-2/+1
| | | | | | | | | | | | | | Fix kernel-doc warning in fs/libfs.c: fs/libfs.c:496: warning: Excess function parameter 'available' description in 'simple_write_end' Link: http://lkml.kernel.org/r/5fc9d70b-e377-0ec9-066a-970d49579041@infradead.org Fixes: ad2a722f196d ("libfs: Open code simple_commit_write into only user") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Boaz Harrosh <boazh@netapp.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/direct-io.c: fix kernel-doc warningRandy Dunlap2019-10-141-2/+1
| | | | | | | | | | | | | | | | | Fix kernel-doc warning in fs/direct-io.c: fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete' Also, don't mark this function as having kernel-doc notation since it is not exported. Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org Fixes: 6d544bb4d901 ("dio: centralize completion in dio_complete()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Zach Brown <zab@zabbo.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()Vlastimil Babka2019-10-141-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Florian and Dave reported [1] a NULL pointer dereference in __reset_isolation_pfn(). While the exact cause is unclear, staring at the code revealed two bugs, which might be related. One bug is that if zone starts in the middle of pageblock, block_page might correspond to different pfn than block_pfn, and then the pfn_valid_within() checks will check different pfn's than those accessed via struct page. This might result in acessing an unitialized page in CONFIG_HOLES_IN_ZONE configs. The other bug is that end_page refers to the first page of next pageblock and not last page of current pageblock. The online and valid check is then wrong and with sections, the while (page < end_page) loop might wander off actual struct page arrays. [1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/ Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Florian Weimer <fw@deneb.enyo.de> Reported-by: Dave Chinner <david@fromorbit.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, hugetlb: allow hugepage allocations to reclaim as neededDavid Rientjes2019-10-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") has chnaged the allocator to bail out from the allocator early to prevent from a potentially excessive memory reclaim. __GFP_RETRY_MAYFAIL is designed to retry the allocation, reclaim and compaction loop as long as there is a reasonable chance to make forward progress. Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at the INIT_COMPACT_PRIORITY compaction attempt gives this feedback. The most obvious affected subsystem is hugetlbfs which allocates huge pages based on an admin request (or via admin configured overcommit). I have done a simple test which tries to allocate half of the memory for hugetlb pages while the memory is full of a clean page cache. This is not an unusual situation because we try to cache as much of the memory as possible and sysctl/sysfs interface to allocate huge pages is there for flexibility to allocate hugetlb pages at any time. System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages after the memory is prefilled by a clean page cache: root@test1:~# cat hugetlb_test.sh set -x echo 0 > /proc/sys/vm/nr_hugepages echo 3 > /proc/sys/vm/drop_caches echo 1 > /proc/sys/vm/compact_memory dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10)) TS=$(date +%s) echo 256 > /proc/sys/vm/nr_hugepages cat /proc/sys/vm/nr_hugepages The results for 2 consecutive runs on clean 5.3 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s + date +%s + TS=1569905284 + echo 256 + cat /proc/sys/vm/nr_hugepages 256 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s + date +%s + TS=1569905311 + echo 256 + cat /proc/sys/vm/nr_hugepages 256 Now with b39d0ee2632d applied root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s + date +%s + TS=1569905516 + echo 256 + cat /proc/sys/vm/nr_hugepages 11 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s + date +%s + TS=1569905541 + echo 256 + cat /proc/sys/vm/nr_hugepages 12 The success rate went down by factor of 20! Although hugetlb allocation requests might fail and it is reasonable to expect them to under extremely fragmented memory or when the memory is under a heavy pressure but the above situation is not that case. Fix the regression by reverting back to the previous behavior for __GFP_RETRY_MAYFAIL requests and disable the beail out heuristic for those requests. Mike said: : hugetlbfs allocations are commonly done via sysctl/sysfs shortly after : boot where this may not be as much of an issue. However, I am aware of at : least three use cases where allocations are made after the system has been : up and running for quite some time: : : - DB reconfiguration. If sysctl/sysfs fails to get required number of : huge pages, system is rebooted to perform allocation after boot. : : - VM provisioning. If unable get required number of huge pages, fall : back to base pages. : : - An application that does not preallocate pool, but rather allocates : pages at fault time for optimal NUMA locality. : : In all cases, I would expect b39d0ee2632d to cause regressions and : noticable behavior changes. : : My quick/limited testing in : https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com : was insufficient. It was also mentioned that if something like : b39d0ee2632d went forward, I would like exemptions for __GFP_RETRY_MAYFAIL : requests as in this patch. [mhocko@suse.com: reworded changelog] Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lib/test_meminit: add a kmem_cache_alloc_bulk() testAlexander Potapenko2019-10-141-0/+27
| | | | | | | | | | | | | | Make sure allocations from kmem_cache_alloc_bulk() and kmem_cache_free_bulk() are properly initialized. Link: http://lkml.kernel.org/r/20191007091605.30530-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Thibaut Sautereau <thibaut@sautereau.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocationsAlexander Potapenko2019-10-141-6/+16
| | | | | | | | | | | | | | | | | | | | | | | slab_alloc_node() already zeroed out the freelist pointer if init_on_free was on. Thibaut Sautereau noticed that the same needs to be done for kmem_cache_alloc_bulk(), which performs the allocations separately. kmem_cache_alloc_bulk() is currently used in two places in the kernel, so this change is unlikely to have a major performance impact. SLAB doesn't require a similar change, as auto-initialization makes the allocator store the freelist pointers off-slab. Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Thibaut Sautereau <thibaut@sautereau.fr> Reported-by: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lib/generic-radix-tree.c: add kmemleak annotationsEric Biggers2019-10-141-6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kmemleak is falsely reporting a leak of the slab allocation in sctp_stream_init_ext(): BUG: memory leak unreferenced object 0xffff8881114f5d80 (size 96): comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000ce7a1326>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline] [<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline] [<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline] [<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0 net/sctp/stream.c:157 [<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00 net/sctp/socket.c:1882 [<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102 [...] But it's freed later. Kmemleak misses the allocation because its pointer is stored in the generic radix tree sctp_stream::out, and the generic radix tree uses raw pages which aren't tracked by kmemleak. Fix this by adding the kmemleak hooks to the generic radix tree code. Link: http://lkml.kernel.org/r/20191004065039.727564-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: <syzbot+7f3b6b106be8dcdcdeec@syzkaller.appspotmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/slub: fix a deadlock in show_slab_objects()Qian Cai2019-10-141-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A long time ago we fixed a similar deadlock in show_slab_objects() [1]. However, it is apparently due to the commits like 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") and 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by just reading files in /sys/kernel/slab which will generate a lockdep splat below. Since the "mem_hotplug_lock" here is only to obtain a stable online node mask while racing with NUMA node hotplug, in the worst case, the results may me miscalculated while doing NUMA node hotplug, but they shall be corrected by later reads of the same files. WARNING: possible circular locking dependency detected ------------------------------------------------------ cat/5224 is trying to acquire lock: ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at: show_slab_objects+0x94/0x3a8 but task is already holding lock: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (kn->count#45){++++}: lock_acquire+0x31c/0x360 __kernfs_remove+0x290/0x490 kernfs_remove+0x30/0x44 sysfs_remove_dir+0x70/0x88 kobject_del+0x50/0xb0 sysfs_slab_unlink+0x2c/0x38 shutdown_cache+0xa0/0xf0 kmemcg_cache_shutdown_fn+0x1c/0x34 kmemcg_workfn+0x44/0x64 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #1 (slab_mutex){+.+.}: lock_acquire+0x31c/0x360 __mutex_lock_common+0x16c/0xf78 mutex_lock_nested+0x40/0x50 memcg_create_kmem_cache+0x38/0x16c memcg_kmem_cache_create_func+0x3c/0x70 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #0 (mem_hotplug_lock.rw_sem){++++}: validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc other info that might help us debug this: Chain exists of: mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#45); lock(slab_mutex); lock(kn->count#45); lock(mem_hotplug_lock.rw_sem); *** DEADLOCK *** 3 locks held by cat/5224: #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8 #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0 #2: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 stack backtrace: Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xd0/0x140 print_circular_bug+0x368/0x380 check_noncircular+0x248/0x250 validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc I think it is important to mention that this doesn't expose the show_slab_objects to use-after-free. There is only a single path that might really race here and that is the slab hotplug notifier callback __kmem_cache_shrink (via slab_mem_going_offline_callback) but that path doesn't really destroy kmem_cache_node data structures. [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock] Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw Fixes: 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") Fixes: 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_owner: rename flag indicating that page is allocatedVlastimil Babka2019-10-142-7/+7
| | | | | | | | | | | | | | | | | | Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page is tracked as being allocated. Kirril suggested naming it PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat loaded term for a page". Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_owner: decouple freeing stack trace from debug_pageallocVlastimil Babka2019-10-142-21/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8974558f49a6 ("mm, page_owner, debug_pagealloc: save and dump freeing stack trace") enhanced page_owner to also store freeing stack trace, when debug_pagealloc is also enabled. KASAN would also like to do this [1] to improve error reports to debug e.g. UAF issues. Kirill has suggested that the freeing stack trace saving should be also possible to be enabled separately from KASAN or debug_pagealloc, i.e. with an extra boot option. Qian argued that we have enough options already, and avoiding the extra overhead is not worth the complications in the case of a debugging option. Kirill noted that the extra stack handle in struct page_owner requires 0.1% of memory. This patch therefore enables free stack saving whenever page_owner is enabled, regardless of whether debug_pagealloc or KASAN is also enabled. KASAN kernels booted with page_owner=on will thus benefit from the improved error reports. [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967 [vbabka@suse.cz: v3] Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Qian Cai <cai@lca.pw> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_owner: fix off-by-one error in __set_page_owner_handle()Vlastimil Babka2019-10-143-22/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "followups to debug_pagealloc improvements through page_owner", v3. These are followups to [1] which made it to Linus meanwhile. Patches 1 and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It would be nice if all of this made it to 5.4 with [1] already there (or at least Patch 1). This patch (of 3): As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") has introduced an off-by-one error in __set_page_owner_handle() when looking up page_ext for subpages. As a result, the head page page_owner info is set twice, while for the last tail page, it's not set at all. Fix this and also make the code more efficient by advancing the page_ext pointer we already have, instead of calling lookup_page_ext() for each subpage. Since the full size of struct page_ext is not known at compile time, we can't use a simple page_ext++ statement, so introduce a page_ext_next() inline function for that. Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Reported-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kmemleak: Do not corrupt the object_list during clean-upCatalin Marinas2019-10-141-9/+21
| | | | | | | | | | | | | | | | | | | | | | In case of an error (e.g. memory pool too small), kmemleak disables itself and cleans up the already allocated metadata objects. However, if this happens early before the RCU callback mechanism is available, put_object() skips call_rcu() and frees the object directly. This is not safe with the RCU list traversal in __kmemleak_do_cleanup(). Change the list traversal in __kmemleak_do_cleanup() to list_for_each_entry_safe() and remove the rcu_read_{lock,unlock} since the kmemleak is already disabled at this point. In addition, avoid an unnecessary metadata object rb-tree look-up since it already has the struct kmemleak_object pointer. Fixes: c5665868183f ("mm: kmemleak: use the memory pool for early allocations") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Reported-by: Ted Ts'o <tytso@mit.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Linux 5.4-rc3v5.4-rc3Linus Torvalds2019-10-131-1/+1
|
* Merge tag 'trace-v5.4-rc2' of ↵Linus Torvalds2019-10-1315-132/+223
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few tracing fixes: - Remove lockdown from tracefs itself and moved it to the trace directory. Have the open functions there do the lockdown checks. - Fix a few races with opening an instance file and the instance being deleted (Discovered during the lockdown updates). Kept separate from the clean up code such that they can be backported to stable easier. - Clean up and consolidated the checks done when opening a trace file, as there were multiple checks that need to be done, and it did not make sense having them done in each open instance. - Fix a regression in the record mcount code. - Small hw_lat detector tracer fixes. - A trace_pipe read fix due to not initializing trace_seq" * tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Initialize iter->seq after zeroing in tracing_read_pipe() tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency tracing/hwlat: Report total time spent in all NMIs during the sample recordmcount: Fix nop_mcount() function tracing: Do not create tracefs files if tracefs lockdown is in effect tracing: Add locked_down checks to the open calls of files created for tracefs tracing: Add tracing_check_open_get_tr() tracing: Have trace events system open call tracing_open_generic_tr() tracing: Get trace_array reference for available_tracers files ftrace: Get a reference counter for the trace_array on filter files tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
| * tracing: Initialize iter->seq after zeroing in tracing_read_pipe()Petr Mladek2019-10-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A customer reported the following softlockup: [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464] [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4 [899688.160002] RIP: 0010:up_write+0x1a/0x30 [899688.160002] Kernel panic - not syncing: softlockup: hung tasks [899688.160002] RIP: 0010:up_write+0x1a/0x30 [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12 [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000 [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00 [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000 [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8 [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000 [899688.160002] tracing_read_pipe+0x336/0x3c0 [899688.160002] __vfs_read+0x26/0x140 [899688.160002] vfs_read+0x87/0x130 [899688.160002] SyS_read+0x42/0x90 [899688.160002] do_syscall_64+0x74/0x160 It caught the process in the middle of trace_access_unlock(). There is no loop. So, it must be looping in the caller tracing_read_pipe() via the "waitagain" label. Crashdump analyze uncovered that iter->seq was completely zeroed at this point, including iter->seq.seq.size. It means that print_trace_line() was never able to print anything and there was no forward progress. The culprit seems to be in the code: /* reset all but tr, trace, and overruns */ memset(&iter->seq, 0, sizeof(struct trace_iterator) - offsetof(struct trace_iterator, seq)); It was added by the commit 53d0aa773053ab182877 ("ftrace: add logic to record overruns"). It was v2.6.27-rc1. It was the time when iter->seq looked like: struct trace_seq { unsigned char buffer[PAGE_SIZE]; unsigned int len; }; There was no "size" variable and zeroing was perfectly fine. The solution is to reinitialize the structure after or without zeroing. Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/hwlat: Don't ignore outer-loop duration when calculating max_latencySrivatsa S. Bhat (VMware)2019-10-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | max_latency is intended to record the maximum ever observed hardware latency, which may occur in either part of the loop (inner/outer). So we need to also consider the outer-loop sample when updating max_latency. Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/hwlat: Report total time spent in all NMIs during the sampleSrivatsa S. Bhat (VMware)2019-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nmi_total_ts is supposed to record the total time spent in *all* NMIs that occur on the given CPU during the (active portion of the) sampling window. However, the code seems to be overwriting this variable for each NMI, thereby only recording the time spent in the most recent NMI. Fix it by accumulating the duration instead. Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * recordmcount: Fix nop_mcount() functionSteven Rostedt (VMware)2019-10-121-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The removal of the longjmp code in recordmcount.c mistakenly made the return of make_nop() being negative an exit of nop_mcount(). It should not exit the routine, but instead just not process that part of the code. By exiting with an error code, it would cause the update of recordmcount to fail some files which would fail the build if ftrace function tracing was enabled. Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 3f1df12019f3 ("recordmcount: Rewrite error/success handling") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Do not create tracefs files if tracefs lockdown is in effectSteven Rostedt (VMware)2019-10-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | If on boot up, lockdown is activated for tracefs, don't even bother creating the files. This can also prevent instances from being created if lockdown is in effect. Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add locked_down checks to the open calls of files created for tracefsSteven Rostedt (VMware)2019-10-1210-4/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added various checks on open tracefs calls to see if tracefs is in lockdown mode, and if so, to return -EPERM. Note, the event format files (which are basically standard on all machines) as well as the enabled_functions file (which shows what is currently being traced) are not lockde down. Perhaps they should be, but it seems counter intuitive to lockdown information to help you know if the system has been modified. Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add tracing_check_open_get_tr()Steven Rostedt (VMware)2019-10-126-60/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, most files in the tracefs directory test if tracing_disabled is set. If so, it should return -ENODEV. The tracing_disabled is called when tracing is found to be broken. Originally it was done in case the ring buffer was found to be corrupted, and we wanted to prevent reading it from crashing the kernel. But it's also called if a tracing selftest fails on boot. It's a one way switch. That is, once it is triggered, tracing is disabled until reboot. As most tracefs files can also be used by instances in the tracefs directory, they need to be carefully done. Each instance has a trace_array associated to it, and when the instance is removed, the trace_array is freed. But if an instance is opened with a reference to the trace_array, then it requires looking up the trace_array to get its ref counter (as there could be a race with it being deleted and the open itself). Once it is found, a reference is added to prevent the instance from being removed (and the trace_array associated with it freed). Combine the two checks (tracing_disabled and trace_array_get()) into a single helper function. This will also make it easier to add lockdown to tracefs later. Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Have trace events system open call tracing_open_generic_tr()Steven Rostedt (VMware)2019-10-123-15/+5
| | | | | | | | | | | | | | | | | | | | Instead of having the trace events system open call open code the taking of the trace_array descriptor (with trace_array_get()) and then calling trace_open_generic(), have it use the tracing_open_generic_tr() that does the combination of the two. This requires making tracing_open_generic_tr() global. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Get trace_array reference for available_tracers filesSteven Rostedt (VMware)2019-10-121-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | As instances may have different tracers available, we need to look at the trace_array descriptor that shows the list of the available tracers for the instance. But there's a race between opening the file and an admin deleting the instance. The trace_array_get() needs to be called before accessing the trace_array. Cc: stable@vger.kernel.org Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * ftrace: Get a reference counter for the trace_array on filter filesSteven Rostedt (VMware)2019-10-121-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for an instance now. They need to take a reference to the instance otherwise there could be a race between accessing the files and deleting the instance. It wasn't until the :mod: caching where these file operations started referencing the trace_array directly. Cc: stable@vger.kernel.org Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is ↵Steven Rostedt (VMware)2019-10-121-41/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | locked down") Running the latest kernel through my "make instances" stress tests, I triggered the following bug (with KASAN and kmemleak enabled): mkdir invoked oom-killer: gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0, oom_score_adj=0 CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 Call Trace: dump_stack+0x64/0x8c dump_header+0x43/0x3b7 ? trace_hardirqs_on+0x48/0x4a oom_kill_process+0x68/0x2d5 out_of_memory+0x2aa/0x2d0 __alloc_pages_nodemask+0x96d/0xb67 __alloc_pages_node+0x19/0x1e alloc_slab_page+0x17/0x45 new_slab+0xd0/0x234 ___slab_alloc.constprop.86+0x18f/0x336 ? alloc_inode+0x2c/0x74 ? irq_trace+0x12/0x1e ? tracer_hardirqs_off+0x1d/0xd7 ? __slab_alloc.constprop.85+0x21/0x53 __slab_alloc.constprop.85+0x31/0x53 ? __slab_alloc.constprop.85+0x31/0x53 ? alloc_inode+0x2c/0x74 kmem_cache_alloc+0x50/0x179 ? alloc_inode+0x2c/0x74 alloc_inode+0x2c/0x74 new_inode_pseudo+0xf/0x48 new_inode+0x15/0x25 tracefs_get_inode+0x23/0x7c ? lookup_one_len+0x54/0x6c tracefs_create_file+0x53/0x11d trace_create_file+0x15/0x33 event_create_dir+0x2a3/0x34b __trace_add_new_event+0x1c/0x26 event_trace_add_tracer+0x56/0x86 trace_array_create+0x13e/0x1e1 instance_mkdir+0x8/0x17 tracefs_syscall_mkdir+0x39/0x50 ? get_dname+0x31/0x31 vfs_mkdir+0x78/0xa3 do_mkdirat+0x71/0xb0 sys_mkdir+0x19/0x1b do_fast_syscall_32+0xb0/0xed I bisected this down to the addition of the proxy_ops into tracefs for lockdown. It appears that the allocation of the proxy_ops and then freeing it in the destroy_inode callback, is causing havoc with the memory system. Reading the documentation about destroy_inode and talking with Linus about this, this is buggy and wrong. When defining the destroy_inode() method, it is expected that the destroy_inode() will also free the inode, and not just the extra allocations done in the creation of the inode. The faulty commit causes a memory leak of the inode data structure when they are deleted. Instead of allocating the proxy_ops (and then having to free it) the checks should be done by the open functions themselves, and not hack into the tracefs directory. First revert the tracefs updates for locked_down and then later we can add the locked_down checks in the kernel/trace files. Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home Fixes: ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge tag 'hwmon-for-v5.4-rc3' of ↵Linus Torvalds2019-10-135-9/+47
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Update/fix inspur-ipsps1 and k10temp Documentation - Fix nct7904 driver - Fix HWMON_P_MIN_ALARM mask in hwmon core * tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: docs: Extend inspur-ipsps1 title underline hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct. docs: hwmon: Include 'inspur-ipsps1.rst' into docs hwmon: Fix HWMON_P_MIN_ALARM mask hwmon: (k10temp) Update documentation and add temp2_input info hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
| * | hwmon: docs: Extend inspur-ipsps1 title underlineAdam Zerella2019-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Sphinx is generating a build warning as the title underline of this file is too short. Signed-off-by: Adam Zerella <adam.zerella@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
| * | hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in ↵amy.shih2019-10-021-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nct7904_data struct. SMI# interrupt for fan and voltage is Two-Times Interrupt Mode. Fan or voltage exceeds high limit or going below low limit, it will causes an interrupt if the previous interrupt has been reset by reading all the interrupt Status Register. Thus, add the array fan_alarm and vsen_alarm to store the alarms for all of the fan and voltage sensors. Signed-off-by: amy.shih <amy.shih@advantech.com.tw> Link: https://lore.kernel.org/r/20190919030205.11440-1-Amy.Shih@advantech.com.tw Fixes: 486842db3b79 ("hwmon: (nct7904) Add extra sysfs support for fan, voltage and temperature.") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
| * | docs: hwmon: Include 'inspur-ipsps1.rst' into docsAdam Zerella2019-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When generating documentation output Sphinx outputs a warning for not including the page 'inspur-ipsps1.rst' in 'index.rst'. Assuming this documentation is useful it should be included in the index. Signed-off-by: Adam Zerella <adam.zerella@gmail.com> Link: https://lore.kernel.org/r/20190925131715.GB19073@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
| * | hwmon: Fix HWMON_P_MIN_ALARM maskNuno Sá2019-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both HWMON_P_MIN_ALARM and HWMON_P_MAX_ALARM were using BIT(hwmon_power_max_alarm). Fixes: aa7f29b07c870 ("hwmon: Add support for power min, lcrit, min_alarm and lcrit_alarm") CC: <stable@vger.kernel.org> Signed-off-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20190924124945.491326-2-nuno.sa@analog.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
| * | hwmon: (k10temp) Update documentation and add temp2_input infoLukas Zapletal2019-10-021-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's been a while since the k10temp documentation has been updated. There are new CPU families supported as well as Tdie temp was added. This patch adds all missing families which I was able to find from git history and provides more info about Tctl vs Tdie exported temps. Signed-off-by: Lukas Zapletal <lzap+git@redhat.com> Link: https://lore.kernel.org/r/20190923105931.27881-1-lzap+git@redhat.com [groeck: Formatting] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
| * | hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data structamy.shih2019-10-021-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Voltage sensors overlap with external temperature sensors. Detect the multi-function of voltage, thermal diode and thermistor from register VT_ADC_MD_REG to set value of vsen_mask in nct7904_data struct. Signed-off-by: amy.shih <amy.shih@advantech.com.tw> Link: https://lore.kernel.org/r/20190918084801.9859-1-Amy.Shih@advantech.com.tw Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* | | Merge tag 'fixes-for-5.4-rc3' of ↵Linus Torvalds2019-10-132-4/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fixes from Richard Weinberger: "Two fixes for MTD: - spi-nor: Fix for a regression in write_sr() - rawnand: Regression fix for the au1550nd driver" * tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: au1550nd: Fix au_read_buf16() prototype mtd: spi-nor: Fix direction of the write_sr() transfer
| * | | mtd: rawnand: au1550nd: Fix au_read_buf16() prototypePaul Burton2019-10-071-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks") modified the prototype of the struct nand_chip read_buf function pointer. In the au1550nd driver we have 2 implementations of read_buf. The previously mentioned commit modified the au_read_buf() implementation to match the function pointer, but not au_read_buf16(). This results in a compiler warning for MIPS db1xxx_defconfig builds: drivers/mtd/nand/raw/au1550nd.c:443:57: warning: pointer type mismatch in conditional expression Fix this by updating the prototype of au_read_buf16() to take a struct nand_chip pointer as its first argument, as is expected after commit 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks"). Note that this shouldn't have caused any functional issues at runtime, since the offset of the struct mtd_info within struct nand_chip is 0 making mtd_to_nand() effectively a type-cast. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks") Cc: stable@vger.kernel.org # v4.20+ Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
| * | | mtd: spi-nor: Fix direction of the write_sr() transferTudor Ambarus2019-10-041-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write_sr() sends data to the SPI memory, fix the direction. Fixes: b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Tested-by: John Garry <john.garry@huawei.com> Acked-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
* | | Merge tag 'for-linus-20191012' of git://git.kernel.dk/linux-blockLinus Torvalds2019-10-131-17/+20
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fix from Jens Axboe: "Single small fix for a regression in the sequence logic for linked commands" * tag 'for-linus-20191012' of git://git.kernel.dk/linux-block: io_uring: fix sequence logic for timeout requests
| * | | io_uring: fix sequence logic for timeout requestsJens Axboe2019-10-101-17/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have two ways a request can be deferred: 1) It's a regular request that depends on another one 2) It's a timeout that tracks completions We have a shared helper to determine whether to defer, and that attempts to make the right decision based on the request. But we only have some of this information in the caller. Un-share the two timeout/defer helpers so the caller can use the right one. Fixes: 5262f567987d ("io_uring: IORING_OP_TIMEOUT support") Reported-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'char-misc-5.4-rc3' of ↵Linus Torvalds2019-10-1213-18/+60
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for 5.4-rc3. Nothing huge here. Some binder driver fixes (although it is still being discussed if these all fix the reported issues or not, so more might be coming later), some mei device ids and fixes, and a google firmware driver bugfix that fixes a regression, as well as some other tiny fixes. All have been in linux-next with no reported issues" * tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: firmware: google: increment VPD key_len properly w1: ds250x: Fix build error without CRC16 virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr binder: Fix comment headers on binder_alloc_prepare_to_free() binder: prevent UAF read in print_binder_transaction_log_entry() misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach mei: avoid FW version request on Ibex Peak and earlier mei: me: add comet point (lake) LP device ids
| * | | | firmware: google: increment VPD key_len properlyBrian Norris2019-10-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4b708b7b1a2c ("firmware: google: check if size is valid when decoding VPD data") adds length checks, but the new vpd_decode_entry() function botched the logic -- it adds the key length twice, instead of adding the key and value lengths separately. On my local system, this means vpd.c's vpd_section_create_attribs() hits an error case after the first attribute it parses, since it's no longer looking at the correct offset. With this patch, I'm back to seeing all the correct attributes in /sys/firmware/vpd/... Fixes: 4b708b7b1a2c ("firmware: google: check if size is valid when decoding VPD data") Cc: <stable@vger.kernel.org> Cc: Hung-Te Lin <hungte@chromium.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Link: https://lore.kernel.org/r/20190930214522.240680-1-briannorris@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | w1: ds250x: Fix build error without CRC16YueHaibing2019-10-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CRC16 is not set, building will fails: drivers/w1/slaves/w1_ds250x.o: In function `w1_ds2505_read_page': w1_ds250x.c:(.text+0x82f): undefined reference to `crc16' w1_ds250x.c:(.text+0x90a): undefined reference to `crc16' w1_ds250x.c:(.text+0x91a): undefined reference to `crc16' Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 25ec8710d9c2 ("w1: add DS2501, DS2502, DS2505 EPROM device driver") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20190920060318.35020-1-yuehaibing@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | virt: vbox: fix memory leak in hgcm_call_preprocess_linaddrNavid Emamdoost2019-10-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In hgcm_call_preprocess_linaddr memory is allocated for bounce_buf but is not released if copy_form_user fails. In order to prevent memory leak in case of failure, the assignment to bounce_buf_ret is moved before the error check. This way the allocated bounce_buf will be released by the caller. Fixes: 579db9d45cb4 ("virt: Add vboxguest VMMDEV communication code") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20190930204223.3660-1-navid.emamdoost@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | binder: Fix comment headers on binder_alloc_prepare_to_free()Joel Fernandes (Google)2019-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | binder_alloc_buffer_lookup() doesn't exist and is named "binder_alloc_prepare_to_free()". Correct the code comments to reflect this. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20190930201250.139554-1-joel@joelfernandes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | binder: prevent UAF read in print_binder_transaction_log_entry()Christian Brauner2019-10-102-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a binder transaction is initiated on a binder device coming from a binderfs instance, a pointer to the name of the binder device is stashed in the binder_transaction_log_entry's context_name member. Later on it is used to print the name in print_binder_transaction_log_entry(). By the time print_binder_transaction_log_entry() accesses context_name binderfs_evict_inode() might have already freed the associated memory thereby causing a UAF. Do the simple thing and prevent this by copying the name of the binder device instead of stashing a pointer to it. Reported-by: Jann Horn <jannh@google.com> Fixes: 03e2e07e3814 ("binder: Make transaction_log available in binderfs") Link: https://lore.kernel.org/r/CAG48ez14Q0-F8LqsvcNbyR2o6gPW8SHXsm4u5jmD9MpsteM2Tw@mail.gmail.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Todd Kjos <tkjos@google.com> Reviewed-by: Hridya Valsaraju <hridya@google.com> Link: https://lore.kernel.org/r/20191008130159.10161-1-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attachNavid Emamdoost2019-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fastrpc_dma_buf_attach if dma_get_sgtable fails the allocated memory for a should be released. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Link: https://lore.kernel.org/r/20190925152742.16258-1-navid.emamdoost@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | mei: avoid FW version request on Ibex Peak and earlierAlexander Usyskin2019-10-045-13/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fixed MKHI client on PCH 6 gen platforms does not support fw version retrieval. The error is not fatal, but it fills up the kernel logs and slows down the driver start. This patch disables requesting FW version on GEN6 and earlier platforms. Fixes warning: [ 15.964298] mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: Could not read FW version [ 15.964301] mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: version command failed -5 Cc: <stable@vger.kernel.org> +v4.18 Cc: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20191004181722.31374-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | mei: me: add comet point (lake) LP device idsTomas Winkler2019-10-042-0/+6
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Comet Point devices IDs for Comet Lake U platforms. Cc: <stable@vger.kernel.org> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20191001235958.19979-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | Merge tag 'staging-5.4-rc3' of ↵Linus Torvalds2019-10-1238-2333/+695
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO driver fixes from Greg KH: "Here are some staging and IIO driver fixes for 5.4-rc3. The "biggest" thing here is a removal of the fbtft device and flexfb code as they have been abandoned by their authors and are no longer needed for that hardware. Other than that, the usual amount of staging driver and iio driver fixes for reported issues, and some speakup sysfs file documentation, which has been long awaited for. All have been in linux-next with no reported issues" * tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits) iio: Fix an undefied reference error in noa1305_probe iio: light: opt3001: fix mutex unlock race iio: adc: ad799x: fix probe error handling iio: light: add missing vcnl4040 of_compatible iio: light: fix vcnl4000 devicetree hooks iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller iio: adc: axp288: Override TS pin bias current for some models iio: imu: adis16400: fix memory leak iio: imu: adis16400: release allocated memory on failure iio: adc: stm32-adc: fix a race when using several adcs with dma and irq iio: adc: stm32-adc: move registers definitions iio: accel: adxl372: Perform a reset at start up iio: accel: adxl372: Fix push to buffers lost samples iio: accel: adxl372: Fix/remove limitation for FIFO samples iio: adc: hx711: fix bug in sampling of data staging: vt6655: Fix memory leak in vt6655_probe staging: exfat: Use kvzalloc() instead of kzalloc() for exfat_sb_info Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc staging: speakup: document sysfs attributes staging: rtl8188eu: fix HighestRate check in odm_ARFBRefresh_8188E() ...
| * \ \ \ Merge tag 'iio-fixes-for-5.4a' of ↵Greg Kroah-Hartman2019-10-1016-182/+290
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: First set of IIO fixes for the 5.4 cycle. * adis16400 - Make sure to free memory on a few failure paths. * adxl372 - Fix wrong fifo depth - Fix wrong indexing of data from the fifo. - Perform a reset at startup to avoid a problem with inconsistent state. * axp288 - This is a fix for a fix. The original fix made sure we kept the configuration from some firmwares to preserve a bias current. Unfortunately it appears the previous behaviour was working around a buggy firmware by overwriting the wrong value it had. Hence a regression was seen. * bmc150 - Fix the centre temperature. This was due to an error in one of the datasheets. * hx711 - Fix an issue where a badly timed interrupt could lead to a control line being high long enough to put the device into a low power state. * meson_sar_adc - Fix a case where the irq was enabled before everything it uses was allocated. * st_lsm6dsx - Ensure we don't set the sensor sensitivity to 0 as it will force all readings to 0. - Fix a wait time for the slave i2c controller when the accelerometer is not enabled. * stm32-adc - Precursor for fix. Move a set of register definitions to a header. - Fix a race when several ADCs are in use with some using interrupts to control the dataflow and some using DMA. * vcnl4000 - Fix a garbage of_match_table in which a string was passed instead of the intended enum. * tag 'iio-fixes-for-5.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: Fix an undefied reference error in noa1305_probe iio: light: opt3001: fix mutex unlock race iio: adc: ad799x: fix probe error handling iio: light: add missing vcnl4040 of_compatible iio: light: fix vcnl4000 devicetree hooks iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller iio: adc: axp288: Override TS pin bias current for some models iio: imu: adis16400: fix memory leak iio: imu: adis16400: release allocated memory on failure iio: adc: stm32-adc: fix a race when using several adcs with dma and irq iio: adc: stm32-adc: move registers definitions iio: accel: adxl372: Perform a reset at start up iio: accel: adxl372: Fix push to buffers lost samples iio: accel: adxl372: Fix/remove limitation for FIFO samples iio: adc: hx711: fix bug in sampling of data iio: fix center temperature of bmc150-accel-core iio: imu: st_lsm6dsx: forbid 0 sensor sensitivity iio: adc: meson_saradc: Fix memory allocation order
| | * | | | iio: Fix an undefied reference error in noa1305_probezhong jiang2019-10-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I hit the following error when compile the kernel. drivers/iio/light/noa1305.o: In function `noa1305_probe': noa1305.c:(.text+0x65): undefined reference to `__devm_regmap_init_i2c' make: *** [vmlinux] Error 1 Signed-off-by: zhong jiang <zhongjiang@huawei.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
| | * | | | iio: light: opt3001: fix mutex unlock raceDavid Frey2019-10-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an end-of-conversion interrupt is received after performing a single-shot reading of the light sensor, the driver was waking up the result ready queue before checking opt->ok_to_ignore_lock to determine if it should unlock the mutex. The problem occurred in the case where the other thread woke up and changed the value of opt->ok_to_ignore_lock to false prior to the interrupt thread performing its read of the variable. In this case, the mutex would be unlocked twice. Signed-off-by: David Frey <dpfrey@gmail.com> Reviewed-by: Andreas Dannenberg <dannenberg@ti.com> Fixes: 94a9b7b1809f ("iio: light: add support for TI's opt3001 light sensor") Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>