summaryrefslogtreecommitdiffstats
path: root/lib
Commit message (Collapse)AuthorAgeFilesLines
* netlink: prevent potential spectre v1 gadgetsEric Dumazet2023-02-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f0950402e8c76e7dcb08563f1b4e8000fbc62455 ] Most netlink attributes are parsed and validated from __nla_validate_parse() or validate_nla() u16 type = nla_type(nla); if (type == 0 || type > maxtype) { /* error or continue */ } @type is then used as an array index and can be used as a Spectre v1 gadget. array_index_nospec() can be used to prevent leaking content of kernel memory to malicious users. This should take care of vast majority of netlink uses, but an audit is needed to take care of others where validation is not yet centralized in core netlink functions. Fixes: bfa83a9e03cf ("[NETLINK]: Type-safe netlink messages/attributes interface") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230119110150.2678537-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lockref: stop doing cpu_relax in the cmpxchg loopMateusz Guzik2023-02-061-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f5fe24ef17b5fbe6db49534163e77499fb10ae8c ] On the x86-64 architecture even a failing cmpxchg grants exclusive access to the cacheline, making it preferable to retry the failed op immediately instead of stalling with the pause instruction. To illustrate the impact, below are benchmark results obtained by running various will-it-scale tests on top of the 6.2-rc3 kernel and Cascade Lake (2 sockets * 24 cores * 2 threads) CPU. All results in ops/s. Note there is some variance in re-runs, but the code is consistently faster when contention is present. open3 ("Same file open/close"): proc stock no-pause 1 805603 814942 (+%1) 2 1054980 1054781 (-0%) 8 1544802 1822858 (+18%) 24 1191064 2199665 (+84%) 48 851582 1469860 (+72%) 96 609481 1427170 (+134%) fstat2 ("Same file fstat"): proc stock no-pause 1 3013872 3047636 (+1%) 2 4284687 4400421 (+2%) 8 3257721 5530156 (+69%) 24 2239819 5466127 (+144%) 48 1701072 5256609 (+209%) 96 1269157 6649326 (+423%) Additionally, a kernel with a private patch to help access() scalability: access2 ("Same file access"): proc stock patched patched +nopause 24 2378041 2005501 5370335 (-15% / +125%) That is, fixing the problems in access itself *reduces* scalability after the cacheline ping-pong only happens in lockref with the pause instruction. Note that fstat and access benchmarks are not currently integrated into will-it-scale, but interested parties can find them in pull requests to said project. Code at hand has a rather tortured history. First modification showed up in commit d472d9d98b46 ("lockref: Relax in cmpxchg loop"), written with Itanium in mind. Later it got patched up to use an arch-dependent macro to stop doing it on s390 where it caused a significant regression. Said macro had undergone revisions and was ultimately eliminated later, going back to cpu_relax. While I intended to only remove cpu_relax for x86-64, I got the following comment from Linus: I would actually prefer just removing it entirely and see if somebody else hollers. You have the numbers to prove it hurts on real hardware, and I don't think we have any numbers to the contrary. So I think it's better to trust the numbers and remove it as a failure, than say "let's just remove it on x86-64 and leave everybody else with the potentially broken code" Additionally, Will Deacon (maintainer of the arm64 port, one of the architectures previously benchmarked): So, from the arm64 side of the fence, I'm perfectly happy just removing the cpu_relax() calls from lockref. As such, come back full circle in history and whack it altogether. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/ Acked-by: Tony Luck <tony.luck@intel.com> # ia64 Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mm/highmem: Lift memcpy_[to|from]_page to coreIra Weiny2023-01-181-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit bb90d4bc7b6a536b2e4db45f4763e467c2008251 ] Working through a conversion to a call kmap_local_page() instead of kmap() revealed many places where the pattern kmap/memcpy/kunmap occurred. Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al Viro all suggested putting this code into helper functions. Al Viro further pointed out that these functions already existed in the iov_iter code.[1] Various locations for the lifted functions were considered. Headers like mm.h or string.h seem ok but don't really portray the functionality well. pagemap.h made some sense but is for page cache functionality.[2] Another alternative would be to create a new header for the promoted memcpy functions, but it masks the fact that these are designed to copy to/from pages using the kernel direct mappings and complicates matters with a new header. Placing these functions in 'highmem.h' is suboptimal especially with the changes being proposed in the functionality of kmap. From a caller perspective including/using 'highmem.h' implies that the functions defined in that header are only required when highmem is in use which is increasingly not the case with modern processors. However, highmem.h is where all the current functions like this reside (zero_user(), clear_highpage(), clear_user_highpage(), copy_user_highpage(), and copy_highpage()). So it makes the most sense even though it is distasteful for some.[3] Lift memcpy_to_page() and memcpy_from_page() to pagemap.h. [1] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/ https://lore.kernel.org/lkml/20201013112544.GA5249@infradead.org/ [2] https://lore.kernel.org/lkml/20201208122316.GH7338@casper.infradead.org/ [3] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/#t https://lore.kernel.org/lkml/20201208163814.GN1563847@iweiny-DESK2.sc.intel.com/ Cc: Boris Pismenny <borisp@mellanox.com> Cc: Or Gerlitz <gerlitz.or@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 956510c0c743 ("fs: ext4: initialize fsdata in pagecache_write()") Signed-off-by: Sasha Levin <sashal@kernel.org>
* test_firmware: fix memory leak in test_firmware_init()Zhengchao Shao2023-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7610615e8cdb3f6f5bbd9d8e7a5d8a63e3cabf2e ] When misc_register() failed in test_firmware_init(), the memory pointed by test_fw_config->name is not released. The memory leak information is as follows: unreferenced object 0xffff88810a34cb00 (size 32): comm "insmod", pid 7952, jiffies 4294948236 (age 49.060s) hex dump (first 32 bytes): 74 65 73 74 2d 66 69 72 6d 77 61 72 65 2e 62 69 test-firmware.bi 6e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 n............... backtrace: [<ffffffff81b21fcb>] __kmalloc_node_track_caller+0x4b/0xc0 [<ffffffff81affb96>] kstrndup+0x46/0xc0 [<ffffffffa0403a49>] __test_firmware_config_init+0x29/0x380 [test_firmware] [<ffffffffa040f068>] 0xffffffffa040f068 [<ffffffff81002c41>] do_one_initcall+0x141/0x780 [<ffffffff816a72c3>] do_init_module+0x1c3/0x630 [<ffffffff816adb9e>] load_module+0x623e/0x76a0 [<ffffffff816af471>] __do_sys_finit_module+0x181/0x240 [<ffffffff89978f99>] do_syscall_64+0x39/0xb0 [<ffffffff89a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20221119035721.18268-1-shaozhengchao@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/notifier-error-inject: fix error when writing -errno to debugfs fileAkinobu Mita2023-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f883c3edd2c432a2931ec8773c70a570115a50fe ] The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"). This restores the previous behaviour by using newly introduced DEFINE_SIMPLE_ATTRIBUTE_SIGNED instead of DEFINE_SIMPLE_ATTRIBUTE. Link: https://lkml.kernel.org/r/20220919172418.45257-3-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/fonts: fix undefined behavior in bit shift for get_default_fontGaosheng Cui2023-01-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6fe888c4d2fb174408e4540bb2d5602b9f507f90 ] Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. The UBSAN warning calltrace like below: UBSAN: shift-out-of-bounds in lib/fonts/fonts.c:139:20 left shift of 1 by 31 places cannot be represented in type 'int' <TASK> dump_stack_lvl+0x7d/0xa5 dump_stack+0x15/0x1b ubsan_epilogue+0xe/0x4e __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c get_default_font+0x1c7/0x1f0 fbcon_startup+0x347/0x3a0 do_take_over_console+0xce/0x270 do_fbcon_takeover+0xa1/0x170 do_fb_registered+0x2a8/0x340 fbcon_fb_registered+0x47/0xe0 register_framebuffer+0x294/0x4a0 __drm_fb_helper_initial_config_and_unlock+0x43c/0x880 [drm_kms_helper] drm_fb_helper_initial_config+0x52/0x80 [drm_kms_helper] drm_fbdev_client_hotplug+0x156/0x1b0 [drm_kms_helper] drm_fbdev_generic_setup+0xfc/0x290 [drm_kms_helper] bochs_pci_probe+0x6ca/0x772 [bochs] local_pci_probe+0x4d/0xb0 pci_device_probe+0x119/0x320 really_probe+0x181/0x550 __driver_probe_device+0xc6/0x220 driver_probe_device+0x32/0x100 __driver_attach+0x195/0x200 bus_for_each_dev+0xbb/0x120 driver_attach+0x27/0x30 bus_add_driver+0x22e/0x2f0 driver_register+0xa9/0x190 __pci_register_driver+0x90/0xa0 bochs_pci_driver_init+0x52/0x1000 [bochs] do_one_initcall+0x76/0x430 do_init_module+0x61/0x28a load_module+0x1f82/0x2e50 __do_sys_finit_module+0xf8/0x190 __x64_sys_finit_module+0x23/0x30 do_syscall_64+0x58/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Link: https://lkml.kernel.org/r/20221031113829.4183153-1-cuigaosheng1@huawei.com Fixes: c81f717cb9e0 ("fbcon: Fix typo and bogus logic in get_default_font") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabledLee Jones2022-12-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 152fe65f300e1819d59b80477d3e0999b4d5d7d2 ] When enabled, KASAN enlarges function's stack-frames. Pushing quite a few over the current threshold. This can mainly be seen on 32-bit architectures where the present limit (when !GCC) is a lowly 1024-Bytes. Link: https://lkml.kernel.org/r/20221125120750.3537134-3-lee@kernel.org Signed-off-by: Lee Jones <lee@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* parisc: Increase FRAME_WARN to 2048 bytes on pariscHelge Deller2022-12-081-2/+3
| | | | | | | | | | | | | [ Upstream commit 8d192bec534bd5b778135769a12e5f04580771f7 ] PA-RISC uses a much bigger frame size for functions than other architectures. So increase it to 2048 for 32- and 64-bit kernels. This fixes e.g. a warning in lib/xxhash.c. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Helge Deller <deller@gmx.de> Stable-dep-of: 152fe65f300e ("Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled") Signed-off-by: Sasha Levin <sashal@kernel.org>
* xtensa: increase size of gcc stack frame checkGuenter Roeck2022-12-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 867050247e295cf20fce046a92a7e6491fcfe066 ] xtensa frame size is larger than the frame size for almost all other architectures. This results in more than 50 "the frame size of <n> is larger than 1024 bytes" errors when trying to build xtensa:allmodconfig. Increase frame size for xtensa to 1536 bytes to avoid compile errors due to frame size limits. Link: https://lkml.kernel.org/r/20210912025235.3514761-1-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 152fe65f300e ("Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled") Signed-off-by: Sasha Levin <sashal@kernel.org>
* parisc: Increase size of gcc stack frame checkHelge Deller2022-12-081-1/+1
| | | | | | | | | | | | | | [ Upstream commit 55b70eed81cba1331773d4aaf5cba2bb07475cd8 ] parisc uses much bigger frames than other architectures, so increase the stack frame check value to avoid compiler warnings. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Helge Deller <deller@gmx.de> Stable-dep-of: 152fe65f300e ("Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled") Signed-off-by: Sasha Levin <sashal@kernel.org>
* error-injection: Add prompt for function error injectionSteven Rostedt (Google)2022-12-081-1/+7
| | | | | | | | | | | | | | | | | | | | | | | commit a4412fdd49dc011bcc2c0d81ac4cab7457092650 upstream. The config to be able to inject error codes into any function annotated with ALLOW_ERROR_INJECTION() is enabled when FUNCTION_ERROR_INJECTION is enabled. But unfortunately, this is always enabled on x86 when KPROBES is enabled, and there's no way to turn it off. As kprobes is useful for observability of the kernel, it is useful to have it enabled in production environments. But error injection should be avoided. Add a prompt to the config to allow it to be disabled even when kprobes is enabled, and get rid of the "def_bool y". This is a kernel debug feature (it's in Kconfig.debug), and should have never been something enabled by default. Cc: stable@vger.kernel.org Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib/vdso: use "grep -E" instead of "egrep"Greg Kroah-Hartman2022-12-081-1/+1
| | | | | | | | | | | | | | | commit 8ac3b5cd3e0521d92f9755e90d140382fc292510 upstream. The latest version of grep claims the egrep is now obsolete so the build now contains warnings that look like: egrep: warning: egrep is obsolescent; using grep -E fix this up by moving the vdso Makefile to use "grep -E" instead. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20220920170633.3133829-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dyndbg: let query-modname override actual module nameJim Cromie2022-10-261-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e75ef56f74965f426dd819a41336b640ffdd8fbc ] dyndbg's control-parser: ddebug_parse_query(), requires that search terms: module, func, file, lineno, are used only once in a query; a thing cannot be named both foo and bar. The cited commit added an overriding module modname, taken from the module loader, which is authoritative. So it set query.module 1st, which disallowed its use in the query-string. But now, its useful to allow a module-load to enable classes across a whole (or part of) a subsystem at once. # enable (dynamic-debug in) drm only modprobe drm dyndbg="class DRM_UT_CORE +p" # get drm_helper too modprobe drm dyndbg="class DRM_UT_CORE module drm* +p" # get everything that knows DRM_UT_CORE modprobe drm dyndbg="class DRM_UT_CORE module * +p" # also for boot-args: drm.dyndbg="class DRM_UT_CORE module * +p" So convert the override into a default, by filling it only when/after the query-string omitted the module. NB: the query class FOO handling is forthcoming. Fixes: 8e59b5cfb9a6 dynamic_debug: add modname arg to exec_query callchain Acked-by: Jason Baron <jbaron@akamai.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220904214134.408619-8-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* once: add DO_ONCE_SLOW() for sleepable contextsEric Dumazet2022-10-261-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ] Christophe Leroy reported a ~80ms latency spike happening at first TCP connect() time. This is because __inet_hash_connect() uses get_random_once() to populate a perturbation table which became quite big after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") get_random_once() uses DO_ONCE(), which block hard irqs for the duration of the operation. This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock for operations where we prefer to stay in process context. Then __inet_hash_connect() can use get_random_slow_once() to populate its perturbation table. Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/vdso: Mark do_hres() and do_coarse() as __always_inlineAndrei Vagin2022-09-051-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c966533f8c6c45f93c52599f8460e7695f0b7eaa ] Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (more clock_gettime() cycles - the better): clock | before | after | diff ---------------------------------------------------------- monotonic | 153222105 | 166775025 | 8.8% monotonic-coarse | 671557054 | 691513017 | 3.0% monotonic-raw | 147116067 | 161057395 | 9.5% boottime | 153446224 | 166962668 | 9.1% The improvement for arm64 for monotonic and boottime is around 3.5%. clock | before | after | diff ================================================== monotonic 17326692 17951770 3.6% monotonic-coarse 43624027 44215292 1.3% monotonic-raw 17541809 17554932 0.1% boottime 17334982 17954361 3.5% [ tglx: Avoid the goto ] Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/vdso: Let do_coarse() return 0 to simplify the callsiteChristophe Leroy2022-09-051-7/+8
| | | | | | | | | | | | | | [ Upstream commit 8463cf80529d0fd80b84cd5ab8b9b952b01c7eb9 ] do_coarse() is similar to do_hres() except that it never fails. Change its type to int instead of void and let it always return success (0) to simplify the call site. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
* ratelimit: Fix data-races in ___ratelimit().Kuniyuki Iwashima2022-09-051-3/+9
| | | | | | | | | | | | | [ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ] While reading rs->interval and rs->burst, they can be changed concurrently via sysctl (e.g. net_ratelimit_state). Thus, we need to add READ_ONCE() to their readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/list_debug.c: Detect uninitialized listsGuenter Roeck2022-08-251-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0cc011c576aaa4de505046f7a6c90933d7c749a9 ] In some circumstances, attempts are made to add entries to or to remove entries from an uninitialized list. A prime example is amdgpu_bo_vm_destroy(): It is indirectly called from ttm_bo_init_reserved() if that function fails, and tries to remove an entry from a list. However, that list is only initialized in amdgpu_bo_create_vm() after the call to ttm_bo_init_reserved() returned success. This results in crashes such as BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1479 Comm: chrome Not tainted 5.10.110-15768-g29a72e65dae5 Hardware name: Google Grunt/Grunt, BIOS Google_Grunt.11031.149.0 07/15/2020 RIP: 0010:__list_del_entry_valid+0x26/0x7d ... Call Trace: amdgpu_bo_vm_destroy+0x48/0x8b ttm_bo_init_reserved+0x1d7/0x1e0 amdgpu_bo_create+0x212/0x476 ? amdgpu_bo_user_destroy+0x23/0x23 ? kmem_cache_alloc+0x60/0x271 amdgpu_bo_create_vm+0x40/0x7d amdgpu_vm_pt_create+0xe8/0x24b ... Check if the list's prev and next pointers are NULL to catch such problems. Link: https://lkml.kernel.org/r/20220531222951.92073-1-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* locking/refcount: Consolidate implementations of refcount_tWill Deacon2022-07-291-1/+1
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit fb041bb7c0a918b95c6889fc965cdc4a75b4c0ca ] The generic implementation of refcount_t should be good enough for everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely, leaving the generic implementation enabled unconditionally. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* locking/refcount: Move saturation warnings out of lineWill Deacon2022-07-291-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1eb085d94256aaa69b00cf5a86e3c5f5bb2bc460 ] Having the refcount saturation and warnings inline bloats the text, despite the fact that these paths should never be executed in normal operation. Move the refcount saturation and warnings out of line to reduce the image size when refcount_t checking is enabled. Relative to an x86_64 defconfig, the sizes reported by bloat-o-meter are: # defconfig+REFCOUNT_FULL, inline saturation (i.e. before this patch) Total: Before=14762076, After=14915442, chg +1.04% # defconfig+REFCOUNT_FULL, out-of-line saturation (i.e. after this patch) Total: Before=14762076, After=14835497, chg +0.50% A side-effect of this change is that we now only get one warning per refcount saturation type, rather than one per problematic call-site. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-7-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the ↵Will Deacon2022-07-291-237/+1
| | | | | | | | | | | | | | | | | | | | | | | | <linux/refcount.h> header [ Upstream commit 77e9971c79c29542ab7dd4140f9343bf2ff36158 ] In an effort to improve performance of the REFCOUNT_FULL implementation, move the bulk of its functions into linux/refcount.h. This allows them to be inlined in the same way as if they had been provided via CONFIG_ARCH_HAS_REFCOUNT. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-5-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* locking/refcount: Remove unused refcount_*_checked() variantsWill Deacon2022-07-291-25/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7221762c48c6bbbcc6cc51d8b803c06930215e34 ] The full-fat refcount implementation is exposed via a set of functions suffixed with "_checked()", the idea being that code can choose to use the more expensive, yet more secure implementation on a case-by-case basis. In reality, this hasn't happened, so with a grand total of zero users, let's remove the checked variants for now by simply dropping the suffix and predicating the out-of-line functions on CONFIG_REFCOUNT_FULL=y. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-4-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* locking/refcount: Ensure integer operands are treated as signedWill Deacon2022-07-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 97a1420adf0cdf0cf6f41bab0b2acf658c96b94b ] In preparation for changing the saturation point of REFCOUNT_FULL to INT_MIN/2, change the type of integer operands passed into the API from 'unsigned int' to 'int' so that we can avoid casting during comparisons when we don't want to fall foul of C integral conversion rules for signed and unsigned types. Since the kernel is compiled with '-fno-strict-overflow', we don't need to worry about the UB introduced by signed overflow here. Furthermore, we're already making heavy use of the atomic_t API, which operates exclusively on signed types. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-3-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* locking/refcount: Define constants for saturation and max refcount valuesWill Deacon2022-07-291-17/+20
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 23e6b169c9917fbd77534f8c5f378cb073f548bd ] The REFCOUNT_FULL implementation uses a different saturation point than the x86 implementation, which means that the shared refcount code in lib/refcount.c (e.g. refcount_dec_not_one()) needs to be aware of the difference. Rather than duplicate the definitions from the lkdtm driver, instead move them into <linux/refcount.h> and update all references accordingly. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-2-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ida: don't use BUG_ON() for debuggingLinus Torvalds2022-07-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fc82bbf4dede758007763867d0282353c06d1121 upstream. This is another old BUG_ON() that just shouldn't exist (see also commit a382f8fee42c: "signal handling: don't use BUG_ON() for debugging"). In fact, as Matthew Wilcox points out, this condition shouldn't really even result in a warning, since a negative id allocation result is just a normal allocation failure: "I wonder if we should even warn here -- sure, the caller is trying to free something that wasn't allocated, but we don't warn for kfree(NULL)" and goes on to point out how that current error check is only causing people to unnecessarily do their own index range checking before freeing it. This was noted by Itay Iellin, because the bluetooth HCI socket cookie code does *not* do that range checking, and ends up just freeing the error case too, triggering the BUG_ON(). The HCI code requires CAP_NET_RAW, and seems to just result in an ugly splat, but there really is no reason to BUG_ON() here, and we have generally striven for allocation models where it's always ok to just do free(alloc()); even if the allocation were to fail for some random reason (usually obviously that "random" reason being some resource limit). Fixes: 88eca0207cf1 ("ida: simplified functions for id allocation") Reported-by: Itay Iellin <ieitayie@gmail.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* random: remove ratelimiting for in-kernel unseeded randomnessJason A. Donenfeld2022-06-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cc1e127bfa95b5fb2f9307e7168bf8b2b45b4c5e upstream. The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the kernel warns about all unseeded randomness or just the first instance. There's some complicated rate limiting and comparison to the previous caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled, developers still don't see all the messages or even an accurate count of how many were missed. This is the result of basically parallel mechanisms aimed at accomplishing more or less the same thing, added at different points in random.c history, which sort of compete with the first-instance-only limiting we have now. It turns out, however, that nobody cares about the first unseeded randomness instance of in-kernel users. The same first user has been there for ages now, and nobody is doing anything about it. It isn't even clear that anybody _can_ do anything about it. Most places that can do something about it have switched over to using get_random_bytes_wait() or wait_for_random_bytes(), which is the right thing to do, but there is still much code that needs randomness sometimes during init, and as a geeneral rule, if you're not using one of the _wait functions or the readiness notifier callback, you're bound to be doing it wrong just based on that fact alone. So warning about this same first user that can't easily change is simply not an effective mechanism for anything at all. Users can't do anything about it, as the Kconfig text points out -- the problem isn't in userspace code -- and kernel developers don't or more often can't react to it. Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM is set, so that developers can debug things need be, or if it isn't set, don't show a warning at all. At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting random.ratelimit_disable=1 on by default, since if you care about one you probably care about the other too. And we can clean up usage around the related urandom_warning ratelimiter as well (whose behavior isn't changing), so that it properly counts missed messages after the 10 message threshold is reached. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* siphash: use one source of truth for siphash permutationsJason A. Donenfeld2022-06-221-22/+10
| | | | | | | | | | | | | | | | | | | | | commit e73aaae2fa9024832e1f42e30c787c7baf61d014 upstream. The SipHash family of permutations is currently used in three places: - siphash.c itself, used in the ordinary way it was intended. - random32.c, in a construction from an anonymous contributor. - random.c, as part of its fast_mix function. Each one of these places reinvents the wheel with the same C code, same rotation constants, and same symmetry-breaking constants. This commit tidies things up a bit by placing macros for the permutations and constants into siphash.h, where each of the three .c users can access them. It also leaves a note dissuading more users of them from emerging. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* random: replace custom notifier chain with standard oneJason A. Donenfeld2022-06-222-9/+14
| | | | | | | | | | | | | | | | | | | | | commit 5acd35487dc911541672b3ffc322851769c32a56 upstream. We previously rolled our own randomness readiness notifier, which only has two users in the whole kernel. Replace this with a more standard atomic notifier block that serves the same purpose with less code. Also unexport the symbols, because no modules use it, only unconditional builtins. The only drawback is that it's possible for a notification handler returning the "stop" code to prevent further processing, but given that there are only two users, and that we're unexporting this anyway, that doesn't seem like a significant drawback for the simplification we receive here. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> [Jason: for stable, also backported to crypto/drbg.c, not unexporting.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* random: remove unused tracepointsJason A. Donenfeld2022-06-221-0/+2
| | | | | | | | | | | | | | | commit 14c174633f349cb41ea90c2c0aaddac157012f74 upstream. These explicit tracepoints aren't really used and show sign of aging. It's work to keep these up to date, and before I attempted to keep them up to date, they weren't up to date, which indicates that they're not really used. These days there are better ways of introspecting anyway. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib/crypto: sha1: re-roll loops to reduce code sizeJason A. Donenfeld2022-06-221-81/+14
| | | | | | | | | | | | | | | | | commit 9a1536b093bb5bf60689021275fd24d513bb8db0 upstream. With SHA-1 no longer being used for anything performance oriented, and also soon to be phased out entirely, we can make up for the space added by unrolled BLAKE2s by simply re-rolling SHA-1. Since SHA-1 is so much more complex, re-rolling it more or less takes care of the code size added by BLAKE2s. And eventually, hopefully we'll see SHA-1 removed entirely from most small kernel builds. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ard Biesheuvel <ardb@kernel.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib/crypto: blake2s: move hmac construction into wireguardJason A. Donenfeld2022-06-222-68/+0
| | | | | | | | | | | | | | | | | | | | commit d8d83d8ab0a453e17e68b3a3bed1f940c34b8646 upstream. Basically nobody should use blake2s in an HMAC construction; it already has a keyed variant. But unfortunately for historical reasons, Noise, used by WireGuard, uses HKDF quite strictly, which means we have to use this. Because this really shouldn't be used by others, this commit moves it into wireguard's noise.c locally, so that kernels that aren't using WireGuard don't get this superfluous code baked in. On m68k systems, this shaves off ~314 bytes. Cc: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> [Jason: for stable, skip the wireguard changes, since this kernel doesn't have wireguard.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* crypto: blake2s - generic C library implementation and selftestJason A. Donenfeld2022-06-224-0/+854
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 66d7fb94e4ffe5acc589e0b2b4710aecc1f07a28 upstream. The C implementation was originally based on Samuel Neves' public domain reference implementation but has since been heavily modified for the kernel. We're able to do compile-time optimizations by moving some scaffolding around the final function into the header file. Information: https://blake2.net/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Co-developed-by: Samuel Neves <sneves@dei.uc.pt> [ardb: - move from lib/zinc to lib/crypto - remove simd handling - rewrote selftest for better coverage - use fixed digest length for blake2s_hmac() and rename to blake2s256_hmac() ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [Jason: for stable, skip kconfig and wire up directly, and skip the arch hooks; optimized implementations need not be backported.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nodemask: Fix return values to be unsignedKees Cook2022-06-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0dfe54071d7c828a02917b595456bfde1afdddc9 ] The nodemask routines had mixed return values that provided potentially signed return values that could never happen. This was leading to the compiler getting confusing about the range of possible return values (it was thinking things could be negative where they could not be). Fix all the nodemask routines that should be returning unsigned (or bool) values. Silences: mm/swapfile.c: In function ‘setup_swap_info’: mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds of ‘struct plist_node[]’ [-Werror=array-bounds] 2291 | p->avail_lists[i].prio = 1; | ~~~~~~~~~~~~~~^~~ In file included from mm/swapfile.c:16: ./include/linux/swap.h:292:27: note: while referencing ‘avail_lists’ 292 | struct plist_node avail_lists[]; /* | ^~~~~~~~~~~ Reported-by: Christophe de Dinechin <dinechin@redhat.com> Link: https://lore.kernel.org/lkml/20220414150855.2407137-3-dinechin@redhat.com/ Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* assoc_array: Fix BUG_ON during garbage collectStephen Brennan2022-06-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d1dc87763f406d4e67caf16dbe438a5647692395 upstream. A rare BUG_ON triggered in assoc_array_gc: [3430308.818153] kernel BUG at lib/assoc_array.c:1609! Which corresponded to the statement currently at line 1593 upstream: BUG_ON(assoc_array_ptr_is_meta(p)); Using the data from the core dump, I was able to generate a userspace reproducer[1] and determine the cause of the bug. [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc After running the iterator on the entire branch, an internal tree node looked like the following: NODE (nr_leaves_on_branch: 3) SLOT [0] NODE (2 leaves) SLOT [1] NODE (1 leaf) SLOT [2..f] NODE (empty) In the userspace reproducer, the pr_devel output when compressing this node was: -- compress node 0x5607cc089380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] after: 3 At slot 0, an internal node with 2 leaves could not be folded into the node, because there was only one available slot (slot 0). Thus, the internal node was retained. At slot 1, the node had one leaf, and was able to be folded in successfully. The remaining nodes had no leaves, and so were removed. By the end of the compression stage, there were 14 free slots, and only 3 leaf nodes. The tree was ascended and then its parent node was compressed. When this node was seen, it could not be folded, due to the internal node it contained. The invariant for compression in this function is: whenever nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all leaf nodes. The compression step currently cannot guarantee this, given the corner case shown above. To fix this issue, retry compression whenever we have retained a node, and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second compression will then allow the node in slot 1 to be folded in, satisfying the invariant. Below is the output of the reproducer once the fix is applied: -- compress node 0x560e9c562380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] internal nodes remain despite enough space, retrying -- compress node 0x560e9c562380 -- free=14, leaves=1 [0] fold node 2/15 [nx 0] after: 3 Changes ======= DH: - Use false instead of 0. - Reorder the inserted lines in a couple of places to put retained before next_slot. ver #2) - Fix typo in pr_devel, correct comparison to "<=" Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Cc: <stable@vger.kernel.org> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: keyrings@vger.kernel.org Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1 Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2 Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dim: initialize all struct fieldsJesse Brandeburg2022-05-181-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ee1444b5e1df4155b591d0d9b1e72853a99ea861 ] The W=2 build pointed out that the code wasn't initializing all the variables in the dim_cq_moder declarations with the struct initializers. The net change here is zero since these structs were already static const globals and were initialized with zeros by the compiler, but removing compiler warnings has value in and of itself. lib/dim/net_dim.c: At top level: lib/dim/net_dim.c:54:9: warning: missing initializer for field ‘comps’ of ‘const struct dim_cq_moder’ [-Wmissing-field-initializers] 54 | NET_DIM_RX_EQE_PROFILES, | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from lib/dim/net_dim.c:6: ./include/linux/dim.h:45:13: note: ‘comps’ declared here 45 | u16 comps; | ^~~~~ and repeats for the tx struct, and once you fix the comps entry then the cq_period_mode field needs the same treatment. Use the commonly accepted style to indicate to the compiler that we know what we're doing, and add a comma at the end of each struct initializer to clean up the issue, and use explicit initializers for the fields we are initializing which makes the compiler happy. While here and fixing these lines, clean up the code slightly with a fix for the super long lines by removing the word "_MODERATION" from a couple defines only used in this file. Fixes: f8be17b81d44 ("lib/dim: Fix -Wunused-const-variable warnings") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20220507011038.14568-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* hex2bin: fix access beyond string endMikulas Patocka2022-05-091-3/+6
| | | | | | | | | | | | | | | | | | | | | commit e4d8a29997731b3bb14059024b24df9f784288d0 upstream. If we pass too short string to "hex2bin" (and the string size without the terminating NUL character is even), "hex2bin" reads one byte after the terminating NUL character. This patch fixes it. Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on error - so we can't just return the variable "hi" or "lo" on error. This inconsistency may be fixed in the next merge window, but for the purpose of fixing this bug, we just preserve the existing behavior and return -1 and -EINVAL. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Fixes: b78049831ffe ("lib: add error checking to hex2bin") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hex2bin: make the function hex_to_bin constant-timeMikulas Patocka2022-05-091-7/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e5be15767e7e284351853cbaba80cde8620341fb upstream. The function hex2bin is used to load cryptographic keys into device mapper targets dm-crypt and dm-integrity. It should take constant time independent on the processed data, so that concurrently running unprivileged code can't infer any information about the keys via microarchitectural convert channels. This patch changes the function hex_to_bin so that it contains no branches and no memory accesses. Note that this shouldn't cause performance degradation because the size of the new function is the same as the size of the old function (on x86-64) - and the new function causes no branch misprediction penalties. I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no branches in the generated code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lz4: fix LZ4_decompress_safe_partial read out of boundGuo Xuenan2022-04-151-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream. When partialDecoding, it is EOF if we've either filled the output buffer or can't proceed with reading an offset for following match. In some extreme corner cases when compressed data is suitably corrupted, UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial may lead to read out of bound problem during decoding. lz4 upstream has fixed it [2] and this issue has been disscussed here [3] before. current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd better fix it first. [1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/ [2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad# [3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/ Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Nick Terrell <terrelln@fb.com> Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Yann Collet <cyan@fb.com> Cc: Chengyang Fan <cy.fan@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* XArray: Update the LRU list in xas_split()Matthew Wilcox (Oracle)2022-04-151-0/+2
| | | | | | | | | | | | | | | | | | commit 3ed4bb77156da0bc732847c8c9df92454c1fbeea upstream. When splitting a value entry, we may need to add the new nodes to the LRU list and remove the parent node from the LRU list. The WARN_ON checks in shadow_lru_isolate() catch this oversight. This bug was latent until we stopped splitting folios in shrink_page_list() with commit 820c4e2e6f51 ("mm/vmscan: Free non-shmem folios without splitting them"). That allows the creation of large shadow entries, and subsequently when trying to page in a small page, we will split the large shadow entry in __filemap_add_folio(). Fixes: 8fc75643c5e1 ("XArray: add xas_split") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* XArray: Fix xas_create_range() when multi-order entry presentMatthew Wilcox (Oracle)2022-04-152-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3e3c658055c002900982513e289398a1aad4a488 upstream. If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinterpret that entry as a node and dereference xa_node->parent, generally leading to a crash that looks something like this: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0 RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline] RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725 It's deterministically reproducable once you know what the problem is, but producing it in a live kernel requires khugepaged to hit a race. While the problem has been present since xas_create_range() was introduced, I'm not aware of a way to hit it before the page cache was converted to use multi-index entries. Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Reported-by: syzbot+0d2b0bf32ca5cfd09f2e@syzkaller.appspotmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3Paul Menzel2022-04-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 633174a7046ec3b4572bec24ef98e6ee89bce14b ] Buidling raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows the errors below: $ cd lib/raid6/test/ $ make <stdin>:1:1: error: stray ‘\’ in program <stdin>:1:2: error: stray ‘#’ in program <stdin>:1:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ \ before ‘<’ token [...] The errors come from the HAS_ALTIVEC test, which fails, and the POWER optimized versions are not built. That’s also reason nobody noticed on the other architectures. GNU Make 4.3 does not remove the backslash anymore. From the 4.3 release announcment: > * WARNING: Backward-incompatibility! > Number signs (#) appearing inside a macro reference or function invocation > no longer introduce comments and should not be escaped with backslashes: > thus a call such as: > foo := $(shell echo '#') > is legal. Previously the number sign needed to be escaped, for example: > foo := $(shell echo '\#') > Now this latter will resolve to "\#". If you want to write makefiles > portable to both versions, assign the number sign to a variable: > H := \# > foo := $(shell echo '$H') > This was claimed to be fixed in 3.81, but wasn't, for some reason. > To detect this change search for 'nocomment' in the .FEATURES variable. So, do the same as commit 9564a8cf422d ("Kbuild: fix # escaping in .cmd files for future Make") and commit 929bef467771 ("bpf: Use $(pound) instead of \# in Makefiles") and define and use a $(pound) variable. Reference for the change in make: https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57 Cc: Matt Brown <matthew.brown.dev@gmail.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/test: use after free in register_test_dev_kmod()Dan Carpenter2022-04-151-0/+1
| | | | | | | | | | | [ Upstream commit dc0ce6cc4b133f5f2beb8b47dacae13a7d283c2c ] The "test_dev" pointer is freed but then returned to the caller. Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/raid6/test: fix multiple definition linking errorDirk Müller2022-04-151-1/+0
| | | | | | | | | | | | | | | | | | | | commit a5359ddd052860bacf957e65fe819c63e974b3a6 upstream. GCC 10+ defaults to -fno-common, which enforces proper declaration of external references using "extern". without this change a link would fail with: lib/raid6/test/algos.c:28: multiple definition of `raid6_call'; lib/raid6/test/test.c:22: first defined here the pq.h header that is included already includes an extern declaration so we can just remove the redundant one here. Cc: <stable@vger.kernel.org> Signed-off-by: Dirk Müller <dmueller@suse.de> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSEJulian Braha2022-03-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 11c57c3ba94da74c3446924260e34e0b1950b5d7 ] Resending this to properly add it to the patch tracker - thanks for letting me know, Arnd :) When ARM is enabled, and BITREVERSE is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for HAVE_ARCH_BITREVERSE Depends on [n]: BITREVERSE [=n] Selected by [y]: - ARM [=y] && (CPU_32v7M [=n] || CPU_32v7 [=y]) && !CPU_32v6 [=n] This is because ARM selects HAVE_ARCH_BITREVERSE without selecting BITREVERSE, despite HAVE_ARCH_BITREVERSE depending on BITREVERSE. This unmet dependency bug was found by Kismet, a static analysis tool for Kconfig. Please advise if this is not the appropriate solution. Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/iov_iter: initialize "flags" in new pipe_bufferMax Kellermann2022-02-231-0/+2
| | | | | | | | | | | | | | | | | commit 9d2231c5d74e13b2a0546fee6737ee4446017903 upstream. The functions copy_page_to_iter_pipe() and push_pipe() can both allocate a new pipe_buffer, but the "flags" member initializer is missing. Fixes: 241699cd72a8 ("new iov_iter flavour: pipe-backed") To: Alexander Viro <viro@zeniv.linux.org.uk> To: linux-fsdevel@vger.kernel.org To: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() testAndrey Konovalov2022-01-271-0/+1
| | | | | | | | | | | | | | | | | commit e073e5ef90298d2d6e5e7f04b545a0815e92110c upstream. Make do_kmem_cache_size_bulk() destroy the cache it creates. Link: https://lkml.kernel.org/r/aced20a94bf04159a139f0846e41d38a1537debb.1640018297.git.andreyknvl@google.com Fixes: 03a9349ac0e0 ("lib/test_meminit: add a kmem_cache_alloc_bulk() test") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* siphash: use _unaligned version by defaultArnd Bergmann2021-12-081-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d upstream. On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363. Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses. Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware. This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support. Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/ Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/ Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib/xz: Validate the value before assigning it to an enum variableLasse Collin2021-11-171-3/+3
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 4f8d7abaa413c34da9d751289849dbfb7c977d05 ] This might matter, for example, if the underlying type of enum xz_check was a signed char. In such a case the validation wouldn't have caught an unsupported header. I don't know if this problem can occur in the kernel on any arch but it's still good to fix it because some people might copy the XZ code to their own projects from Linux instead of the upstream XZ Embedded repository. This change may increase the code size by a few bytes. An alternative would have been to use an unsigned int instead of enum xz_check but using an enumeration looks cleaner. Link: https://lore.kernel.org/r/20211010213145.17462-3-xiang@kernel.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib/xz: Avoid overlapping memcpy() with invalid input with in-place ↵Lasse Collin2021-11-172-3/+20
| | | | | | | | | | | | | | | | | | | | | decompression [ Upstream commit 83d3c4f22a36d005b55f44628f46cc0d319a75e8 ] With valid files, the safety margin described in lib/decompress_unxz.c ensures that these buffers cannot overlap. But if the uncompressed size of the input is larger than the caller thought, which is possible when the input file is invalid/corrupt, the buffers can overlap. Obviously the result will then be garbage (and usually the decoder will return an error too) but no other harm will happen when such an over-run occurs. This change only affects uncompressed LZMA2 chunks and so this should have no effect on performance. Link: https://lore.kernel.org/r/20211010213145.17462-2-xiang@kernel.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return valueAndreas Gruenbacher2021-11-171-2/+3
| | | | | | | | | | | | | | | | | [ Upstream commit 814a66741b9ffb5e1ba119e368b178edb0b7322d ] Both iov_iter_get_pages and iov_iter_get_pages_alloc return the number of bytes of the iovec they could get the pages for. When they cannot get any pages, they're supposed to return 0, but when the start of the iovec isn't page aligned, the calculation goes wrong and they return a negative value. Fix both functions. In addition, change iov_iter_get_pages_alloc to return NULL in that case to prevent resource leaks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>