summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Linux 4.9.265v4.9.265Greg Kroah-Hartman2021-04-071-1/+1
| | | | | | | | | | | Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Jason Self <jason@bluehome.net> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20210405085018.871387942@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* audit: fix a net reference leak in audit_list_rules_send()Paul Moore2021-04-073-9/+8
| | | | | | | | | | | | | | | | | commit 3054d06719079388a543de6adb812638675ad8f5 upstream. If audit_list_rules_send() fails when trying to create a new thread to send the rules it also fails to cleanup properly, leaking a reference to a net structure. This patch fixes the error patch and renames audit_send_list() to audit_send_list_thread() to better match its cousin, audit_send_reply_thread(). Reported-by: teroincn@gmail.com Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* audit: fix a net reference leak in audit_send_reply()Paul Moore2021-04-071-18/+28
| | | | | | | | | | | | | | | | commit a48b284b403a4a073d8beb72d2bb33e54df67fb6 upstream. If audit_send_reply() fails when trying to create a new thread to send the reply it also fails to cleanup properly, leaking a reference to a net structure. This patch fixes the error path and makes a handful of other cleanups that came up while fixing the code. Reported-by: teroincn@gmail.com Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: rtl8192e: Change state information from u16 to u8Atul Gopinathan2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e78836ae76d20f38eed8c8c67f21db97529949da upstream. The "u16 CcxRmState[2];" array field in struct "rtllib_network" has 4 bytes in total while the operations performed on this array through-out the code base are only 2 bytes. The "CcxRmState" field is fed only 2 bytes of data using memcpy(): (In rtllib_rx.c:1972) memcpy(network->CcxRmState, &info_element->data[4], 2) With "info_element->data[]" being a u8 array, if 2 bytes are written into "CcxRmState" (whose one element is u16 size), then the 2 u8 elements from "data[]" gets squashed and written into the first element ("CcxRmState[0]") while the second element ("CcxRmState[1]") is never fed with any data. Same in file rtllib_rx.c:2522: memcpy(dst->CcxRmState, src->CcxRmState, 2); The above line duplicates "src" data to "dst" but only writes 2 bytes (and not 4, which is the actual size). Again, only 1st element gets the value while the 2nd element remains uninitialized. This later makes operations done with CcxRmState unpredictable in the following lines as the 1st element is having a squashed number while the 2nd element is having an uninitialized random number. rtllib_rx.c:1973: if (network->CcxRmState[0] != 0) rtllib_rx.c:1977: network->MBssidMask = network->CcxRmState[1] & 0x07; network->MBssidMask is also of type u8 and not u16. Fix this by changing the type of "CcxRmState" from u16 to u8 so that the data written into this array and read from it make sense and are not random values. NOTE: The wrong initialization of "CcxRmState" can be seen in the following commit: commit ecdfa44610fa ("Staging: add Realtek 8192 PCI wireless driver") The above commit created a file `rtl8192e/ieee80211.h` which used to have the faulty line. The file has been deleted (or possibly renamed) with the contents copied in to a new file `rtl8192e/rtllib.h` along with additional code in the commit 94a799425eee (tagged in Fixes). Fixes: 94a799425eee ("From: wlanfae <wlanfae@realtek.com> [PATCH 1/8] rtl8192e: Import new version of driver from realtek") Cc: stable@vger.kernel.org Signed-off-by: Atul Gopinathan <atulgopinathan@gmail.com> Link: https://lore.kernel.org/r/20210323113413.29179-2-atulgopinathan@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: rtl8192e: Fix incorrect source in memcpy()Atul Gopinathan2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 72ad25fbbb78930f892b191637359ab5b94b3190 upstream. The variable "info_element" is of the following type: struct rtllib_info_element *info_element defined in drivers/staging/rtl8192e/rtllib.h: struct rtllib_info_element { u8 id; u8 len; u8 data[]; } __packed; The "len" field defines the size of the "data[]" array. The code is supposed to check if "info_element->len" is greater than 4 and later equal to 6. If this is satisfied then, the last two bytes (the 4th and 5th element of u8 "data[]" array) are copied into "network->CcxRmState". Right now the code uses "memcpy()" with the source as "&info_element[4]" which would copy in wrong and unintended information. The struct "rtllib_info_element" has a size of 2 bytes for "id" and "len", therefore indexing will be done in interval of 2 bytes. So, "info_element[4]" would point to data which is beyond the memory allocated for this pointer (that is, at x+8, while "info_element" has been allocated only from x to x+7 (2 + 6 => 8 bytes)). This patch rectifies this error by using "&info_element->data[4]" which correctly copies the last two bytes of "data[]". NOTE: The faulty line of code came from the following commit: commit ecdfa44610fa ("Staging: add Realtek 8192 PCI wireless driver") The above commit created the file `rtl8192e/ieee80211/ieee80211_rx.c` which had the faulty line of code. This file has been deleted (or possibly renamed) with the contents copied in to a new file `rtl8192e/rtllib_rx.c` along with additional code in the commit 94a799425eee (tagged in Fixes). Fixes: 94a799425eee ("From: wlanfae <wlanfae@realtek.com> [PATCH 1/8] rtl8192e: Import new version of driver from realtek") Cc: stable@vger.kernel.org Signed-off-by: Atul Gopinathan <atulgopinathan@gmail.com> Link: https://lore.kernel.org/r/20210323113413.29179-1-atulgopinathan@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* USB: cdc-acm: fix use-after-free after probe failureJohan Hovold2021-04-071-0/+5
| | | | | | | | | | | | | | | | | commit 4e49bf376c0451ad2eae2592e093659cde12be9a upstream. If tty-device registration fails the driver would fail to release the data interface. When the device is later disconnected, the disconnect callback would still be called for the data interface and would go about releasing already freed resources. Fixes: c93d81955005 ("usb: cdc-acm: fix error handling in acm_probe()") Cc: stable@vger.kernel.org # 3.9 Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20210322155318.9837-3-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* USB: cdc-acm: downgrade message to debugOliver Neukum2021-04-071-1/+2
| | | | | | | | | | | | | commit e4c77070ad45fc940af1d7fb1e637c349e848951 upstream. This failure is so common that logging an error here amounts to spamming log files. Reviewed-by: Bruno Thomsen <bruno.thomsen@gmail.com> Signed-off-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210311130126.15972-2-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cdc-acm: fix BREAK rx code path adding necessary callsOliver Neukum2021-04-071-1/+3
| | | | | | | | | | | | | commit 08dff274edda54310d6f1cf27b62fddf0f8d146e upstream. Counting break events is nice but we should actually report them to the tty layer. Fixes: 5a6a62bdb9257 ("cdc-acm: add TIOCMIWAIT") Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20210311133714.31881-1-oneukum@suse.com Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: xhci-mtk: fix broken streams issue on 0.96 xHCIChunfeng Yun2021-04-071-1/+9
| | | | | | | | | | | | | | | commit 6f978a30c9bb12dab1302d0f06951ee290f5e600 upstream. The MediaTek 0.96 xHCI controller on some platforms does not support bulk stream even HCCPARAMS says supporting, due to MaxPSASize is set a default value 1 by mistake, here use XHCI_BROKEN_STREAMS quirk to fix it. Fixes: 94a631d91ad3 ("usb: xhci-mtk: check hcc_params after adding primary hcd") Cc: stable <stable@vger.kernel.org> Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Link: https://lore.kernel.org/r/1616482975-17841-4-git-send-email-chunfeng.yun@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modemVincent Palatin2021-04-071-0/+4
| | | | | | | | | | | | | | | | | | | | | commit 0bd860493f81eb2a46173f6f5e44cc38331c8dbd upstream. This LTE modem (M.2 card) has a bug in its power management: there is some kind of race condition for U3 wake-up between the host and the device. The modem firmware sometimes crashes/locks when both events happen at the same time and the modem fully drops off the USB bus (and sometimes re-enumerates, sometimes just gets stuck until the next reboot). Tested with the modem wired to the XHCI controller on an AMD 3015Ce platform. Without the patch, the modem dropped of the USB bus 5 times in 3 days. With the quirk, it stayed connected for a week while the 'runtime_suspended_time' counter incremented as excepted. Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Link: https://lore.kernel.org/r/20210319124802.2315195-1-vpalatin@chromium.org Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* firewire: nosy: Fix a use-after-free bug in nosy_ioctl()Zheyu Ma2021-04-071-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 829933ef05a951c8ff140e814656d73e74915faf ] For each device, the nosy driver allocates a pcilynx structure. A use-after-free might happen in the following scenario: 1. Open nosy device for the first time and call ioctl with command NOSY_IOC_START, then a new client A will be malloced and added to doubly linked list. 2. Open nosy device for the second time and call ioctl with command NOSY_IOC_START, then a new client B will be malloced and added to doubly linked list. 3. Call ioctl with command NOSY_IOC_START for client A, then client A will be readded to the doubly linked list. Now the doubly linked list is messed up. 4. Close the first nosy device and nosy_release will be called. In nosy_release, client A will be unlinked and freed. 5. Close the second nosy device, and client A will be referenced, resulting in UAF. The root cause of this bug is that the element in the doubly linked list is reentered into the list. Fix this bug by adding a check before inserting a client. If a client is already in the linked list, don't insert it. The following KASAN report reveals it: BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210 Write of size 8 at addr ffff888102ad7360 by task poc CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: nosy_release+0x1ea/0x210 __fput+0x1e2/0x840 task_work_run+0xe8/0x180 exit_to_user_mode_prepare+0x114/0x120 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Allocated by task 337: nosy_open+0x154/0x4d0 misc_open+0x2ec/0x410 chrdev_open+0x20d/0x5a0 do_dentry_open+0x40f/0xe80 path_openat+0x1cf9/0x37b0 do_filp_open+0x16d/0x390 do_sys_openat2+0x11d/0x360 __x64_sys_open+0xfd/0x1a0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 337: kfree+0x8f/0x210 nosy_release+0x158/0x210 __fput+0x1e2/0x840 task_work_run+0xe8/0x180 exit_to_user_mode_prepare+0x114/0x120 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 96 bytes inside of 128-byte region [ffff888102ad7300, ffff888102ad7380) [ Modified to use 'list_empty()' inside proper lock - Linus ] Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmail.com/ Reported-and-tested-by: 马哲宇 (Zheyu Ma) <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* extcon: Fix error handling in extcon_dev_registerDinghao Liu2021-04-071-0/+1
| | | | | | | | | | | | [ Upstream commit d3bdd1c3140724967ca4136755538fa7c05c2b4e ] When devm_kcalloc() fails, we should execute device_unregister() to unregister edev->dev from system. Fixes: 046050f6e623e ("extcon: Update the prototype of extcon_register_notifier() with enum extcon") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* pinctrl: rockchip: fix restore error in resumeWang Panzhenzhuan2021-04-071-5/+8
| | | | | | | | | | | | | | | | commit c971af25cda94afe71617790826a86253e88eab0 upstream. The restore in resume should match to suspend which only set for RK3288 SoCs pinctrl. Fixes: 8dca933127024 ("pinctrl: rockchip: save and restore gpio6_c6 pinmux in suspend/resume") Reviewed-by: Jianqun Xu <jay.xu@rock-chips.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Wang Panzhenzhuan <randy.wang@rock-chips.com> Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com> Link: https://lore.kernel.org/r/20210223100725.269240-1-jay.xu@rock-chips.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* reiserfs: update reiserfs_xattrs_initialized() conditionTetsuo Handa2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5e46d1b78a03d52306f21f77a4e4a144b6d31486 upstream. syzbot is reporting NULL pointer dereference at reiserfs_security_init() [1], for commit ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching") is assuming that REISERFS_SB(s)->xattr_root != NULL in reiserfs_xattr_jcreate_nblocks() despite that commit made REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL case possible. I guess that commit 6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating privroot with selinux enabled") wanted to check xattr_root != NULL before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking about the xattr root. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Therefore, update reiserfs_xattrs_initialized() to check both the privroot and the xattr root. Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1] Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: fix race by making init_zero_pfn() early_initcallIlya Lipnitskiy2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e720e7d0e983bf05de80b231bccc39f1487f0f16 upstream. There are code paths that rely on zero_pfn to be fully initialized before core_initcall. For example, wq_sysfs_init() is a core_initcall function that eventually results in a call to kernel_execve, which causes a page fault with a subsequent mmput. If zero_pfn is not initialized by then it may not get cleaned up properly and result in an error: BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1 Here is an analysis of the race as seen on a MIPS device. On this particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until initialized, at which point it becomes PFN 5120: 1. wq_sysfs_init calls into kobject_uevent_env at core_initcall: kobject_uevent_env+0x7e4/0x7ec kset_register+0x68/0x88 bus_register+0xdc/0x34c subsys_virtual_register+0x34/0x78 wq_sysfs_init+0x1c/0x4c do_one_initcall+0x50/0x1a8 kernel_init_freeable+0x230/0x2c8 kernel_init+0x10/0x100 ret_from_kernel_thread+0x14/0x1c 2. kobject_uevent_env() calls call_usermodehelper_exec() which executes kernel_execve asynchronously. 3. Memory allocations in kernel_execve cause a page fault, bumping the MM reference counter: add_mm_counter_fast+0xb4/0xc0 handle_mm_fault+0x6e4/0xea0 __get_user_pages.part.78+0x190/0x37c __get_user_pages_remote+0x128/0x360 get_arg_page+0x34/0xa0 copy_string_kernel+0x194/0x2a4 kernel_execve+0x11c/0x298 call_usermodehelper_exec_async+0x114/0x194 4. In case zero_pfn has not been initialized yet, zap_pte_range does not decrement the MM_ANONPAGES RSS counter and the BUG message is triggered shortly afterwards when __mmdrop checks the ref counters: __mmdrop+0x98/0x1d0 free_bprm+0x44/0x118 kernel_execve+0x160/0x1d8 call_usermodehelper_exec_async+0x114/0x194 ret_from_kernel_thread+0x14/0x1c To avoid races such as described above, initialize init_zero_pfn at early_initcall level. Depending on the architecture, ZERO_PAGE is either constant or gets initialized even earlier, at paging_init, so there is no issue with initializing zero_pfn earlier. Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: stable@vger.kernel.org Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing: Fix stack trace event sizeSteven Rostedt (VMware)2021-04-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9deb193af69d3fd6dd8e47f292b67c805a787010 upstream. Commit cbc3b92ce037 fixed an issue to modify the macros of the stack trace event so that user space could parse it properly. Originally the stack trace format to user space showed that the called stack was a dynamic array. But it is not actually a dynamic array, in the way that other dynamic event arrays worked, and this broke user space parsing for it. The update was to make the array look to have 8 entries in it. Helper functions were added to make it parse it correctly, as the stack was dynamic, but was determined by the size of the event stored. Although this fixed user space on how it read the event, it changed the internal structure used for the stack trace event. It changed the array size from [0] to [8] (added 8 entries). This increased the size of the stack trace event by 8 words. The size reserved on the ring buffer was the size of the stack trace event plus the number of stack entries found in the stack trace. That commit caused the amount to be 8 more than what was needed because it did not expect the caller field to have any size. This produced 8 entries of garbage (and reading random data) from the stack trace event: <idle>-0 [002] d... 1976396.837549: <stack trace> => trace_event_raw_event_sched_switch => __traceiter_sched_switch => __schedule => schedule_idle => do_idle => cpu_startup_entry => secondary_startup_64_no_verify => 0xc8c5e150ffff93de => 0xffff93de => 0 => 0 => 0xc8c5e17800000000 => 0x1f30affff93de => 0x00000004 => 0x200000000 Instead, subtract the size of the caller field from the size of the event to make sure that only the amount needed to store the stack trace is reserved. Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/ Cc: stable@vger.kernel.org Fixes: cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly") Reported-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hookHui Wang2021-04-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e54f30befa7990b897189b44a56c1138c6bfdbb5 upstream. We found the alc_update_headset_mode() is not called on some machines when unplugging the headset, as a result, the mode of the ALC_HEADSET_MODE_UNPLUGGED can't be set, then the current_headset_type is not cleared, if users plug a differnt type of headset next time, the determine_headset_type() will not be called and the audio jack is set to the headset type of previous time. On the Dell machines which connect the dmic to the PCH, if we open the gnome-sound-setting and unplug the headset, this issue will happen. Those machines disable the auto-mute by ucm and has no internal mic in the input source, so the update_headset_mode() will not be called by cap_sync_hook or automute_hook when unplugging, and because the gnome-sound-setting is opened, the codec will not enter the runtime_suspend state, so the update_headset_mode() will not be called by alc_resume when unplugging. In this case the hp_automute_hook is called when unplugging, so add update_headset_mode() calling to this function. Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20210320091542.6748-2-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: usb-audio: Apply sample rate quirk to Logitech ConnectIkjoon Jang2021-04-071-0/+1
| | | | | | | | | | | | | | | | | commit 625bd5a616ceda4840cd28f82e957c8ced394b6a upstream. Logitech ConferenceCam Connect is a compound USB device with UVC and UAC. Not 100% reproducible but sometimes it keeps responding STALL to every control transfer once it receives get_freq request. This patch adds 046d:0x084c to a snd_usb_get_sample_rate_quirk list. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203419 Signed-off-by: Ikjoon Jang <ikjn@chromium.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210324105153.2322881-1-ikjn@chromium.org Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* bpf: Remove MTU check in __bpf_skb_max_lenJesper Dangaard Brouer2021-04-071-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6306c1189e77a513bf02720450bb43bd4ba5d8ae upstream. Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu). When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device. This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config. [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: wan/lmc: unregister device when no matching device is foundTong Zhang2021-04-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 62e69bc419772638369eff8ff81340bde8aceb61 ] lmc set sc->lmc_media pointer when there is a matching device. However, when no matching device is found, this pointer is NULL and the following dereference will result in a null-ptr-deref. To fix this issue, unregister the hdlc device and return an error. [ 4.569359] BUG: KASAN: null-ptr-deref in lmc_init_one.cold+0x2b6/0x55d [lmc] [ 4.569748] Read of size 8 at addr 0000000000000008 by task modprobe/95 [ 4.570102] [ 4.570187] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7 #94 [ 4.570527] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-preb4 [ 4.571125] Call Trace: [ 4.571261] dump_stack+0x7d/0xa3 [ 4.571445] kasan_report.cold+0x10c/0x10e [ 4.571667] ? lmc_init_one.cold+0x2b6/0x55d [lmc] [ 4.571932] lmc_init_one.cold+0x2b6/0x55d [lmc] [ 4.572186] ? lmc_mii_readreg+0xa0/0xa0 [lmc] [ 4.572432] local_pci_probe+0x6f/0xb0 [ 4.572639] pci_device_probe+0x171/0x240 [ 4.572857] ? pci_device_remove+0xe0/0xe0 [ 4.573080] ? kernfs_create_link+0xb6/0x110 [ 4.573315] ? sysfs_do_create_link_sd.isra.0+0x76/0xe0 [ 4.573598] really_probe+0x161/0x420 [ 4.573799] driver_probe_device+0x6d/0xd0 [ 4.574022] device_driver_attach+0x82/0x90 [ 4.574249] ? device_driver_attach+0x90/0x90 [ 4.574485] __driver_attach+0x60/0x100 [ 4.574694] ? device_driver_attach+0x90/0x90 [ 4.574931] bus_for_each_dev+0xe1/0x140 [ 4.575146] ? subsys_dev_iter_exit+0x10/0x10 [ 4.575387] ? klist_node_init+0x61/0x80 [ 4.575602] bus_add_driver+0x254/0x2a0 [ 4.575812] driver_register+0xd3/0x150 [ 4.576021] ? 0xffffffffc0018000 [ 4.576202] do_one_initcall+0x84/0x250 [ 4.576411] ? trace_event_raw_event_initcall_finish+0x150/0x150 [ 4.576733] ? unpoison_range+0xf/0x30 [ 4.576938] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 4.577219] ? unpoison_range+0xf/0x30 [ 4.577423] ? unpoison_range+0xf/0x30 [ 4.577628] do_init_module+0xf8/0x350 [ 4.577833] load_module+0x3fe6/0x4340 [ 4.578038] ? vm_unmap_ram+0x1d0/0x1d0 [ 4.578247] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 4.578526] ? module_frob_arch_sections+0x20/0x20 [ 4.578787] ? __do_sys_finit_module+0x108/0x170 [ 4.579037] __do_sys_finit_module+0x108/0x170 [ 4.579278] ? __ia32_sys_init_module+0x40/0x40 [ 4.579523] ? file_open_root+0x200/0x200 [ 4.579742] ? do_sys_open+0x85/0xe0 [ 4.579938] ? filp_open+0x50/0x50 [ 4.580125] ? exit_to_user_mode_prepare+0xfc/0x130 [ 4.580390] do_syscall_64+0x33/0x40 [ 4.580586] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 4.580859] RIP: 0033:0x7f1a724c3cf7 [ 4.581054] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 48 891 [ 4.582043] RSP: 002b:00007fff44941c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 4.582447] RAX: ffffffffffffffda RBX: 00000000012ada70 RCX: 00007f1a724c3cf7 [ 4.582827] RDX: 0000000000000000 RSI: 00000000012ac9e0 RDI: 0000000000000003 [ 4.583207] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001 [ 4.583587] R10: 00007f1a72527300 R11: 0000000000000246 R12: 00000000012ac9e0 [ 4.583968] R13: 0000000000000000 R14: 00000000012acc90 R15: 0000000000000001 [ 4.584349] ================================================================== Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* appletalk: Fix skb allocation size in loopback caseDoug Brown2021-04-071-12/+21
| | | | | | | | | | | | | | | | | | | [ Upstream commit 39935dccb21c60f9bbf1bb72d22ab6fd14ae7705 ] If a DDP broadcast packet is sent out to a non-gateway target, it is also looped back. There is a potential for the loopback device to have a longer hardware header length than the original target route's device, which can result in the skb not being created with enough room for the loopback device's hardware header. This patch fixes the issue by determining that a loopback will be necessary prior to allocating the skb, and if so, ensuring the skb has enough room. This was discovered while testing a new driver that creates a LocalTalk network interface (LTALK_HLEN = 1). It caused an skb_under_panic. Signed-off-by: Doug Brown <doug@schmorgal.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ext4: do not iput inode under running transaction in ext4_rename()zhangyi (F)2021-04-071-9/+9
| | | | | | | | | | | | | | | | | | [ Upstream commit 5dccdc5a1916d4266edd251f20bbbb113a5c495f ] In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into directory, it ends up dropping new created whiteout inode under the running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode under running transaction"), we follow the assumptions that evict() does not get called from a transaction context but in ext4_rename() it breaks this suggestion. Although it's not a real problem, better to obey it, so this patch add inode to orphan list and stop transaction before final iput(). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ASoC: rt5659: Update MCLK rate in set_sysclk()Sameer Pujar2021-04-071-0/+5
| | | | | | | | | | | | | | | | [ Upstream commit dbf54a9534350d6aebbb34f5c1c606b81a4f35dd ] Simple-card/audio-graph-card drivers do not handle MCLK clock when it is specified in the codec device node. The expectation here is that, the codec should actually own up the MCLK clock and do necessary setup in the driver. Suggested-by: Mark Brown <broonie@kernel.org> Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Sameer Pujar <spujar@nvidia.com> Link: https://lore.kernel.org/r/1615829492-8972-3-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* staging: comedi: cb_pcidas64: fix request_irq() warnTong Zhang2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d2d106fe3badfc3bf0dd3899d1c3f210c7203eab ] request_irq() wont accept a name which contains slash so we need to repalce it with something else -- otherwise it will trigger a warning and the entry in /proc/irq/ will not be created since the .name might be used by userspace and we don't want to break userspace, so we are changing the parameters passed to request_irq() [ 1.565966] name 'pci-das6402/16' [ 1.566149] WARNING: CPU: 0 PID: 184 at fs/proc/generic.c:180 __xlate_proc_name+0x93/0xb0 [ 1.568923] RIP: 0010:__xlate_proc_name+0x93/0xb0 [ 1.574200] Call Trace: [ 1.574722] proc_mkdir+0x18/0x20 [ 1.576629] request_threaded_irq+0xfe/0x160 [ 1.576859] auto_attach+0x60a/0xc40 [cb_pcidas64] Suggested-by: Ian Abbott <abbotti@mev.co.uk> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Link: https://lore.kernel.org/r/20210315195814.4692-1-ztong0001@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* staging: comedi: cb_pcidas: fix request_irq() warnTong Zhang2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2e5848a3d86f03024ae096478bdb892ab3d79131 ] request_irq() wont accept a name which contains slash so we need to repalce it with something else -- otherwise it will trigger a warning and the entry in /proc/irq/ will not be created since the .name might be used by userspace and we don't want to break userspace, so we are changing the parameters passed to request_irq() [ 1.630764] name 'pci-das1602/16' [ 1.630950] WARNING: CPU: 0 PID: 181 at fs/proc/generic.c:180 __xlate_proc_name+0x93/0xb0 [ 1.634009] RIP: 0010:__xlate_proc_name+0x93/0xb0 [ 1.639441] Call Trace: [ 1.639976] proc_mkdir+0x18/0x20 [ 1.641946] request_threaded_irq+0xfe/0x160 [ 1.642186] cb_pcidas_auto_attach+0xf4/0x610 [cb_pcidas] Suggested-by: Ian Abbott <abbotti@mev.co.uk> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Link: https://lore.kernel.org/r/20210315195914.4801-1-ztong0001@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* scsi: qla2xxx: Fix broken #endif placementAlexey Dobriyan2021-04-071-1/+1
| | | | | | | | | | | | | [ Upstream commit 5999b9e5b1f8a2f5417b755130919b3ac96f5550 ] Only half of the file is under include guard because terminating #endif is placed too early. Link: https://lore.kernel.org/r/YE4snvoW1SuwcXAn@localhost.localdomain Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* scsi: st: Fix a use after free in st_open()Lv Yunlong2021-04-071-1/+1
| | | | | | | | | | | | | | [ Upstream commit c8c165dea4c8f5ad67b1240861e4f6c5395fa4ac ] In st_open(), if STp->in_use is true, STp will be freed by scsi_tape_put(). However, STp is still used by DEBC_printk() after. It is better to DEBC_printk() before scsi_tape_put(). Link: https://lore.kernel.org/r/20210311064636.10522-1-lyl2019@mail.ustc.edu.cn Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vhost: Fix vhost_vq_reset()Laurent Vivier2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit beb691e69f4dec7bfe8b81b509848acfd1f0dbf9 ] vhost_reset_is_le() is vhost_init_is_le(), and in the case of cross-endian legacy, vhost_init_is_le() depends on vq->user_be. vq->user_be is set by vhost_disable_cross_endian(). But in vhost_vq_reset(), we have: vhost_reset_is_le(vq); vhost_disable_cross_endian(vq); And so user_be is used before being set. To fix that, reverse the lines order as there is no other dependency between them. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Link: https://lore.kernel.org/r/20210312140913.788592-1-lvivier@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc: Force inlining of cpu_has_feature() to avoid build failureChristophe Leroy2021-04-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit eed5fae00593ab9d261a0c1ffc1bdb786a87a55a ] The code relies on constant folding of cpu_has_feature() based on possible and always true values as defined per CPU_FTRS_ALWAYS and CPU_FTRS_POSSIBLE. Build failure is encountered with for instance book3e_all_defconfig on kisskb in the AMDGPU driver which uses cpu_has_feature(CPU_FTR_VSX_COMP) to decide whether calling kernel_enable_vsx() or not. The failure is due to cpu_has_feature() not being inlined with that configuration with gcc 4.9. In the same way as commit acdad8fb4a15 ("powerpc: Force inlining of mmu_has_feature to fix build failure"), for inlining of cpu_has_feature(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b231dfa040ce4cc37f702f5c3a595fdeabfe0462.1615378209.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
* ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probeBenjamin Rood2021-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f86f58e3594fb0ab1993d833d3b9a2496f3c928c ] According to the SGTL5000 datasheet [1], the DAP_AVC_CTRL register has the following bit field definitions: | BITS | FIELD | RW | RESET | DEFINITION | | 15 | RSVD | RO | 0x0 | Reserved | | 14 | RSVD | RW | 0x1 | Reserved | | 13:12 | MAX_GAIN | RW | 0x1 | Max Gain of AVC in expander mode | | 11:10 | RSVD | RO | 0x0 | Reserved | | 9:8 | LBI_RESP | RW | 0x1 | Integrator Response | | 7:6 | RSVD | RO | 0x0 | Reserved | | 5 | HARD_LMT_EN | RW | 0x0 | Enable hard limiter mode | | 4:1 | RSVD | RO | 0x0 | Reserved | | 0 | EN | RW | 0x0 | Enable/Disable AVC | The original default value written to the DAP_AVC_CTRL register during sgtl5000_i2c_probe() was 0x0510. This would incorrectly write values to bits 4 and 10, which are defined as RESERVED. It would also not set bits 12 and 14 to their correct RESET values of 0x1, and instead set them to 0x0. While the DAP_AVC module is effectively disabled because the EN bit is 0, this default value is still writing invalid values to registers that are marked as read-only and RESERVED as well as not setting bits 12 and 14 to their correct default values as defined by the datasheet. The correct value that should be written to the DAP_AVC_CTRL register is 0x5100, which configures the register bits to the default values defined by the datasheet, and prevents any writes to bits defined as 'read-only'. Generally speaking, it is best practice to NOT attempt to write values to registers/bits defined as RESERVED, as it generally produces unwanted/undefined behavior, or errors. Also, all credit for this patch should go to my colleague Dan MacDonald <dmacdonald@curbellmedical.com> for finding this error in the first place. [1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf Signed-off-by: Benjamin Rood <benjaminjrood@gmail.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20210219183308.GA2117@ubuntu-dev Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10Hans de Goede2021-04-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit eee51df776bd6cac10a76b2779a9fdee3f622b2b ] The adc_vol_tlv volume-control has a range from -17.625 dB to +30 dB, not -176.25 dB to + 300 dB. This wrong scale is esp. a problem in userspace apps which translate the dB scale to a linear scale. With the logarithmic dB scale being of by a factor of 10 we loose all precision in the lower area of the range when apps translate things to a linear scale. E.g. the 0 dB default, which corresponds with a value of 47 of the 0 - 127 range for the control, would be shown as 0/100 in alsa-mixer. Since the centi-dB values used in the TLV struct cannot represent the 0.375 dB step size used by these controls, change the TLV definition for them to specify a min and max value instead of min + stepsize. Note this mirrors commit 3f31f7d9b540 ("ASoC: rt5670: Fix dac- and adc- vol-tlv values being off by a factor of 10") which made the exact same change to the rt5670 codec driver. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210226143817.84287-3-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10Hans de Goede2021-04-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cfa26ed1f9f885c2fd8f53ca492989d1e16d0199 ] The adc_vol_tlv volume-control has a range from -17.625 dB to +30 dB, not -176.25 dB to + 300 dB. This wrong scale is esp. a problem in userspace apps which translate the dB scale to a linear scale. With the logarithmic dB scale being of by a factor of 10 we loose all precision in the lower area of the range when apps translate things to a linear scale. E.g. the 0 dB default, which corresponds with a value of 47 of the 0 - 127 range for the control, would be shown as 0/100 in alsa-mixer. Since the centi-dB values used in the TLV struct cannot represent the 0.375 dB step size used by these controls, change the TLV definition for them to specify a min and max value instead of min + stepsize. Note this mirrors commit 3f31f7d9b540 ("ASoC: rt5670: Fix dac- and adc- vol-tlv values being off by a factor of 10") which made the exact same change to the rt5670 codec driver. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210226143817.84287-2-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* rpc: fix NULL dereference on kmalloc failureJ. Bruce Fields2021-04-071-4/+7
| | | | | | | | | | | | | | | | | | | [ Upstream commit 0ddc942394013f08992fc379ca04cffacbbe3dae ] I think this is unlikely but possible: svc_authenticate sets rq_authop and calls svcauth_gss_accept. The kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL, and returning SVC_DENIED. This causes svc_process_common to go to err_bad_auth, and eventually call svc_authorise. That calls ->release == svcauth_gss_release, which tries to dereference rq_auth_data. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ext4: fix bh ref count on error pathsZhaolong Zhang2021-04-071-3/+3
| | | | | | | | | | | [ Upstream commit c915fb80eaa6194fa9bd0a4487705cd5b0dda2f1 ] __ext4_journalled_writepage should drop bhs' ref count on error paths Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ipv6: weaken the v4mapped source checkJakub Kicinski2021-04-073-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit dcc32f4f183ab8479041b23a1525d48233df1d43 ] This reverts commit 6af1799aaf3f1bc8defedddfa00df3192445bbf3. Commit 6af1799aaf3f ("ipv6: drop incoming packets having a v4mapped source address") introduced an input check against v4mapped addresses. Use of such addresses on the wire is indeed questionable and not allowed on public Internet. As the commit pointed out https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02 lists potential issues. Unfortunately there are applications which use v4mapped addresses, and breaking them is a clear regression. For example v4mapped addresses (or any semi-valid addresses, really) may be used for uni-direction event streams or packet export. Since the issue which sparked the addition of the check was with TCP and request_socks in particular push the check down to TCPv6 and DCCP. This restores the ability to receive UDPv6 packets with v4mapped address as the source. Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the user-visible changes. Fixes: 6af1799aaf3f ("ipv6: drop incoming packets having a v4mapped source address") Reported-by: Sunyi Shao <sunyishao@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selinux: vsock: Set SID for socket returned by accept()David Brazdil2021-04-071-0/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 1f935e8e72ec28dddb2dc0650b3b6626a293d94b ] For AF_VSOCK, accept() currently returns sockets that are unlabelled. Other socket families derive the child's SID from the SID of the parent and the SID of the incoming packet. This is typically done as the connected socket is placed in the queue that accept() removes from. Reuse the existing 'security_sk_clone' hook to copy the SID from the parent (server) socket to the child. There is no packet SID in this case. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Linux 4.9.264v4.9.264Greg Kroah-Hartman2021-03-301-1/+1
| | | | | | | | | | | Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Jason Self <jason@bluehome.net> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20210329075607.561619583@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xen-blkback: don't leak persistent grants from xen_blkbk_map()Jan Beulich2021-03-301-1/+1
| | | | | | | | | | | | | | | | | commit a846738f8c3788d846ed1f587270d2f2e3d32432 upstream. The fix for XSA-365 zapped too many of the ->persistent_gnt[] entries. Ones successfully obtained should not be overwritten, but instead left for xen_blkbk_unmap_prepare() to pick up and put. This is XSA-371. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mac80211: fix double free in ibss_leaveMarkus Theil2021-03-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3bd801b14e0c5d29eeddc7336558beb3344efaa3 upstream. Clear beacon ie pointer and ie length after free in order to prevent double free. ================================================================== BUG: KASAN: double-free or invalid-free \ in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 CPU: 0 PID: 8472 Comm: syz-executor100 Not tainted 5.11.0-rc6-syzkaller #0 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2c6 mm/kasan/report.c:230 kasan_report_invalid_free+0x51/0x80 mm/kasan/report.c:355 ____kasan_slab_free+0xcc/0xe0 mm/kasan/common.c:341 kasan_slab_free include/linux/kasan.h:192 [inline] __cache_free mm/slab.c:3424 [inline] kfree+0xed/0x270 mm/slab.c:3760 ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 rdev_leave_ibss net/wireless/rdev-ops.h:545 [inline] __cfg80211_leave_ibss+0x19a/0x4c0 net/wireless/ibss.c:212 __cfg80211_leave+0x327/0x430 net/wireless/core.c:1172 cfg80211_leave net/wireless/core.c:1221 [inline] cfg80211_netdev_notifier_call+0x9e8/0x12c0 net/wireless/core.c:1335 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2040 call_netdevice_notifiers_extack net/core/dev.c:2052 [inline] call_netdevice_notifiers net/core/dev.c:2066 [inline] __dev_close_many+0xee/0x2e0 net/core/dev.c:1586 __dev_close net/core/dev.c:1624 [inline] __dev_change_flags+0x2cb/0x730 net/core/dev.c:8476 dev_change_flags+0x8a/0x160 net/core/dev.c:8549 dev_ifsioc+0x210/0xa70 net/core/dev_ioctl.c:265 dev_ioctl+0x1b1/0xc40 net/core/dev_ioctl.c:511 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060 sock_ioctl+0x477/0x6a0 net/socket.c:1177 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+93976391bf299d425f44@syzkaller.appspotmail.com Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20210213133653.367130-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()Eric Dumazet2021-03-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 50535249f624d0072cd885bcdce4e4b6fb770160 upstream. struct sockaddr_qrtr has a 2-byte hole, and qrtr_recvmsg() currently does not clear it before copying kernel data to user space. It might be too late to name the hole since sockaddr_qrtr structure is uapi. BUG: KMSAN: kernel-infoleak in kmsan_copy_to_user+0x9c/0xb0 mm/kmsan/kmsan_hooks.c:249 CPU: 0 PID: 29705 Comm: syz-executor.3 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:120 kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118 kmsan_internal_check_memory+0x202/0x520 mm/kmsan/kmsan.c:402 kmsan_copy_to_user+0x9c/0xb0 mm/kmsan/kmsan_hooks.c:249 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0x1ac/0x270 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] move_addr_to_user+0x3a2/0x640 net/socket.c:237 ____sys_recvmsg+0x696/0xd50 net/socket.c:2575 ___sys_recvmsg net/socket.c:2610 [inline] do_recvmmsg+0xa97/0x22d0 net/socket.c:2710 __sys_recvmmsg net/socket.c:2789 [inline] __do_sys_recvmmsg net/socket.c:2812 [inline] __se_sys_recvmmsg+0x24a/0x410 net/socket.c:2805 __x64_sys_recvmmsg+0x62/0x80 net/socket.c:2805 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465f69 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f43659d6188 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 0000000000465f69 RDX: 0000000000000008 RSI: 0000000020003e40 RDI: 0000000000000003 RBP: 00000000004bfa8f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000010060 R11: 0000000000000246 R12: 000000000056bf60 R13: 0000000000a9fb1f R14: 00007f43659d6300 R15: 0000000000022000 Local variable ----addr@____sys_recvmsg created at: ____sys_recvmsg+0x168/0xd50 net/socket.c:2550 ____sys_recvmsg+0x168/0xd50 net/socket.c:2550 Bytes 2-3 of 12 are uninitialized Memory access of size 12 starts at ffff88817c627b40 Data copied to user address 0000000020000140 Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: validate stab valuesEric Dumazet2021-03-305-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e323d865b36134e8c5c82c834df89109a5c60dab upstream. iproute2 package is well behaved, but malicious user space can provide illegal shift values and trigger UBSAN reports. Add stab parameter to red_check_params() to validate user input. syzbot reported: UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18 shift exponent 111 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 red_calc_qavg_from_idle_time include/net/red.h:312 [inline] red_calc_qavg include/net/red.h:353 [inline] choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221 __dev_xmit_skb net/core/dev.c:3837 [inline] __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117 __ip6_finish_output net/ipv6/ip6_output.c:182 [inline] __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161 ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192 NF_HOOK_COND include/linux/netfilter.h:290 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215 dst_output include/net/dst.h:448 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320 inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135 dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138 dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535 dccp_finish_passive_close net/dccp/proto.c:123 [inline] dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118 dccp_terminate_connection net/dccp/proto.c:958 [inline] dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release+0xcd/0x280 net/socket.c:599 sock_close+0x18/0x20 net/socket.c:1258 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* can: dev: Move device back to init netns on owning netns deleteMartin Willi2021-03-303-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream. When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14) [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8) [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114) [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac) [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60) [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380) [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438) [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8) [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c) [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* futex: Handle transient "ownerless" rtmutex state correctlyMike Galbraith2021-03-301-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9f5d1c336a10c0d24e83e40b4c1b9539f7dba627 upstream. Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner(). This is one possible chain of events leading to this: Task Prio Operation T1 120 lock(F) T2 120 lock(F) -> blocks (top waiter) T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter) XX timeout/ -> wakes T2 signal T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set) T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter and the lower priority T2 cannot steal the lock. -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON() The comment states that this is invalid and rt_mutex_real_owner() must return a non NULL owner when the trylock failed, but in case of a queued and woken up waiter rt_mutex_real_owner() == NULL is a valid transient state. The higher priority waiter has simply not yet managed to take over the rtmutex. The BUG_ON() is therefore wrong and this is just another retry condition in fixup_pi_state_owner(). Drop the locks, so that T3 can make progress, and then try the fixup again. Gratian provided a great analysis, traces and a reproducer. The analysis is to the point, but it confused the hell out of that tglx dude who had to page in all the futex horrors again. Condensed version is above. [ tglx: Wrote comment and changelog ] Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* futex: Fix incorrect should_fail_futex() handlingMateusz Nosek2021-03-301-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | commit 921c7ebd1337d1a46783d7e15a850e12aed2eaa0 upstream. If should_futex_fail() returns true in futex_wake_pi(), then the 'ret' variable is set to -EFAULT and then immediately overwritten. So the failure injection is non-functional. Fix it by actually leaving the function and returning -EFAULT. The Fixes tag is kinda blury because the initial commit which introduced failure injection was already sloppy, but the below mentioned commit broke it completely. [ tglx: Massaged changelog ] Fixes: 6b4f4bc9cb22 ("locking/futex: Allow low-level atomic operations to return -EAGAIN") Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200927000858.24219-1-mateusznosek0@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* futex: Prevent robust futex exit raceYang Tao2021-03-301-7/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ca16d5bee59807bf04deaab0a8eccecd5061528c upstream. Robust futexes utilize the robust_list mechanism to allow the kernel to release futexes which are held when a task exits. The exit can be voluntary or caused by a signal or fault. This prevents that waiters block forever. The futex operations in user space store a pointer to the futex they are either locking or unlocking in the op_pending member of the per task robust list. After a lock operation has succeeded the futex is queued in the robust list linked list and the op_pending pointer is cleared. After an unlock operation has succeeded the futex is removed from the robust list linked list and the op_pending pointer is cleared. The robust list exit code checks for the pending operation and any futex which is queued in the linked list. It carefully checks whether the futex value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and tries to wake up a potential waiter. This is race free for the lock operation but unlock has two race scenarios where waiters might not be woken up. These issues can be observed with regular robust pthread mutexes. PI aware pthread mutexes are not affected. (1) Unlocking task is killed after unlocking the futex value in user space before being able to wake a waiter. pthread_mutex_unlock() | V atomic_exchange_rel (&mutex->__data.__lock, 0) <------------------------killed lll_futex_wake () | | |(__lock = 0) |(enter kernel) | V do_exit() exit_mm() mm_release() exit_robust_list() handle_futex_death() | |(__lock = 0) |(uval = 0) | V if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr)) return 0; The sanity check which ensures that the user space futex is owned by the exiting task prevents the wakeup of waiters which in consequence block infinitely. (2) Waiting task is killed after a wakeup and before it can acquire the futex in user space. OWNER WAITER futex_wait() pthread_mutex_unlock() | | | |(__lock = 0) | | | V | futex_wake() ------------> wakeup() | |(return to userspace) |(__lock = 0) | V oldval = mutex->__data.__lock <-----------------killed atomic_compare_and_exchange_val_acq (&mutex->__data.__lock, | id | assume_other_futex_waiters, 0) | | | (enter kernel)| | V do_exit() | | V handle_futex_death() | |(__lock = 0) |(uval = 0) | V if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr)) return 0; The sanity check which ensures that the user space futex is owned by the exiting task prevents the wakeup of waiters, which seems to be correct as the exiting task does not own the futex value, but the consequence is that other waiters wont be woken up and block infinitely. In both scenarios the following conditions are true: - task->robust_list->list_op_pending != NULL - user space futex value == 0 - Regular futex (not PI) If these conditions are met then it is reasonably safe to wake up a potential waiter in order to prevent the above problems. As this might be a false positive it can cause spurious wakeups, but the waiter side has to handle other types of unrelated wakeups, e.g. signals gracefully anyway. So such a spurious wakeup will not affect the correctness of these operations. This workaround must not touch the user space futex value and cannot set the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting OWNER_DIED in this case would result in inconsistent state and subsequently in malfunction of the owner died handling in user space. The rest of the user space state is still consistent as no other task can observe the list_op_pending entry in the exiting tasks robust list. The eventually woken up waiter will observe the uncontended lock value and take it over. [ tglx: Massaged changelog and comment. Made the return explicit and not depend on the subsequent check and added constants to hand into handle_futex_death() instead of plain numbers. Fixed a few coding style issues. ] Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core") Signed-off-by: Yang Tao <yang.tao172@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: futex: Bound number of LDXR/STXR loops in FUTEX_WAKE_OPWill Deacon2021-03-301-22/+37
| | | | | | | | | | | | | | | | | | | commit 03110a5cb2161690ae5ac04994d47ed0cd6cef75 upstream. Our futex implementation makes use of LDXR/STXR loops to perform atomic updates to user memory from atomic context. This can lead to latency problems if we end up spinning around the LL/SC sequence at the expense of doing something useful. Rework our futex atomic operations so that we return -EAGAIN if we fail to update the futex word after 128 attempts. The core futex code will reschedule if necessary and we'll try again later. Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon <will.deacon@arm.com> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* locking/futex: Allow low-level atomic operations to return -EAGAINWill Deacon2021-03-301-70/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6b4f4bc9cb22875f97023984a625386f0c7cc1c0 upstream. Some futex() operations, including FUTEX_WAKE_OP, require the kernel to perform an atomic read-modify-write of the futex word via the userspace mapping. These operations are implemented by each architecture in arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which are called in atomic context with the relevant hash bucket locks held. Although these routines may return -EFAULT in response to a page fault generated when accessing userspace, they are expected to succeed (i.e. return 0) in all other cases. This poses a problem for architectures that do not provide bounded forward progress guarantees or fairness of contended atomic operations and can lead to starvation in some cases. In these problematic scenarios, we must return back to the core futex code so that we can drop the hash bucket locks and reschedule if necessary, much like we do in the case of a page fault. Allow architectures to return -EAGAIN from their implementations of arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which will cause the core futex code to reschedule if necessary and return back to the architecture code later on. Cc: <stable@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* futex: Fix (possible) missed wakeupPeter Zijlstra2021-03-301-5/+8
| | | | | | | | | | | | | | | | | | | | | | | commit b061c38bef43406df8e73c5be06cbfacad5ee6ad upstream. We must not rely on wake_q_add() to delay the wakeup; in particular commit: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups") moved wake_q_add() before smp_store_release(&q->lock_ptr, NULL), which could result in futex_wait() waking before observing ->lock_ptr == NULL and going back to sleep again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* futex: Handle early deadlock return correctlyThomas Gleixner2021-03-302-15/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434 upstream. commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") changed the locking rules in the futex code so that the hash bucket lock is not longer held while the waiter is enqueued into the rtmutex wait list. This made the lock and the unlock path symmetric, but unfortunately the possible early exit from __rt_mutex_proxy_start() due to a detected deadlock was not updated accordingly. That allows a concurrent unlocker to observe inconsitent state which triggers the warning in the unlock path. futex_lock_pi() futex_unlock_pi() lock(hb->lock) queue(hb_waiter) lock(hb->lock) lock(rtmutex->wait_lock) unlock(hb->lock) // acquired hb->lock hb_waiter = futex_top_waiter() lock(rtmutex->wait_lock) __rt_mutex_proxy_start() ---> fail remove(rtmutex_waiter); ---> returns -EDEADLOCK unlock(rtmutex->wait_lock) // acquired wait_lock wake_futex_pi() rt_mutex_next_owner() --> returns NULL --> WARN lock(hb->lock) unqueue(hb_waiter) The problem is caused by the remove(rtmutex_waiter) in the failure case of __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the hash bucket but no waiter on the rtmutex, i.e. inconsistent state. The original commit handles this correctly for the other early return cases (timeout, signal) by delaying the removal of the rtmutex waiter until the returning task reacquired the hash bucket lock. Treat the failure case of __rt_mutex_proxy_start() in the same way and let the existing cleanup code handle the eventual handover of the rtmutex gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter removal for the failure case, so that the other callsites are still operating correctly. Add proper comments to the code so all these details are fully documented. Thanks to Peter for helping with the analysis and writing the really valuable code comments. Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Stefan Liebler <stli@linux.ibm.com> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()Peter Zijlstra2021-03-301-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 04dc1b2fff4e96cb4142227fbdc63c8871ad4ed9 upstream. Markus reported that the glibc/nptl/tst-robustpi8 test was failing after commit: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()") The following trace shows the problem: ld-linux-x86-64-2161 [019] .... 410.760971: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_LOCK_PI ld-linux-x86-64-2161 [019] ...1 410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0 ld-linux-x86-64-2165 [011] .... 410.760978: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_UNLOCK_PI ld-linux-x86-64-2165 [011] d..1 410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0 ld-linux-x86-64-2165 [011] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000 ld-linux-x86-64-2161 [019] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT Task 2165 does an UNLOCK_PI, assigning the lock to the waiter task 2161 which then returns with -ETIMEDOUT. That wrecks the lock state, because now the owner isn't aware it acquired the lock and removes the pending robust list entry. If 2161 is killed, the robust list will not clear out this futex and the subsequent acquire on this futex will then (correctly) result in -ESRCH which is unexpected by glibc, triggers an internal assertion and dies. Task 2161 Task 2165 rt_mutex_wait_proxy_lock() timeout(); /* T2161 is still queued in the waiter list */ return -ETIMEDOUT; futex_unlock_pi() spin_lock(hb->lock); rtmutex_unlock() remove_rtmutex_waiter(T2161); mark_lock_available(); /* Make the next waiter owner of the user space side */ futex_uval = 2161; spin_unlock(hb->lock); spin_lock(hb->lock); rt_mutex_cleanup_proxy_lock() if (rtmutex_owner() !== current) ... return FAIL; .... return -ETIMEOUT; This means that rt_mutex_cleanup_proxy_lock() needs to call try_to_take_rt_mutex() so it can take over the rtmutex correctly which was assigned by the waker. If the rtmutex is owned by some other task then this call is harmless and just confirmes that the waiter is not able to acquire it. While there, fix what looks like a merge error which resulted in rt_mutex_cleanup_proxy_lock() having two calls to fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any. Both should have one, since both potentially touch the waiter list. Fixes: 38d589f2fd08 ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Bug-Spotted-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Link: http://lkml.kernel.org/r/20170519154850.mlomgdsd26drq5j6@hirez.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>