summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Linux 5.4.222v5.4.222Greg Kroah-Hartman2022-11-011-1/+1
| | | | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* once: fix section mismatch on clang buildsGreg Kroah-Hartman2022-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | On older kernels (5.4 and older), building the kernel with clang can cause the section name to end up with "" in them, which can cause lots of runtime issues as that is not normally a valid portion of the string. This was fixed up in newer kernels with commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") but that's too heavy-handed for older kernels. So for now, fix up the problem that commit 62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts") caused by being backported by removing the "" characters from the section definition. Reported-by: Oleksandr Tymoshenko <ovt@google.com> Reported-by: Yongqin Liu <yongqin.liu@linaro.org> Tested-by: Yongqin Liu <yongqin.liu@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/r/20221029011211.4049810-1-ovt@google.com Link: https://lore.kernel.org/r/CAMSo37XApZ_F5nSQYWFsSqKdMv_gBpfdKG3KN1TDB+QNXqSh0A@mail.gmail.com Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: Sasha Levin <sashal@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Linux 5.4.221v5.4.221Greg Kroah-Hartman2022-10-291-1/+1
| | | | | | | | | | Link: https://lore.kernel.org/r/20221027165049.817124510@linuxfoundation.org Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: /proc/pid/smaps_rollup: fix no vma's null-derefSeth Jenkins2022-10-291-1/+1
| | | | | | | | | | | | Commit 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") introduced a null-deref if there are no vma's in the task in show_smaps_rollup. Fixes: 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") Signed-off-by: Seth Jenkins <sethjenkins@google.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hv_netvsc: Fix race between VF offering and VF association message from hostGaurav Kohli2022-10-293-0/+27
| | | | | | | | | | | | | | | | | commit 365e1ececb2905f94cc10a5817c5b644a32a3ae2 upstream. During vm boot, there might be possibility that vf registration call comes before the vf association from host to vm. And this might break netvsc vf path, To prevent the same block vf registration until vf bind message comes from host. Cc: stable@vger.kernel.org Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Makefile.debug: re-enable debug info for .S filesNick Desaulniers2022-10-291-1/+3
| | | | | | | | | | | | | | | | | | | | This is _not_ an upstream commit and just for 5.4.y only. It is based on commit 32ef9e5054ec0321b9336058c58ec749e9c6b0fe upstream. Alexey reported that the fraction of unknown filename instances in kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down to assembler defined symbols, which regressed as a result of: commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") In that commit, I allude to restoring debug info for assembler defined symbols in a follow up patch, but it seems I forgot to do so in commit a66049e2cf0e ("Kbuild: make DWARF version a choice") Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ACPI: video: Force backlight native for more TongFang devicesWerner Sembach2022-10-291-0/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3dbc80a3e4c55c4a5b89ef207bed7b7de36157b4 upstream. This commit is very different from the upstream commit! It fixes the same issue by adding more quirks, rather then the general fix from the 6.1 kernel, because the general fix from the 6.1 kernel is part of a larger refactoring of the backlight code which is not suitable for the stable series. As described in "ACPI: video: Drop NL5x?U, PF4NU1F and PF5?U?? acpi_backlight=native quirks" (10212754a0d2) the upstream commit "ACPI: video: Make backlight class device registration a separate step (v2)" (3dbc80a3e4c5) makes these quirks unnecessary. However as mentioned in this bugtracker ticket https://bugzilla.kernel.org/show_bug.cgi?id=215683#c17 the upstream fix is part of a larger patchset that is overall too complex for stable. The TongFang GKxNRxx, GMxNGxx, GMxZGxx, and GMxRGxx / TUXEDO Stellaris/Polaris Gen 1-4, have the same problem as the Clevo NL5xRU and NL5xNU / TUXEDO Aura 15 Gen1 and Gen2: They have a working native and video interface for screen backlight. However the default detection mechanism first registers the video interface before unregistering it again and switching to the native interface during boot. This results in a dangling SBIOS request for backlight change for some reason, causing the backlight to switch to ~2% once per boot on the first power cord connect or disconnect event. Setting the native interface explicitly circumvents this buggy behaviour by avoiding the unregistering process. Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: topology: fix default topology reportingConor Dooley2022-10-292-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fbd92809997a391f28075f1c8b5ee314c225557c upstream. RISC-V has no sane defaults to fall back on where there is no cpu-map in the devicetree. Without sane defaults, the package, core and thread IDs are all set to -1. This causes user-visible inaccuracies for tools like hwloc/lstopo which rely on the sysfs cpu topology files to detect a system's topology. On a PolarFire SoC, which should have 4 harts with a thread each, lstopo currently reports: Machine (793MB total) Package L#0 NUMANode L#0 (P#0 793MB) Core L#0 L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0) L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1) L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2) L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3) Adding calls to store_cpu_topology() in {boot,smp} hart bringup code results in the correct topolgy being reported: Machine (793MB total) Package L#0 NUMANode L#0 (P#0 793MB) L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) CC: stable@vger.kernel.org # 456797da792f: arm64: topology: move store_cpu_topology() to shared code Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot.") Reported-by: Brice Goglin <Brice.Goglin@inria.fr> Link: https://github.com/open-mpi/hwloc/issues/536 Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: topology: move store_cpu_topology() to shared codeConor Dooley2022-10-292-40/+19
| | | | | | | | | | | | | | | | | | | | | | commit 456797da792fa7cbf6698febf275fe9b36691f78 upstream. arm64's method of defining a default cpu topology requires only minimal changes to apply to RISC-V also. The current arm64 implementation exits early in a uniprocessor configuration by reading MPIDR & claiming that uniprocessor can rely on the default values. This is appears to be a hangover from prior to '3102bc0e6ac7 ("arm64: topology: Stop using MPIDR for topology information")', because the current code just assigns default values for multiprocessor systems. With the MPIDR references removed, store_cpu_topolgy() can be moved to the common arch_topology code. Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* iommu/vt-d: Clean up si_domain in the init_dmars() error pathJerry Snitselaar2022-10-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 620bf9f981365c18cc2766c53d92bf8131c63f32 ] A splat from kmem_cache_destroy() was seen with a kernel prior to commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool") when there was a failure in init_dmars(), because the iommu_domain cache still had objects. While the mempool code is now gone, there still is a leak of the si_domain memory if init_dmars() fails. So clean up si_domain in the init_dmars() error path. Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: hns: fix possible memory leak in hnae_ae_register()Yang Yingliang2022-10-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ff2f5ec5d009844ec28f171123f9e58750cef4bf ] Inject fault while probing module, if device_register() fails, but the refcount of kobject is not decreased to 0, the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that name can be freed in callback function kobject_cleanup(). unreferenced object 0xffff00c01aba2100 (size 128): comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s) hex dump (first 32 bytes): 68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff hnae0....!...... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0 [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0 [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390 [<000000006c0ffb13>] kvasprintf+0x8c/0x118 [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8 [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0 [<000000000b87affc>] dev_set_name+0x7c/0xa0 [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae] [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf] [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf] Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: sched: cake: fix null pointer access issue when cake_init() failsZhengchao Shao2022-10-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 51f9a8921ceacd7bf0d3f47fa867a64988ba1dcb ] When the default qdisc is cake, if the qdisc of dev_queue fails to be inited during mqprio_init(), cake_reset() is invoked to clear resources. In this case, the tins is NULL, and it will cause gpf issue. The process is as follows: qdisc_create_dflt() cake_init() q->tins = kvcalloc(...) --->failed, q->tins is NULL ... qdisc_put() ... cake_reset() ... cake_dequeue_one() b = &q->tins[...] --->q->tins is NULL The following is the Call Trace information: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:cake_dequeue_one+0xc9/0x3c0 Call Trace: <TASK> cake_reset+0xb1/0x140 qdisc_reset+0xed/0x6f0 qdisc_destroy+0x82/0x4c0 qdisc_put+0x9e/0xb0 qdisc_create_dflt+0x2c3/0x4a0 mqprio_init+0xa71/0x1760 qdisc_create+0x3eb/0x1000 tc_modify_qdisc+0x408/0x1720 rtnetlink_rcv_msg+0x38e/0xac0 netlink_rcv_skb+0x12d/0x3a0 netlink_unicast+0x4a2/0x740 netlink_sendmsg+0x826/0xcc0 sock_sendmsg+0xc5/0x100 ____sys_sendmsg+0x583/0x690 ___sys_sendmsg+0xe8/0x160 __sys_sendmsg+0xbf/0x160 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f89e5122d04 </TASK> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: phy: dp83867: Extend RX strap quirk for SGMII modeHarini Katakam2022-10-291-0/+8
| | | | | | | | | | | | | | [ Upstream commit 0c9efbd5c50c64ead434960a404c9c9a097b0403 ] When RX strap in HW is not set to MODE 3 or 4, bit 7 and 8 in CF4 register should be set. The former is already handled in dp83867_config_init; add the latter in SGMII specific initialization. Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy") Signed-off-by: Harini Katakam <harini.katakam@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/atm: fix proc_mpc_write incorrect return valueXiaobo Liu2022-10-291-1/+2
| | | | | | | | | | | | [ Upstream commit d8bde3bf7f82dac5fc68a62c2816793a12cafa2a ] Then the input contains '\0' or '\n', proc_mpc_write has read them, so the return value needs +1. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xiaobo Liu <cppcoffee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* HID: magicmouse: Do not set BTN_MOUSE on double reportJosé Expósito2022-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit bb5f0c855dcfc893ae5ed90e4c646bde9e4498bf ] Under certain conditions the Magic Trackpad can group 2 reports in a single packet. The packet is split and the raw event function is invoked recursively for each part. However, after processing each part, the BTN_MOUSE status is updated, sending multiple click events. [1] Return after processing double reports to avoid this issue. Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/811 # [1] Fixes: a462230e16ac ("HID: magicmouse: enable Magic Trackpad support") Reported-by: Nulo <git@nulo.in> Signed-off-by: José Expósito <jose.exposito89@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20221009182747.90730-1-jose.exposito89@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* tipc: fix an information leak in tipc_topsrv_kern_subscrAlexander Potapenko2022-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 777ecaabd614d47c482a5c9031579e66da13989a ] Use a 8-byte write to initialize sub.usr_handle in tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized when issuing setsockopt(..., SOL_TIPC, ...). This resulted in an infoleak reported by KMSAN when the packet was received: ===================================================== BUG: KMSAN: kernel-infoleak in copyout+0xbc/0x100 lib/iov_iter.c:169 instrument_copy_to_user ./include/linux/instrumented.h:121 copyout+0xbc/0x100 lib/iov_iter.c:169 _copy_to_iter+0x5c0/0x20a0 lib/iov_iter.c:527 copy_to_iter ./include/linux/uio.h:176 simple_copy_to_iter+0x64/0xa0 net/core/datagram.c:513 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419 skb_copy_datagram_iter+0x58/0x200 net/core/datagram.c:527 skb_copy_datagram_msg ./include/linux/skbuff.h:3903 packet_recvmsg+0x521/0x1e70 net/packet/af_packet.c:3469 ____sys_recvmsg+0x2c4/0x810 net/socket.c:? ___sys_recvmsg+0x217/0x840 net/socket.c:2743 __sys_recvmsg net/socket.c:2773 __do_sys_recvmsg net/socket.c:2783 __se_sys_recvmsg net/socket.c:2780 __x64_sys_recvmsg+0x364/0x540 net/socket.c:2780 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 ... Uninit was stored to memory at: tipc_sub_subscribe+0x42d/0xb50 net/tipc/subscr.c:156 tipc_conn_rcv_sub+0x246/0x620 net/tipc/topsrv.c:375 tipc_topsrv_kern_subscr+0x2e8/0x400 net/tipc/topsrv.c:579 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 tipc_sk_join+0x2a8/0x770 net/tipc/socket.c:3084 tipc_setsockopt+0xae5/0xe40 net/tipc/socket.c:3201 __sys_setsockopt+0x87f/0xdc0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 __se_sys_setsockopt net/socket.c:2260 __x64_sys_setsockopt+0xe0/0x160 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 Local variable sub created at: tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 Bytes 84-87 of 88 are uninitialized Memory access of size 88 starts at ffff88801ed57cd0 Data copied to user address 0000000020000400 ... ===================================================== Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: 026321c6d056a5 ("tipc: rename tipc_server to tipc_topsrv") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tipc: Fix recognition of trial periodMark Tomlinson2022-10-291-1/+1
| | | | | | | | | | | | | | [ Upstream commit 28be7ca4fcfd69a2d52aaa331adbf9dbe91f9e6e ] The trial period exists until jiffies is after addr_trial_end. But as jiffies will eventually overflow, just using time_after will eventually give incorrect results. As the node address is set once the trial period ends, this can be used to know that we are not in the trial period. Fixes: e415577f57f4 ("tipc: correct discovery message handling during address trial period") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ACPI: extlog: Handle multiple recordsTony Luck2022-10-291-13/+20
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f6ec01da40e4139b41179f046044ee7c4f6370dc ] If there is no user space consumer of extlog_mem trace records, then Linux properly handles multiple error records in an ELOG block extlog_print() print_extlog_rcd() __print_extlog_rcd() cper_estatus_print() apei_estatus_for_each_section() But the other code path hard codes looking for a single record to output a trace record. Fix by using the same apei_estatus_for_each_section() iterator to step over all records. Fixes: 2dfb7d51a61d ("trace, RAS: Add eMCA trace event interface") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* btrfs: fix processing of delayed tree block refs during backref walkingFilipe Manana2022-10-291-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 943553ef9b51db303ab2b955c1025261abfdf6fb ] During backref walking, when processing a delayed reference with a type of BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there: 1) We are accessing the delayed references extent_op, and its key, without the protection of the delayed ref head's lock; 2) If there's no extent op for the delayed ref head, we end up with an uninitialized key in the stack, variable 'tmp_op_key', and then pass it to add_indirect_ref(), which adds the reference to the indirect refs rb tree. This is wrong, because indirect references should have a NULL key when we don't have access to the key, and in that case they should be added to the indirect_missing_keys rb tree and not to the indirect rb tree. This means that if have BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting from freeing an extent buffer, therefore with a count of -1, it will not cancel out the corresponding reference we have in the extent tree (with a count of 1), since both references end up in different rb trees. When using fiemap, where we often need to check if extents are shared through shared subtrees resulting from snapshots, it means we can incorrectly report an extent as shared when it's no longer shared. However this is temporary because after the transaction is committed the extent is no longer reported as shared, as running the delayed reference results in deleting the tree block reference from the extent tree. Outside the fiemap context, the result is unpredictable, as the key was not initialized but it's used when navigating the rb trees to insert and search for references (prelim_ref_compare()), and we expect all references in the indirect rb tree to have valid keys. The following reproducer triggers the second bug: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount -o compress $DEV $MNT # With a compressed 128M file we get a tree height of 2 (level 1 root). xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo btrfs subvolume snapshot $MNT $MNT/snap # Fiemap should output 0x2008 in the flags column. # 0x2000 means shared extent # 0x8 means encoded extent (because it's compressed) echo echo "fiemap after snapshot, range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo # Overwrite one extent and fsync to flush delalloc and COW a new path # in the snapshot's tree. # # After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type # BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent # buffer in the path. # # In the extent tree we have inline references of type # BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent # buffers, so they should cancel each other, and the extent buffers in # the fs tree should no longer be considered as shared. # echo "Overwriting file range [120M, 120M + 128K)..." xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo xfs_io -c "fsync" $MNT/snap/foo # Fiemap should output 0x8 in the flags column. The extent in the range # [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs # tree. echo echo "fiemap after overwrite range [120M, 120M + 128K):" xfs_io -c "fiemap -v 120M 128K" $MNT/foo echo umount $MNT Running it before this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 The extent in the range [120M, 120M + 128K) is still reported as shared (0x2000 bit set) after overwriting that range and flushing delalloc, which is not correct - an entire path was COWed in the snapshot's tree and the extent is now only referenced by the original fs tree. Running it after this patch: $ ./test.sh (...) wrote 134217728/134217728 bytes at offset 0 128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec) Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap' fiemap after snapshot, range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x2008 Overwriting file range [120M, 120M + 128K)... wrote 131072/131072 bytes at offset 125829120 128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec) fiemap after overwrite range [120M, 120M + 128K): /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [245760..246015]: 34304..34559 256 0x8 Now the extent is not reported as shared anymore. So fix this by passing a NULL key pointer to add_indirect_ref() when processing a delayed reference for a tree block if there's no extent op for our delayed ref head with a defined key. Also access the extent op only after locking the delayed ref head's lock. The reproducer will be converted later to a test case for fstests. Fixes: 86d5f994425252 ("btrfs: convert prelimary reference tracking to use rbtrees") Fixes: a6dbceafb915e8 ("btrfs: Remove unused op_key var from add_delayed_refs") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* btrfs: fix processing of delayed data refs during backref walkingFilipe Manana2022-10-291-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4fc7b57228243d09c0d878873bf24fa64a90fa01 ] When processing delayed data references during backref walking and we are using a share context (we are being called through fiemap), whenever we find a delayed data reference for an inode different from the one we are interested in, then we immediately exit and consider the data extent as shared. This is wrong, because: 1) This might be a DROP reference that will cancel out a reference in the extent tree; 2) Even if it's an ADD reference, it may be followed by a DROP reference that cancels it out. In either case we should not exit immediately. Fix this by never exiting when we find a delayed data reference for another inode - instead add the reference and if it does not cancel out other delayed reference, we will exit early when we call extent_is_shared() after processing all delayed references. If we find a drop reference, then signal the code that processes references from the extent tree (add_inline_refs() and add_keyed_refs()) to not exit immediately if it finds there a reference for another inode, since we have delayed drop references that may cancel it out. In this later case we exit once we don't have references in the rb trees that cancel out each other and have two references for different inodes. Example reproducer for case 1): $ cat test-1.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 64K" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar echo echo "fiemap after cloning:" xfs_io -c "fiemap -v" $MNT/foo rm -f $MNT/bar echo echo "fiemap after removing file bar:" xfs_io -c "fiemap -v" $MNT/foo umount $MNT Running it before this patch, the extent is still listed as shared, it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set: $ ./test-1.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 Example reproducer for case 2): $ cat test-2.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 64K" $MNT/foo cp --reflink=always $MNT/foo $MNT/bar # Flush delayed references to the extent tree and commit current # transaction. sync echo echo "fiemap after cloning:" xfs_io -c "fiemap -v" $MNT/foo rm -f $MNT/bar echo echo "fiemap after removing file bar:" xfs_io -c "fiemap -v" $MNT/foo umount $MNT Running it before this patch, the extent is still listed as shared, it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set: $ ./test-2.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 After this patch, after deleting bar in both tests, the extent is not reported with the 0x2000 flag anymore, it gets only the flag 0x1 (which is FIEMAP_EXTENT_LAST): $ ./test-1.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x1 $ ./test-2.sh fiemap after cloning: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x2001 fiemap after removing file bar: /mnt/sdj/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 26624..26751 128 0x1 These tests will later be converted to a test case for fstests. Fixes: dc046b10c8b7d4 ("Btrfs: make fiemap not blow when you have lots of snapshots") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* r8152: add PID for the Lenovo OneLink+ DockJean-Francois Le Fillatre2022-10-292-0/+8
| | | | | | | | | | | | | | | | | | | | | commit 1bd3a383075c64d638e65d263c9267b08ee7733c upstream. The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as a broken CDC device by default. Add the custom Lenovo PID to the r8152 driver to support it properly. Also, systems compatible with this dock provide a BIOS option to enable MAC address passthrough (as per Lenovo document "ThinkPad Docking Solutions 2017"). Add the custom PID to the MAC passthrough list too. Tested on a ThinkPad 13 1st gen with the expected results: passthrough disabled: Invalid header when reading pass-thru MAC addr passthrough enabled: Using pass-thru MAC addr XX:XX:XX:XX:XX:XX Signed-off-by: Jean-Francois Le Fillatre <jflf_kernel@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: errata: Remove AES hwcap for COMPAT tasksJames Morse2022-10-295-2/+50
| | | | | | | | | | | | | | | | | | | | | | | | commit 44b3834b2eed595af07021b1c64e6f9bc396398b upstream. Cortex-A57 and Cortex-A72 have an erratum where an interrupt that occurs between a pair of AES instructions in aarch32 mode may corrupt the ELR. The task will subsequently produce the wrong AES result. The AES instructions are part of the cryptographic extensions, which are optional. User-space software will detect the support for these instructions from the hwcaps. If the platform doesn't support these instructions a software implementation should be used. Remove the hwcap bits on affected parts to indicate user-space should not use the AES instructions. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20220714161523.279570-3-james.morse@arm.com Signed-off-by: Will Deacon <will@kernel.org> [florian: resolved conflicts in arch/arm64/tools/cpucaps and cpu_errata.c] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* media: venus: dec: Handle the case where find_format failsBryan O'Donoghue2022-10-291-0/+2
| | | | | | | | | | | | | | | | | commit 06a2da340f762addc5935bf851d95b14d4692db2 upstream. Debugging the decoder on msm8916 I noticed the vdec probe was crashing if the fmt pointer was NULL. A similar fix from Colin Ian King found by Coverity was implemented for the encoder. Implement the same fix on the decoder. Fixes: 7472c1c69138 ("[media] media: venus: vdec: add video decoder files") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: arm64: vgic: Fix exit condition in scan_its_table()Eric Ren2022-10-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c000a2607145d28b06c697f968491372ea56c23a upstream. With some PCIe topologies, restoring a guest fails while parsing the ITS device tables. Reproducer hints: 1. Create ARM virt VM with pxb-pcie bus which adds extra host bridges, with qemu command like: ``` -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.x \ ... -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.y \ ... ``` 2. Ensure the guest uses 2-level device table 3. Perform VM migration which calls save/restore device tables In that setup, we get a big "offset" between 2 device_ids, which makes unsigned "len" round up a big positive number, causing the scan loop to continue with a bad GPA. For example: 1. L1 table has 2 entries; 2. and we are now scanning at L2 table entry index 2075 (pointed to by L1 first entry) 3. if next device id is 9472, we will get a big offset: 7397; 4. with unsigned 'len', 'len -= offset * esz', len will underflow to a positive number, mistakenly into next iteration with a bad GPA; (It should break out of the current L2 table scanning, and jump into the next L1 table entry) 5. that bad GPA fails the guest read. Fix it by stopping the L2 table scan when the next device id is outside of the current table, allowing the scan to continue from the next L1 table entry. Thanks to Eric Auger for the fix suggestion. Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for tableookup") Suggested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Eric Ren <renzhengeek@gmail.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/d9c3a564af9e2c5bf63f48a7dcbf08cd593c5c0b.1665802985.git.renzhengeek@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTSKai-Heng Feng2022-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1e41e693f458eef2d5728207dbd327cd3b16580a upstream. UBSAN complains about array-index-out-of-bounds: [ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41 [ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]' [ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic #25-Ubuntu [ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010 [ 1.980718] kernel: Call Trace: [ 1.980721] kernel: <TASK> [ 1.980723] kernel: show_stack+0x52/0x58 [ 1.980729] kernel: dump_stack_lvl+0x4a/0x5f [ 1.980734] kernel: dump_stack+0x10/0x12 [ 1.980736] kernel: ubsan_epilogue+0x9/0x45 [ 1.980739] kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49 [ 1.980742] kernel: ahci_qc_issue+0x166/0x170 [libahci] [ 1.980748] kernel: ata_qc_issue+0x135/0x240 [ 1.980752] kernel: ata_exec_internal_sg+0x2c4/0x580 [ 1.980754] kernel: ? vprintk_default+0x1d/0x20 [ 1.980759] kernel: ata_exec_internal+0x67/0xa0 [ 1.980762] kernel: sata_pmp_read+0x8d/0xc0 [ 1.980765] kernel: sata_pmp_read_gscr+0x3c/0x90 [ 1.980768] kernel: sata_pmp_attach+0x8b/0x310 [ 1.980771] kernel: ata_eh_revalidate_and_attach+0x28c/0x4b0 [ 1.980775] kernel: ata_eh_recover+0x6b6/0xb30 [ 1.980778] kernel: ? ahci_do_hardreset+0x180/0x180 [libahci] [ 1.980783] kernel: ? ahci_stop_engine+0xb0/0xb0 [libahci] [ 1.980787] kernel: ? ahci_do_softreset+0x290/0x290 [libahci] [ 1.980792] kernel: ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0 [ 1.980795] kernel: sata_pmp_eh_recover.isra.0+0x214/0x560 [ 1.980799] kernel: sata_pmp_error_handler+0x23/0x40 [ 1.980802] kernel: ahci_error_handler+0x43/0x80 [libahci] [ 1.980806] kernel: ata_scsi_port_error_handler+0x2b1/0x600 [ 1.980810] kernel: ata_scsi_error+0x9c/0xd0 [ 1.980813] kernel: scsi_error_handler+0xa1/0x180 [ 1.980817] kernel: ? scsi_unjam_host+0x1c0/0x1c0 [ 1.980820] kernel: kthread+0x12a/0x150 [ 1.980823] kernel: ? set_kthread_struct+0x50/0x50 [ 1.980826] kernel: ret_from_fork+0x22/0x30 [ 1.980831] kernel: </TASK> This happens because sata_pmp_init_links() initialize link->pmp up to SATA_PMP_MAX_PORTS while em_priv is declared as 8 elements array. I can't find the maximum Enclosure Management ports specified in AHCI spec v1.3.1, but "12.2.1 LED message type" states that "Port Multiplier Information" can utilize 4 bits, which implies it can support up to 16 ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the issue. BugLink: https://bugs.launchpad.net/bugs/1970074 Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ata: ahci-imx: Fix MODULE_ALIASAlexander Stein2022-10-291-1/+1
| | | | | | | | | | | | | | commit 979556f1521a835a059de3b117b9c6c6642c7d58 upstream. 'ahci:' is an invalid prefix, preventing the module from autoloading. Fix this by using the 'platform:' prefix and DRV_NAME. Fixes: 9e54eae23bc9 ("ahci_imx: add ahci sata support on imx platforms") Cc: stable@vger.kernel.org Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hwmon/coretemp: Handle large core ID valueZhang Rui2022-10-291-15/+41
| | | | | | | | | | | | | | | | | | | | | | commit 7108b80a542b9d65e44b36d64a700a83658c0b73 upstream. The coretemp driver supports up to a hard-coded limit of 128 cores. Today, the driver can not support a core with an ID above that limit. Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they may be sparse and they may be large. Update the driver to map arbitrary core ID numbers into appropriate array indexes so that 128 cores can be supported, no matter the encoding of core ID's. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Len Brown <len.brown@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/microcode/AMD: Apply the patch early on every logical threadBorislav Petkov2022-10-291-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e7ad18d1169c62e6c78c01ff693fd362d9d65278 upstream. Currently, the patch application logic checks whether the revision needs to be applied on each logical CPU (SMT thread). Therefore, on SMT designs where the microcode engine is shared between the two threads, the application happens only on one of them as that is enough to update the shared microcode engine. However, there are microcode patches which do per-thread modification, see Link tag below. Therefore, drop the revision check and try applying on each thread. This is what the BIOS does too so this method is very much tested. Btw, change only the early paths. On the late loading paths, there's no point in doing per-thread modification because if is it some case like in the bugzilla below - removing a CPUID flag - the kernel cannot go and un-use features it has detected are there early. For that, one should use early loading anyway. [ bp: Fixes does not contain the oldest commit which did check for equality but that is good enough. ] Fixes: 8801b3fcb574 ("x86/microcode/AMD: Rework container parsing") Reported-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com> Cc: <stable@vger.kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ocfs2: fix BUG when iput after ocfs2_mknod failsJoseph Qi2022-10-291-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 759a7c6126eef5635506453e9b9d55a6a3ac2084 upstream. Commit b1529a41f777 "ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed inode if __ocfs2_mknod_locked() fails later. But this introduce a race, the freed bit may be reused immediately by another thread, which will update dinode, e.g. i_generation. Then iput this inode will lead to BUG: inode->i_generation != le32_to_cpu(fe->i_generation) We could make this inode as bad, but we did want to do operations like wipe in some cases. Since the claimed inode bit can only affect that an dinode is missing and will return back after fsck, it seems not a big problem. So just leave it as is by revert the reclaim logic. Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Yan Wang <wangyan122@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ocfs2: clear dinode links count in case of errorJoseph Qi2022-10-291-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | commit 28f4821b1b53e0649706912e810c6c232fc506f9 upstream. In ocfs2_mknod(), if error occurs after dinode successfully allocated, ocfs2 i_links_count will not be 0. So even though we clear inode i_nlink before iput in error handling, it still won't wipe inode since we'll refresh inode from dinode during inode lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as well. Also do the same change for ocfs2_symlink(). Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Yan Wang <wangyan122@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: fix use-after-free on CIL context on shutdownDave Chinner2022-10-292-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c7f87f3984cfa1e6d32806a715f35c5947ad9c09 upstream. xlog_wait() on the CIL context can reference a freed context if the waiter doesn't get scheduled before the CIL context is freed. This can happen when a task is on the hard throttle and the CIL push aborts due to a shutdown. This was detected by generic/019: thread 1 thread 2 __xfs_trans_commit xfs_log_commit_cil <CIL size over hard throttle limit> xlog_wait schedule xlog_cil_push_work wake_up_all <shutdown aborts commit> xlog_cil_committed kmem_free remove_wait_queue spin_lock_irqsave --> UAF Fix it by moving the wait queue to the CIL rather than keeping it in in the CIL context that gets freed on push completion. Because the wait queue is now independent of the CIL context and we might have multiple contexts in flight at once, only wake the waiters on the push throttle when the context we are pushing is over the hard throttle size threshold. Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push") Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: move inode flush to the sync workqueueDarrick J. Wong2022-10-292-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | commit f0f7a674d4df1510d8ca050a669e1420cf7d7fab upstream. [ Modify fs/xfs/xfs_super.c to include the changes at locations suitable for 5.4-lts kernel ] Move the inode dirty data flushing to a workqueue so that multiple threads can take advantage of a single thread's flushing work. The ratelimiting technique used in bdd4ee4 was not successful, because threads that skipped the inode flush scan due to ratelimiting would ENOSPC early, which caused occasional (but noticeable) changes in behavior and sporadic fstest regressions. Therefore, make all the writer threads wait on a single inode flush, which eliminates both the stampeding hordes of flushers and the small window in which a write could fail with ENOSPC because it lost the ratelimit race after even another thread freed space. Fixes: c6425702f21e ("xfs: ratelimit inode flush on buffered write ENOSPC") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: reflink should force the log out if mounted with wsyncChristoph Hellwig2022-10-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | commit 5833112df7e9a306af9af09c60127b92ed723962 upstream. Reflink should force the log out to disk if the filesystem was mounted with wsync, the same as most other operations in xfs. [Note: XFS_MOUNT_WSYNC is set when the admin mounts the filesystem with either the 'wsync' or 'sync' mount options, which effectively means that we're classifying reflink/dedupe as IO operations and making them synchronous when required.] Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: add more to the changelog] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: factor out a new xfs_log_force_inode helperChristoph Hellwig2022-10-294-24/+22
| | | | | | | | | | | | | | | commit 54fbdd1035e3a4e4f4082c335b095426cdefd092 upstream. Create a new helper to force the log up to the last LSN touching an inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: trylock underlying buffer on dquot flushBrian Foster2022-10-293-9/+14
| | | | | | | | | | | | | | | | | | | commit 8d3d7e2b35ea7d91d6e085c93b5efecfb0fba307 upstream. A dquot flush currently blocks on the buffer lock for the underlying dquot buffer. In turn, this causes xfsaild to block rather than continue processing other items in the meantime. Update xfs_qm_dqflush() to trylock the buffer, similar to how inode buffers are handled, and return -EAGAIN if the lock fails. Fix up any callers that don't currently handle the error properly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: don't write a corrupt unmount record to force summary counter recalcDarrick J. Wong2022-10-291-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5cc3c006eb45524860c4d1dd4dd7ad4a506bf3f5 upstream. [ Modify fs/xfs/xfs_log.c to include the changes at locations suitable for 5.4-lts kernel ] In commit f467cad95f5e3, I added the ability to force a recalculation of the filesystem summary counters if they seemed incorrect. This was done (not entirely correctly) by tweaking the log code to write an unmount record without the UMOUNT_TRANS flag set. At next mount, the log recovery code will fail to find the unmount record and go into recovery, which triggers the recalculation. What actually gets written to the log is what ought to be an unmount record, but without any flags set to indicate what kind of record it actually is. This worked to trigger the recalculation, but we shouldn't write bogus log records when we could simply write nothing. Fixes: f467cad95f5e3 ("xfs: force summary counter recalc at next mount") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: tail updates only need to occur when LSN changesDave Chinner2022-10-293-23/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8eb807bd839938b45bf7a97f0568d2a845ba6929 upstream. We currently wake anything waiting on the log tail to move whenever the log item at the tail of the log is removed. Historically this was fine behaviour because there were very few items at any given LSN. But with delayed logging, there may be thousands of items at any given LSN, and we can't move the tail until they are all gone. Hence if we are removing them in near tail-first order, we might be waking up processes waiting on the tail LSN to change (e.g. log space waiters) repeatedly without them being able to make progress. This also occurs with the new sync push waiters, and can result in thousands of spurious wakeups every second when under heavy direct reclaim pressure. To fix this, check that the tail LSN has actually changed on the AIL before triggering wakeups. This will reduce the number of spurious wakeups when doing bulk AIL removal and make this code much more efficient. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: factor common AIL item deletion codeDave Chinner2022-10-293-34/+30
| | | | | | | | | | | | | | | | | | | commit 4165994ac9672d91134675caa6de3645a9ace6c8 upstream. Factor the common AIL deletion code that does all the wakeups into a helper so we only have one copy of this somewhat tricky code to interface with all the wakeups necessary when the LSN of the log tail changes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: Throttle commits on delayed background CIL pushDave Chinner2022-10-293-4/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b upstream. In certain situations the background CIL push can be indefinitely delayed. While we have workarounds from the obvious cases now, it doesn't solve the underlying issue. This issue is that there is no upper limit on the CIL where we will either force or wait for a background push to start, hence allowing the CIL to grow without bound until it consumes all log space. To fix this, add a new wait queue to the CIL which allows background pushes to wait for the CIL context to be switched out. This happens when the push starts, so it will allow us to block incoming transaction commit completion until the push has started. This will only affect processes that are running modifications, and only when the CIL threshold has been significantly overrun. This has no apparent impact on performance, and doesn't even trigger until over 45 million inodes had been created in a 16-way fsmark test on a 2GB log. That was limiting at 64MB of log space used, so the active CIL size is only about 3% of the total log in that case. The concurrent removal of those files did not trigger the background sleep at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: Lower CIL flush limit for large logsDave Chinner2022-10-291-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 108a42358a05312b2128533c6462a3fdeb410bdf upstream. The current CIL size aggregation limit is 1/8th the log size. This means for large logs we might be aggregating at least 250MB of dirty objects in memory before the CIL is flushed to the journal. With CIL shadow buffers sitting around, this means the CIL is often consuming >500MB of temporary memory that is all allocated under GFP_NOFS conditions. Flushing the CIL can take some time to do if there is other IO ongoing, and can introduce substantial log force latency by itself. It also pins the memory until the objects are in the AIL and can be written back and reclaimed by shrinkers. Hence this threshold also tends to determine the minimum amount of memory XFS can operate in under heavy modification without triggering the OOM killer. Modify the CIL space limit to prevent such huge amounts of pinned metadata from aggregating. We can have 2MB of log IO in flight at once, so limit aggregation to 16x this size. This threshold was chosen as it little impact on performance (on 16-way fsmark) or log traffic but pins a lot less memory on large logs especially under heavy memory pressure. An aggregation limit of 8x had 5-10% performance degradation and a 50% increase in log throughput for the same workload, so clearly that was too small for highly concurrent workloads on large logs. This was found via trace analysis of AIL behaviour. e.g. insertion from a single CIL flush: xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL $ grep xfs_ail_insert /mnt/scratch/s.t |grep "new lsn 1/3033090" |wc -l 1721823 $ So there were 1.7 million objects inserted into the AIL from this CIL checkpoint, the first at 2323.392108, the last at 2325.667566 which was the end of the trace (i.e. it hadn't finished). Clearly a major problem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: preserve default grace interval during quotacheckDarrick J. Wong2022-10-291-6/+14
| | | | | | | | | | | | | | | | | | | | | commit 5885539f0af371024d07afd14974bfdc3fff84c5 upstream. When quotacheck runs, it zeroes all the timer fields in every dquot. Unfortunately, it also does this to the root dquot, which erases any preconfigured grace intervals and warning limits that the administrator may have set. Worse yet, the incore copies of those variables remain set. This cache coherence problem manifests itself as the grace interval mysteriously being reset back to the defaults at the /next/ mount. Fix it by not resetting the root disk dquot's timer and warning fields. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: fix unmount hang and memory leak on shutdown during quotaoffBrian Foster2022-10-292-6/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8a62714313391b9b2297d67c341b35edbf46c279 upstream. AIL removal of the quotaoff start intent and free of both quotaoff intents is currently limited to the ->iop_committed() handler of the end intent. This executes when the end intent is committed to the on-disk log and marks the completion of the operation. The problem with this is it assumes the success of the operation. If a shutdown or other error occurs during the quotaoff, it's possible for the quotaoff task to exit without removing the start intent from the AIL. This results in an unmount hang as the AIL cannot be emptied. Further, no other codepath frees the intents and so this is also a memory leak vector. First, update the high level quotaoff error path to directly remove and free the quotaoff start intent if it still exists in the AIL at the time of the error. Next, update both of the start and end quotaoff intents with an ->iop_release() callback to properly handle transaction abort. This means that If the quotaoff start transaction aborts, it frees the start intent in the transaction commit path. If the filesystem shuts down before the end transaction allocates, the quotaoff sequence removes and frees the start intent. If the end transaction aborts, it removes the start intent and frees both. This ensures that a shutdown does not result in a hung unmount and that memory is not leaked regardless of when a quotaoff error occurs. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: factor out quotaoff intent AIL removal and memory freeBrian Foster2022-10-292-9/+21
| | | | | | | | | | | | | | | | | | | | commit 854f82b1f6039a418b7d1407513f8640e05fd73f upstream. AIL removal of the quotaoff start intent and free of both intents is hardcoded to the ->iop_committed() handler of the end intent. Factor out the start intent handling code so it can be used in a future patch to properly handle quotaoff errors. Use xfs_trans_ail_remove() instead of the _delete() variant to acquire the AIL lock and also handle cases where an intent might not reside in the AIL at the time of a failure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: Replace function declaration by actual definitionPavel Reichl2022-10-291-74/+66
| | | | | | | | | | | | commit 1cc95e6f0d7cfd61c9d3c5cdd4e7345b173f764f upstream. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix typo in subject line] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: remove the xfs_qoff_logitem_t typedefPavel Reichl2022-10-294-34/+39
| | | | | | | | | | | | commit d0bdfb106907e4a3ef4f25f6d27e392abf41f3a0 upstream. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a comment] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: remove the xfs_dq_logitem_t typedefPavel Reichl2022-10-293-7/+7
| | | | | | | | | | | commit fd8b81dbbb23d4a3508cfac83256b4f5e770941c upstream. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: remove the xfs_disk_dquot_t and xfs_dquot_tPavel Reichl2022-10-299-108/+111
| | | | | | | | | | | | commit aefe69a45d84901c702f87672ec1e93de1d03f73 upstream. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix some of the comments] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: Use scnprintf() for avoiding potential buffer overflowTakashi Iwai2022-10-291-5/+5
| | | | | | | | | | | | | | | commit 17bb60b74124e9491d593e2601e3afe14daa2f57 upstream. Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: check owner of dir3 blocksDarrick J. Wong2022-10-291-2/+31
| | | | | | | | | | | | | commit 1b2c1a63b678d63e9c98314d44413f5af79c9c80 upstream. Check the owner field of dir3 block headers. If it's corrupt, release the buffer and return EFSCORRUPTED. All callers handle this properly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: check owner of dir3 data blocksDarrick J. Wong2022-10-291-2/+30
| | | | | | | | | | | | | | | | | commit a10c21ed5d5241d11cf1d5a4556730840572900b upstream. [Slightly edit xfs_dir3_data_read() to work with existing mapped_bno argument instead of flag values introduced in later kernels] Check the owner field of dir3 data block headers. If it's corrupt, release the buffer and return EFSCORRUPTED. All callers handle this properly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>