summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2023-01-072-1/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Trond Myklebust: - Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls - Fix a broken coalescing test in the pNFS file layout driver - Ensure that the access cache rcu path also applies the login test - Fix up for a sparse warning * tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix up a sparse warning NFS: Judge the file access cache's timestamp in rcu path pNFS/filelayout: Fix coalescing test for single DS SUNRPC: ensure the matching upcall is in-flight upon downcall
| * NFS: Fix up a sparse warningTrond Myklebust2023-01-011-1/+3
| | | | | | | | | | | | | | | | | | sparse is warning about an incorrect RCU dereference. fs/nfs/dir.c:2965:56: warning: incorrect type in argument 1 (different address spaces) fs/nfs/dir.c:2965:56: expected struct cred const * fs/nfs/dir.c:2965:56: got struct cred const [noderef] __rcu *const cred Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFS: Judge the file access cache's timestamp in rcu pathChengen Du2023-01-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | If the user's login time is newer than the cache's timestamp, we expect the cache may be stale and need to clear. The stale cache will remain in the list's tail if no other users operate on that inode. Once the user accesses the inode, the stale cache will be returned in rcu path. Signed-off-by: Chengen Du <chengen.du@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * pNFS/filelayout: Fix coalescing test for single DSOlga Kornievskaia2022-12-201-0/+8
| | | | | | | | | | | | | | | | | | When there is a single DS no striping constraints need to be placed on the IO. When such constraint is applied then buffered reads don't coalesce to the DS's rsize. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | Merge tag '6.2-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2023-01-075-26/+27
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "cifs/smb3 client fixes: - two multichannel fixes - three reconnect fixes - unmap fix" * tag '6.2-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix interface count calculation during refresh cifs: refcount only the selected iface during interface update cifs: protect access of TCP_Server_Info::{dstaddr,hostname} cifs: fix race in assemble_neg_contexts() cifs: ignore ipc reconnect failures during dfs failover cifs: Fix kmap_local_page() unmapping
| * | cifs: fix interface count calculation during refreshShyam Prasad N2023-01-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last fix to iface_count did fix the overcounting issue. However, during each refresh, we could end up undercounting the iface_count, if a match was found. Fixing this by doing increments and decrements instead of setting it to 0 before each parsing of server interfaces. Fixes: 096bbeec7bd6 ("smb3: interface count displayed incorrectly") Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: refcount only the selected iface during interface updateShyam Prasad N2023-01-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the server interface for a channel is not active anymore, we have the logic to select an alternative interface. However this was not breaking out of the loop as soon as a new alternative was found. As a result, some interfaces may get refcounted unintentionally. There was also a bug in checking if we found an alternate iface. Fixed that too. Fixes: b54034a73baf ("cifs: during reconnect, update interface if necessary") Cc: stable@vger.kernel.org # 5.19+ Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: protect access of TCP_Server_Info::{dstaddr,hostname}Paulo Alcantara2023-01-042-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the appropriate locks to protect access of hostname and dstaddr fields in cifs_tree_connect() as they might get changed by other tasks. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: fix race in assemble_neg_contexts()Paulo Alcantara2023-01-041-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Serialise access of TCP_Server_Info::hostname in assemble_neg_contexts() by holding the server's mutex otherwise it might end up accessing an already-freed hostname pointer from cifs_reconnect() or cifs_resolve_server(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: ignore ipc reconnect failures during dfs failoverPaulo Alcantara2023-01-041-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If it failed to reconnect ipc used for getting referrals, we can just ignore it as it is not required for reconnecting the share. The worst case would be not being able to detect or chase nested links as long as dfs root server is unreachable. Before patch: $ mount.cifs //root/dfs/link /mnt -o echo_interval=10,... -> target share: /fs0/share disconnect root & fs0 $ ls /mnt ls: cannot access '/mnt': Host is down connect fs0 $ ls /mnt ls: cannot access '/mnt': Resource temporarily unavailable After patch: $ mount.cifs //root/dfs/link /mnt -o echo_interval=10,... -> target share: /fs0/share disconnect root & fs0 $ ls /mnt ls: cannot access '/mnt': Host is down connect fs0 $ ls /mnt bar.rtf dir1 foo Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: Fix kmap_local_page() unmappingIra Weiny2023-01-041-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmap_local_page() requires kunmap_local() to unmap the mapping. In addition memcpy_page() is provided to perform this common memcpy pattern. Replace the kmap_local_page() and broken kunmap() with memcpy_page() Fixes: d406d26745ab ("cifs: skip alloc when request has no pages") Reviewed-by: Paulo Alcantara <pc@cjr.nz> Reviewed-by: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handlingLinus Torvalds2023-01-061-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 55d1cbbbb29e ("hfs/hfsplus: use WARN_ON for sanity check") fixed a build warning by turning a comment into a WARN_ON(), but it turns out that syzbot then complains because it can trigger said warning with a corrupted hfs image. The warning actually does warn about a bad situation, but we are much better off just handling it as the error it is. So rather than warn about us doing bad things, stop doing the bad things and return -EIO. While at it, also fix a memory leak that was introduced by an earlier fix for a similar syzbot warning situation, and add a check for one case that historically wasn't handled at all (ie neither comment nor subsequent WARN_ON). Reported-by: syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com Fixes: 55d1cbbbb29e ("hfs/hfsplus: use WARN_ON for sanity check") Fixes: 8d824e69d9f3 ("hfs: fix OOB Read in __hfs_brec_find") Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/ Tested-by: Michael Schmitz <schmitzmic@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-clientLinus Torvalds2023-01-063-8/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fixes from Ilya Dryomov: "Two file locking fixes from Xiubo" * tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-client: ceph: avoid use-after-free in ceph_fl_release_lock() ceph: switch to vfs_inode_has_locks() to fix file lock bug
| * | | ceph: avoid use-after-free in ceph_fl_release_lock()Xiubo Li2023-01-021-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ceph releasing the file_lock it will try to get the inode pointer from the fl->fl_file, which the memory could already be released by another thread in filp_close(). Because in VFS layer the fl->fl_file doesn't increase the file's reference counter. Will switch to use ceph dedicate lock info to track the inode. And in ceph_fl_release_lock() we should skip all the operations if the fl->fl_u.ceph.inode is not set, which should come from the request file_lock. And we will set fl->fl_u.ceph.inode when inserting it to the inode lock list, which is when copying the lock. Link: https://tracker.ceph.com/issues/57986 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: switch to vfs_inode_has_locks() to fix file lock bugXiubo Li2023-01-023-6/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the POSIX locks they are using the same owner, which is the thread id. And multiple POSIX locks could be merged into single one, so when checking whether the 'file' has locks may fail. For a file where some openers use locking and others don't is a really odd usage pattern though. Locks are like stoplights -- they only work if everyone pays attention to them. Just switch ceph_get_caps() to check whether any locks are set on the inode. If there are POSIX/OFD/FLOCK locks on the file at the time, we should set CHECK_FILELOCK, regardless of what fd was used to set the lock. Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | Merge tag 'fixes_for_v6.2-rc3' of ↵Linus Torvalds2023-01-061-4/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "Two fixups of the UDF changes that went into 6.2-rc1" * tag 'fixes_for_v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: initialize newblock to 0 udf: Fix extension of the last extent in the file
| * | | udf: initialize newblock to 0Tom Rix2023-01-061-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The clang build reports this error fs/udf/inode.c:805:6: error: variable 'newblock' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (*err < 0) ^~~~~~~~ newblock is never set before error handling jump. Initialize newblock to 0 and remove redundant settings. Fixes: d8b39db5fab8 ("udf: Handle error when adding extent to a file") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20221230175341.1629734-1-trix@redhat.com>
| * | | udf: Fix extension of the last extent in the fileJan Kara2023-01-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When extending the last extent in the file within the last block, we wrongly computed the length of the last extent. This is mostly a cosmetical problem since the extent does not contain any data and the length will be fixed up by following operations but still. Fixes: 1f3868f06855 ("udf: Fix extending file within last block") Signed-off-by: Jan Kara <jack@suse.cz>
* | | | Merge tag 'for-6.2-rc2-tag' of ↵Linus Torvalds2023-01-069-16/+53
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more regression and regular fixes: - regressions: - fix assertion condition using = instead of == - fix false alert on bad tree level check - fix off-by-one error in delalloc search during lseek - fix compat ro feature check at read-write remount - handle case when read-repair happens with ongoing device replace - updated error messages" * tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix compat_ro checks against remount btrfs: always report error in run_one_delayed_ref() btrfs: handle case when repair happens with dev-replace btrfs: fix off-by-one in delalloc search during lseek btrfs: fix false alert on bad tree level check btrfs: add error message for metadata level mismatch btrfs: fix ASSERT em->len condition in btrfs_get_extent
| * | | btrfs: fix compat_ro checks against remountQu Wenruo2023-01-033-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] Even with commit 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling"), btrfs can still mount a fs with unsupported compat_ro flags read-only, then remount it RW: # btrfs ins dump-super /dev/loop0 | grep compat_ro_flags -A 3 compat_ro_flags 0x403 ( FREE_SPACE_TREE | FREE_SPACE_TREE_VALID | unknown flag: 0x400 ) # mount /dev/loop0 /mnt/btrfs mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. ^^^ RW mount failed as expected ^^^ # dmesg -t | tail -n5 loop0: detected capacity change from 0 to 1048576 BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146) BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm BTRFS info (device loop0): using free space tree BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403) BTRFS error (device loop0): open_ctree failed # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs ^^^ RW remount succeeded unexpectedly ^^^ [CAUSE] Currently we use btrfs_check_features() to check compat_ro flags against our current mount flags. That function get reused between open_ctree() and btrfs_remount(). But for btrfs_remount(), the super block we passed in still has the old mount flags, thus btrfs_check_features() still believes we're mounting read-only. [FIX] Replace the existing @sb argument with @is_rw_mount. As originally we only use @sb to determine if the mount is RW. Now it's callers' responsibility to determine if the mount is RW, and since there are only two callers, the check is pretty simple: - caller in open_ctree() Just pass !sb_rdonly(). - caller in btrfs_remount() Pass !(*flags & SB_RDONLY), as our check should be against the new flags. Now we can correctly reject the RW remount: # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs mount: /mnt/btrfs: mount point not mounted or bad option. dmesg(1) may have more information after failed mount system call. # dmesg -t | tail -n 1 BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403) Reported-by: Chung-Chiang Cheng <shepjeng@gmail.com> Fixes: 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: always report error in run_one_delayed_ref()Qu Wenruo2023-01-031-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but if end users hit such problem, there will be no chance that btrfs_debug() is enabled. This can lead to very little useful info for debugging. This patch will: - Add extra info for error reporting Including: * logical bytenr * num_bytes * type * action * ref_mod - Replace the btrfs_debug() with btrfs_err() - Move the error reporting into run_one_delayed_ref() This is to avoid use-after-free, the @node can be freed in the caller. This error should only be triggered at most once. As if run_one_delayed_ref() failed, we trigger the error message, then causing the call chain to error out: btrfs_run_delayed_refs() `- btrfs_run_delayed_refs() `- btrfs_run_delayed_refs_for_head() `- run_one_delayed_ref() And we will abort the current transaction in btrfs_run_delayed_refs(). If we have to run delayed refs for the abort transaction, run_one_delayed_ref() will just cleanup the refs and do nothing, thus no new error messages would be output. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: handle case when repair happens with dev-replaceQu Wenruo2023-01-031-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] There is a bug report that a BUG_ON() in btrfs_repair_io_failure() (originally repair_io_failure() in v6.0 kernel) got triggered when replacing a unreliable disk: BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3 kernel BUG at fs/btrfs/extent_io.c:2380! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G OE 6.0.0-5-amd64 #1 Debian 6.0.10-2 Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021 Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs] Call Trace: <TASK> clean_io_failure+0x14d/0x180 [btrfs] end_bio_extent_readpage+0x412/0x6e0 [btrfs] ? __switch_to+0x106/0x420 process_one_work+0x1c7/0x380 worker_thread+0x4d/0x380 ? rescuer_thread+0x3a0/0x3a0 kthread+0xe9/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 [CAUSE] Before the BUG_ON(), we got some read errors from the replace target first, note the mirror number (3, which is beyond RAID1 duplication, thus it's read from the replace target device). Then at the BUG_ON() location, we are trying to writeback the repaired sectors back the failed device. The check looks like this: ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, &map_length, &bioc, mirror_num); if (ret) goto out_counter_dec; BUG_ON(mirror_num != bioc->mirror_num); But inside btrfs_map_block(), we can modify bioc->mirror_num especially for dev-replace: if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 && !need_full_stripe(op) && dev_replace->tgtdev != NULL) { ret = get_extra_mirror_from_replace(fs_info, logical, *length, dev_replace->srcdev->devid, &mirror_num, &physical_to_patch_in_first_stripe); patch_the_first_stripe_for_dev_replace = 1; } Thus if we're repairing the replace target device, we're going to trigger that BUG_ON(). But in reality, the read failure from the replace target device may be that, our replace hasn't reached the range we're reading, thus we're reading garbage, but with replace running, the range would be properly filled later. Thus in that case, we don't need to do anything but let the replace routine to handle it. [FIX] Instead of a BUG_ON(), just skip the repair if we're repairing the device replace target device. Reported-by: 小太 <nospam@kota.moe> Link: https://lore.kernel.org/linux-btrfs/CACsxjPYyJGQZ+yvjzxA1Nn2LuqkYqTCcUH43S=+wXhyf8S00Ag@mail.gmail.com/ CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: fix off-by-one in delalloc search during lseekFilipe Manana2023-01-032-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During lseek, when searching for delalloc in a range that represents a hole and that range has a length of 1 byte, we end up not doing the actual delalloc search in the inode's io tree, resulting in not correctly reporting the offset with data or a hole. This actually only happens when the start offset is 0 because with any other start offset we round it down by sector size. Reproducer: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -c "pwrite -q 0 1" /mnt/sdc/foo $ xfs_io -c "seek -d 0" /mnt/sdc/foo Whence Result DATA EOF It should have reported an offset of 0 instead of EOF. Fix this by updating btrfs_find_delalloc_in_range() and count_range_bits() to deal with inclusive ranges properly. These functions are already supposed to work with inclusive end offsets, they just got it wrong in a couple places due to off-by-one mistakes. A test case for fstests will be added later. Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Link: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram@gmail.com/ Fixes: b6e833567ea1 ("btrfs: make hole and data seeking a lot more efficient") CC: stable@vger.kernel.org # 6.1 Tested-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: fix false alert on bad tree level checkQu Wenruo2023-01-031-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] There is a bug report that on a RAID0 NVMe btrfs system, under heavy write load the filesystem can flip RO randomly. With extra debugging, it shows some tree blocks failed to pass their level checks, and if that happens at critical path of a transaction, we abort the transaction: BTRFS error (device nvme0n1p3): level verify failed on logical 5446121209856 mirror 1 wanted 0 found 1 BTRFS error (device nvme0n1p3: state A): Transaction aborted (error -5) BTRFS: error (device nvme0n1p3: state A) in btrfs_finish_ordered_io:3343: errno=-5 IO failure BTRFS info (device nvme0n1p3: state EA): forced readonly [CAUSE] The reporter has already bisected to commit 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()"). And with extra debugging, it shows we can have btrfs_tree_parent_check filled with all zeros in the following call trace: submit_one_bio+0xd4/0xe0 submit_extent_page+0x142/0x550 read_extent_buffer_pages+0x584/0x9c0 ? __pfx_end_bio_extent_readpage+0x10/0x10 ? folio_unlock+0x1d/0x50 btrfs_read_extent_buffer+0x98/0x150 read_tree_block+0x43/0xa0 read_block_for_search+0x266/0x370 btrfs_search_slot+0x351/0xd30 ? lock_is_held_type+0xe8/0x140 btrfs_lookup_csum+0x63/0x150 btrfs_csum_file_blocks+0x197/0x6c0 ? sched_clock_cpu+0x9f/0xc0 ? lock_release+0x14b/0x440 ? _raw_read_unlock+0x29/0x50 btrfs_finish_ordered_io+0x441/0x860 btrfs_work_helper+0xfe/0x400 ? lock_is_held_type+0xe8/0x140 process_one_work+0x294/0x5b0 worker_thread+0x4f/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 Currently we only copy the btrfs_tree_parent_check structure into bbio at read_extent_buffer_pages() after we have assembled the bbio. But as shown above, submit_extent_page() itself can already submit the bbio, leaving the bbio->parent_check uninitialized, and cause the false alert. [FIX] Instead of copying @check into bbio after bbio is assembled, we pass @check in btrfs_bio_ctrl::parent_check, and copy the content of parent_check in submit_one_bio() for metadata read. By this we should be able to pass the needed info for metadata endio verification, and fix the false alert. Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/ Fixes: 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()") Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: add error message for metadata level mismatchQu Wenruo2023-01-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From a recent regression report, we found that after commit 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()") if we have a level mismatch (false alert though), there is no error message at all. This makes later debugging harder. This patch will add the proper error message for such case. Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: fix ASSERT em->len condition in btrfs_get_extentTanmay Bhushan2023-01-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The em->len value is supposed to be verified in the assertion condition though we expect it to be same as the sectorsize. Fixes: a196a8944f77 ("btrfs: do not reset extent map members for inline extents read") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | Merge tag 'f2fs-fix-6.2-rc3' of ↵Linus Torvalds2023-01-044-26/+25
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: - fix a null pointer dereference in f2fs_issue_flush, which occurs by the combination of mount/remount options. - fix a bug in per-block age-based extent_cache newly introduced in 6.2-rc1, which reported a wrong age information in extent_cache. - fix a kernel panic if extent_tree was not created, which was caught by a wrong BUG_ON * tag 'f2fs-fix-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: let's avoid panic if extent_tree is not created f2fs: should use a temp extent_info for lookup f2fs: don't mix to use union values in extent_info f2fs: initialize extent_cache parameter f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()
| * | | | f2fs: let's avoid panic if extent_tree is not createdJaegeuk Kim2023-01-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids the below panic. pc : __lookup_extent_tree+0xd8/0x760 lr : f2fs_do_write_data_page+0x104/0x87c sp : ffffffc010cbb3c0 x29: ffffffc010cbb3e0 x28: 0000000000000000 x27: ffffff8803e7f020 x26: ffffff8803e7ed40 x25: ffffff8803e7f020 x24: ffffffc010cbb460 x23: ffffffc010cbb480 x22: 0000000000000000 x21: 0000000000000000 x20: ffffffff22e90900 x19: 0000000000000000 x18: ffffffc010c5d080 x17: 0000000000000000 x16: 0000000000000020 x15: ffffffdb1acdbb88 x14: ffffff888759e2b0 x13: 0000000000000000 x12: ffffff802da49000 x11: 000000000a001200 x10: ffffff8803e7ed40 x9 : ffffff8023195800 x8 : ffffff802da49078 x7 : 0000000000000001 x6 : 0000000000000000 x5 : 0000000000000006 x4 : ffffffc010cbba28 x3 : 0000000000000000 x2 : ffffffc010cbb480 x1 : 0000000000000000 x0 : ffffff8803e7ed40 Call trace: __lookup_extent_tree+0xd8/0x760 f2fs_do_write_data_page+0x104/0x87c f2fs_write_single_data_page+0x420/0xb60 f2fs_write_cache_pages+0x418/0xb1c __f2fs_write_data_pages+0x428/0x58c f2fs_write_data_pages+0x30/0x40 do_writepages+0x88/0x190 __writeback_single_inode+0x48/0x448 writeback_sb_inodes+0x468/0x9e8 __writeback_inodes_wb+0xb8/0x2a4 wb_writeback+0x33c/0x740 wb_do_writeback+0x2b4/0x400 wb_workfn+0xe4/0x34c process_one_work+0x24c/0x5bc worker_thread+0x3e8/0xa50 kthread+0x150/0x1b4 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | | f2fs: should use a temp extent_info for lookupJaegeuk Kim2023-01-031-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise, __lookup_extent_tree() will override the given extent_info which will be used by caller. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | | f2fs: don't mix to use union values in extent_infoJaegeuk Kim2023-01-031-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's explicitly use the defined values in block_age case only. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | | f2fs: initialize extent_cache parameterJaegeuk Kim2023-01-034-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This can avoid confusing tracepoint values. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | | f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()Chao Yu2023-01-031-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With below two cases, it will cause NULL pointer dereference when accessing SM_I(sbi)->fcc_info in f2fs_issue_flush(). a) If kthread_run() fails in f2fs_create_flush_cmd_control(), it will release SM_I(sbi)->fcc_info, - mount -o noflush_merge /dev/vda /mnt/f2fs - mount -o remount,flush_merge /dev/vda /mnt/f2fs -- kthread_run() fails - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync b) we will never allocate memory for SM_I(sbi)->fcc_info w/ below testcase, - mount -o ro /dev/vda /mnt/f2fs - mount -o rw,remount /dev/vda /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync In order to fix this issue, let change as below: - fix error path handling in f2fs_create_flush_cmd_control(). - allocate SM_I(sbi)->fcc_info even if readonly is on. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | | | | Merge tag 'nfsd-6.2-2' of ↵Linus Torvalds2023-01-042-1/+12
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a filecache UAF during NFSD shutdown - Avoid exposing automounted mounts on NFS re-exports * tag 'nfsd-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix handling of readdir in v4root vs. mount upcall timeout nfsd: shut down the NFSv4 state objects before the filecache
| * | | | nfsd: fix handling of readdir in v4root vs. mount upcall timeoutJeff Layton2023-01-021-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If v4 READDIR operation hits a mountpoint and gets back an error, then it will include that entry in the reply and set RDATTR_ERROR for it to the error. That's fine for "normal" exported filesystems, but on the v4root, we need to be more careful to only expose the existence of dentries that lead to exports. If the mountd upcall times out while checking to see whether a mountpoint on the v4root is exported, then we have no recourse other than to fail the whole operation. Cc: Steve Dickson <steved@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777 Reported-by: JianHong Yin <yin-jianhong@163.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org>
| * | | | nfsd: shut down the NFSv4 state objects before the filecacheJeff Layton2022-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we shut down the filecache before trying to clean up the stateids that depend on it. This leads to the kernel trying to free an nfsd_file twice, and a refcount overput on the nf_mark. Change the shutdown procedure to tear down all of the stateids prior to shutting down the filecache. Reported-and-tested-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Fixes: 5e113224c17e ("nfsd: nfsd_file cache entries should be per net namespace") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | | | Merge tag 'for-6.2-rc2-tag' of ↵Linus Torvalds2023-01-027-6/+19
|\ \ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "First batch of regression and regular fixes: - regressions: - fix error handling after conversion to qstr for paths - fix raid56/scrub recovery caused by uninitialized variable after conversion to error bitmaps - restore qgroup backref lookup behaviour after recent refactoring - fix leak of device lists at module exit time - fix resolving backrefs for inline extent followed by prealloc - reset defrag ioctl buffer on memory allocation error" * tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix fscrypt name leak after failure to join log transaction btrfs: scrub: fix uninitialized return value in recover_scrub_rbio btrfs: fix resolving backrefs for inline extent followed by prealloc btrfs: fix trace event name typo for FLUSH_DELAYED_REFS btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup btrfs: fix leak of fs devices after removing btrfs module btrfs: fix an error handling path in btrfs_defrag_leaves() btrfs: fix an error handling path in btrfs_rename()
| * | | | btrfs: fix fscrypt name leak after failure to join log transactionFilipe Manana2022-12-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When logging a new name, we don't expect to fail joining a log transaction since we know at least one of the inodes was logged before in the current transaction. However if we fail for some unexpected reason, we end up not freeing the fscrypt name we previously allocated. So fix that by freeing the name in case we failed to join a log transaction. Fixes: ab3c5c18e8fa ("btrfs: setup qstr from dentrys using fscrypt helper") Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: scrub: fix uninitialized return value in recover_scrub_rbioJosef Bacik2022-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 75b470332965 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap") introduced an uninitialized return variable. This can be caught by gcc 12.1 by -Wmaybe-uninitialized: CC [M] fs/btrfs/raid56.o fs/btrfs/raid56.c: In function ‘scrub_rbio’: fs/btrfs/raid56.c:2801:15: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized] 2801 | ret = recover_scrub_rbio(rbio); | ^~~~~~~~~~~~~~~~~~~~~~~~ fs/btrfs/raid56.c:2649:13: note: ‘ret’ was declared here 2649 | int ret; The warning is disabled by default so we haven't caught that. Due to the bug the raid56 scrub fstests have been failing since the patch was merged, so initialize that. Fixes: 75b470332965 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap") Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: fix resolving backrefs for inline extent followed by preallocBoris Burkov2022-12-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a file consists of an inline extent followed by a regular or prealloc extent, then a legitimate attempt to resolve a logical address in the non-inline region will result in add_all_parents reading the invalid offset field of the inline extent. If the inline extent item is placed in the leaf eb s.t. it is the first item, attempting to access the offset field will not only be meaningless, it will go past the end of the eb and cause this panic: [17.626048] BTRFS warning (device dm-2): bad eb member end: ptr 0x3fd4 start 30834688 member offset 16377 size 8 [17.631693] general protection fault, probably for non-canonical address 0x5088000000000: 0000 [#1] SMP PTI [17.635041] CPU: 2 PID: 1267 Comm: btrfs Not tainted 5.12.0-07246-g75175d5adc74-dirty #199 [17.637969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [17.641995] RIP: 0010:btrfs_get_64+0xe7/0x110 [17.649890] RSP: 0018:ffffc90001f73a08 EFLAGS: 00010202 [17.651652] RAX: 0000000000000001 RBX: ffff88810c42d000 RCX: 0000000000000000 [17.653921] RDX: 0005088000000000 RSI: ffffc90001f73a0f RDI: 0000000000000001 [17.656174] RBP: 0000000000000ff9 R08: 0000000000000007 R09: c0000000fffeffff [17.658441] R10: ffffc90001f73790 R11: ffffc90001f73788 R12: ffff888106afe918 [17.661070] R13: 0000000000003fd4 R14: 0000000000003f6f R15: cdcdcdcdcdcdcdcd [17.663617] FS: 00007f64e7627d80(0000) GS:ffff888237c80000(0000) knlGS:0000000000000000 [17.666525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [17.668664] CR2: 000055d4a39152e8 CR3: 000000010c596002 CR4: 0000000000770ee0 [17.671253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17.673634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [17.676034] PKRU: 55555554 [17.677004] Call Trace: [17.677877] add_all_parents+0x276/0x480 [17.679325] find_parent_nodes+0xfae/0x1590 [17.680771] btrfs_find_all_leafs+0x5e/0xa0 [17.682217] iterate_extent_inodes+0xce/0x260 [17.683809] ? btrfs_inode_flags_to_xflags+0x50/0x50 [17.685597] ? iterate_inodes_from_logical+0xa1/0xd0 [17.687404] iterate_inodes_from_logical+0xa1/0xd0 [17.689121] ? btrfs_inode_flags_to_xflags+0x50/0x50 [17.691010] btrfs_ioctl_logical_to_ino+0x131/0x190 [17.692946] btrfs_ioctl+0x104a/0x2f60 [17.694384] ? selinux_file_ioctl+0x182/0x220 [17.695995] ? __x64_sys_ioctl+0x84/0xc0 [17.697394] __x64_sys_ioctl+0x84/0xc0 [17.698697] do_syscall_64+0x33/0x40 [17.700017] entry_SYSCALL_64_after_hwframe+0x44/0xae [17.701753] RIP: 0033:0x7f64e72761b7 [17.709355] RSP: 002b:00007ffefb067f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [17.712088] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64e72761b7 [17.714667] RDX: 00007ffefb067fb0 RSI: 00000000c0389424 RDI: 0000000000000003 [17.717386] RBP: 00007ffefb06d188 R08: 000055d4a390d2b0 R09: 00007f64e7340a60 [17.719938] R10: 0000000000000231 R11: 0000000000000246 R12: 0000000000000001 [17.722383] R13: 0000000000000000 R14: 00000000c0389424 R15: 000055d4a38fd2a0 [17.724839] Modules linked in: Fix the bug by detecting the inline extent item in add_all_parents and skipping to the next extent item. CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookupJosef Bacik2022-12-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the patch a2c8d27e5ee8 ("btrfs: use a structure to pass arguments to backref walking functions") Filipe converted everybody to using a new context struct to use for backref lookups, but accidentally dropped the BTRFS_SEQ_LAST usage that exists for qgroups. Add this back so we have the previous behavior. Fixes: a2c8d27e5ee8 ("btrfs: use a structure to pass arguments to backref walking functions") Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: fix leak of fs devices after removing btrfs moduleFilipe Manana2022-12-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When removing the btrfs module we are not calling btrfs_cleanup_fs_uuids() which results in leaking btrfs_fs_devices structures and other resources. This is a regression recently introduced by a refactoring of the module initialization and exit sequence, which simply removed the call to btrfs_cleanup_fs_uuids() in the exit path, resulting in the leaks. So fix this by calling btrfs_cleanup_fs_uuids() at exit_btrfs_fs(). Fixes: 5565b8e0adcd ("btrfs: make module init/exit match their sequence") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: fix an error handling path in btrfs_defrag_leaves()Christophe JAILLET2022-12-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All error handling paths end to 'out', except this memory allocation failure. This is spurious. So branch to the error handling path also in this case. It will add a call to: memset(&root->defrag_progress, 0, sizeof(root->defrag_progress)); Fixes: 6702ed490ca0 ("Btrfs: Add run time btree defrag, and an ioctl to force btree defrag") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: fix an error handling path in btrfs_rename()Christophe JAILLET2022-12-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If new_whiteout_inode() fails, some resources need to be freed. Add the missing goto to the error handling path. Fixes: ab3c5c18e8fa ("btrfs: setup qstr from dentrys using fscrypt helper") Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | fs/ntfs3: don't hold ni_lock when calling truncate_setsize()Tetsuo Handa2023-01-021-2/+2
| |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot is reporting hung task at do_user_addr_fault() [1], for there is a silent deadlock between PG_locked bit and ni_lock lock. Since filemap_update_page() calls filemap_read_folio() after calling folio_trylock() which will set PG_locked bit, ntfs_truncate() must not call truncate_setsize() which will wait for PG_locked bit to be cleared when holding ni_lock lock. Link: https://lore.kernel.org/all/00000000000060d41f05f139aa44@google.com/ Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae [1] Reported-by: syzbot <syzbot+bed15dbf10294aa4f2ae@syzkaller.appspotmail.com> Debugged-by: Linus Torvalds <torvalds@linux-foundation.org> Co-developed-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | treewide: Convert del_timer*() to timer_shutdown*()Steven Rostedt (Google)2022-12-252-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed. The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case. This was created by using a coccinelle script and the following commands: $ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); | - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); | kmem_cache_free(slab, ptr); | kfree(ptr); ) $ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'pstore-v6.2-rc1-fixes' of ↵Linus Torvalds2022-12-233-4/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fixes from Kees Cook: - Switch pmsg_lock to an rt_mutex to avoid priority inversion (John Stultz) - Correctly assign mem_type property (Luca Stefani) * tag 'pstore-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Properly assign mem_type property pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXES pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion
| * | | | pstore: Properly assign mem_type propertyLuca Stefani2022-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If mem-type is specified in the device tree it would end up overriding the record_size field instead of populating mem_type. As record_size is currently parsed after the improper assignment with default size 0 it continued to work as expected regardless of the value found in the device tree. Simply changing the target field of the struct is enough to get mem-type working as expected. Fixes: 9d843e8fafc7 ("pstore: Add mem_type property DT parsing support") Cc: stable@vger.kernel.org Signed-off-by: Luca Stefani <luca@osomprivacy.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221222131049.286288-1-luca@osomprivacy.com
| * | | | pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXESJohn Stultz2022-12-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion") I changed a lock to an rt_mutex. However, its possible that CONFIG_RT_MUTEXES is not enabled, which then results in a build failure, as the 0day bot detected: https://lore.kernel.org/linux-mm/202212211244.TwzWZD3H-lkp@intel.com/ Thus this patch changes CONFIG_PSTORE_PMSG to select CONFIG_RT_MUTEXES, which ensures the build will not fail. Cc: Wei Wang <wvw@google.com> Cc: Midas Chien<midaschieh@google.com> Cc: Connor O'Brien <connoro@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: kernel test robot <lkp@intel.com> Cc: kernel-team@android.com Fixes: 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221221051855.15761-1-jstultz@google.com
| * | | | pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversionJohn Stultz2022-12-141-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wei Wang reported seeing priority inversion caused latencies caused by contention on pmsg_lock, and suggested it be switched to a rt_mutex. I was initially hesitant this would help, as the tasks in that trace all seemed to be SCHED_NORMAL, so the benefit would be limited to only nice boosting. However, another similar issue was raised where the priority inversion was seen did involve a blocked RT task so it is clear this would be helpful in that case. Cc: Wei Wang <wvw@google.com> Cc: Midas Chien<midaschieh@google.com> Cc: Connor O'Brien <connoro@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: kernel-team@android.com Fixes: 9d5438f462ab ("pstore: Add pmsg - user-space accessible pstore object") Reported-by: Wei Wang <wvw@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221214231834.3711880-1-jstultz@google.com
* | | | | Merge tag '9p-for-6.2-rc1' of https://github.com/martinetd/linuxLinus Torvalds2022-12-239-9/+0
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull 9p updates from Dominique Martinet: - improve p9_check_errors to check buffer size instead of msize when possible (e.g. not zero-copy) - some more syzbot and KCSAN fixes - minor headers include cleanup * tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux: 9p/client: fix data race on req->status net/9p: fix response size check in p9_check_errors() net/9p: distinguish zero-copy requests 9p/xen: do not memcpy header into req->rc 9p: set req refcount to zero to avoid uninitialized usage 9p/net: Remove unneeded idr.h #include 9p/fs: Remove unneeded idr.h #include