summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'fs_for_v6.5-rc1' of ↵Linus Torvalds2023-06-2929-290/+465
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem updates from Jan Kara: - Rewrite kmap_local() handling in ext2 - Convert ext2 direct IO path to iomap (with some infrastructure tweaks associated with that) - Convert two boilerplate licenses in udf to SPDX identifiers - Other small udf, ext2, and quota fixes and cleanups * tag 'fs_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix uninitialized array access for some pathnames ext2: Drop fragment support quota: fix warning in dqgrab() quota: Properly disable quotas when add_dquot_ref() fails fs: udf: udftime: Replace LGPL boilerplate with SPDX identifier fs: udf: Replace GPL 2.0 boilerplate license notice with SPDX identifier fs: Drop wait_unfrozen wait queue ext2_find_entry()/ext2_dotdot(): callers don't need page_addr anymore ext2_{set_link,delete_entry}(): don't bother with page_addr ext2_put_page(): accept any pointer within the page ext2_get_page(): saner type ext2: use offset_in_page() instead of open-coding it as subtraction ext2_rename(): set_link and delete_entry may fail ext2: Add direct-io trace points ext2: Move direct-io to use iomap ext2: Use generic_buffers_fsync() implementation ext4: Use generic_buffers_fsync_noflush() implementation fs/buffer.c: Add generic_buffers_fsync*() implementation ext2/dax: Fix ext2_setsize when len is page aligned
| * udf: Fix uninitialized array access for some pathnamesJan Kara2023-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | For filenames that begin with . and are between 2 and 5 characters long, UDF charset conversion code would read uninitialized memory in the output buffer. The only practical impact is that the name may be prepended a "unification hash" when it is not actually needed but still it is good to fix this. Reported-by: syzbot+cd311b1e43cc25f90d18@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000e2638a05fe9dc8f9@google.com Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2: Drop fragment supportJan Kara2023-06-132-31/+4
| | | | | | | | | | | | | | | | Ext2 has fields in superblock reserved for subblock allocation support. However that never landed. Drop the many years dead code. Reported-by: syzbot+af5e10f73dbff48f70af@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz>
| * quota: fix warning in dqgrab()Ye Bin2023-06-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's issue as follows when do fault injection: WARNING: CPU: 1 PID: 14870 at include/linux/quotaops.h:51 dquot_disable+0x13b7/0x18c0 Modules linked in: CPU: 1 PID: 14870 Comm: fsconfig Not tainted 6.3.0-next-20230505-00006-g5107a9c821af-dirty #541 RIP: 0010:dquot_disable+0x13b7/0x18c0 RSP: 0018:ffffc9000acc79e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88825e41b980 RDX: 0000000000000000 RSI: ffff88825e41b980 RDI: 0000000000000002 RBP: ffff888179f68000 R08: ffffffff82087ca7 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed102f3ed026 R12: ffff888179f68130 R13: ffff888179f68110 R14: dffffc0000000000 R15: ffff888179f68118 FS: 00007f450a073740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe96f2efd8 CR3: 000000025c8ad000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dquot_load_quota_sb+0xd53/0x1060 dquot_resume+0x172/0x230 ext4_reconfigure+0x1dc6/0x27b0 reconfigure_super+0x515/0xa90 __x64_sys_fsconfig+0xb19/0xd20 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happens as follows: ProcessA ProcessB ProcessC sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_suspend -> suspend all type quota sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_resume ret = dquot_load_quota_sb add_dquot_ref do_open -> open file O_RDWR vfs_open do_dentry_open get_write_access atomic_inc_unless_negative(&inode->i_writecount) ext4_file_open dquot_file_open dquot_initialize __dquot_initialize dqget atomic_inc(&dquot->dq_count); __dquot_initialize __dquot_initialize dqget if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) ext4_acquire_dquot -> Return error DQ_ACTIVE_B flag isn't set dquot_disable invalidate_dquots if (atomic_read(&dquot->dq_count)) dqgrab WARN_ON_ONCE(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) -> Trigger warning In the above scenario, 'dquot->dq_flags' has no DQ_ACTIVE_B is normal when dqgrab(). To solve above issue just replace the dqgrab() use in invalidate_dquots() with atomic_inc(&dquot->dq_count). Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-3-yebin10@huawei.com>
| * quota: Properly disable quotas when add_dquot_ref() failsJan Kara2023-06-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want to disable quotas we are trying to enable. However dquot_disable() call was passed just the flags we are enabling so in case flags == DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL instead of properly disabling quotas. Fix the problem by always passing DQUOT_LIMITS_ENABLED | DQUOT_USAGE_ENABLED to dquot_disable() in this case. Reported-and-tested-by: Ye Bin <yebin10@huawei.com> Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-2-yebin10@huawei.com>
| * fs: udf: udftime: Replace LGPL boilerplate with SPDX identifierBagas Sanjaya2023-05-301-16/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Replace license boilerplate in udftime.c with SPDX identifier for LGPL-2.0. Cc: Paul Eggert <eggert@twinsun.com> Cc: Richard Fontana <rfontana@redhat.com> Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230522005434.22133-3-bagasdotme@gmail.com>
| * fs: udf: Replace GPL 2.0 boilerplate license notice with SPDX identifierBagas Sanjaya2023-05-3014-70/+14
| | | | | | | | | | | | | | | | | | | | | | | | The notice refers to full GPL 2.0 text on now defunct MIT FTP site [1]. Replace it with appropriate SPDX license identifier. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pali Rohár <pali@kernel.org> Link: https://web.archive.org/web/20020809115410/ftp://prep.ai.mit.edu/pub/gnu/GPL [1] Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230522005434.22133-2-bagasdotme@gmail.com>
| * fs: Drop wait_unfrozen wait queueJan Kara2023-05-302-6/+3
| | | | | | | | | | | | | | | | | | | | | | wait_unfrozen waitqueue is used only in quota code to wait for filesystem to become unfrozen. In that place we can just use sb_start_write() - sb_end_write() pair to achieve the same. So just remove the waitqueue. Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230525141710.7595-1-jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2_find_entry()/ext2_dotdot(): callers don't need page_addr anymoreAl Viro2023-05-293-39/+21
| | | | | | | | | | | | | | | | | | ... and that's how it should've been done in the first place Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2_{set_link,delete_entry}(): don't bother with page_addrAl Viro2023-05-293-15/+11
| | | | | | | | | | | | | | | | | | | | ext2_set_link() simply doesn't use it anymore and ext2_delete_entry() can easily obtain it from the directory entry pointer... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2_put_page(): accept any pointer within the pageAl Viro2023-05-292-25/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | eliminates the need to keep the pointer to the first byte within the page if we are guaranteed to have pointers to some byte in the same page at hand. Don't backport without commit 88d7b12068b9 ("highmem: round down the address passed to kunmap_flush_on_unmap()"). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2_get_page(): saner typeAl Viro2023-05-291-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to pass to caller both the page reference and pointer to the first byte in the now-mapped page. The former always has the same type, the latter varies from caller to caller. So make it void *ext2_get_page(...., struct page **page) rather than struct page *ext2_get_page(..., void **page_addr) and avoid the casts... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2: use offset_in_page() instead of open-coding it as subtractionAl Viro2023-05-291-8/+6
| | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2_rename(): set_link and delete_entry may failAl Viro2023-05-292-25/+16
| | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2: Add direct-io trace pointsRitesh Harjani (IBM)2023-05-164-2/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the trace point to ext2 direct-io apis in fs/ext2/file.c Here is how the output looks like a.out-467865 [006] 6758.170968: ext2_dio_write_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT|WRITE aio 1 ret 0 a.out-467865 [006] 6758.171061: ext2_dio_write_end: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 0 flags DIRECT|WRITE aio 1 ret -529 kworker/3:153-444162 [003] 6758.171252: ext2_dio_write_endio: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT|WRITE aio 1 ret 0 a.out-468222 [001] 6761.628924: ext2_dio_read_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 1 ret 0 a.out-468222 [001] 6761.629063: ext2_dio_read_end: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 0 flags DIRECT aio 1 ret -529 a.out-468428 [005] 6763.937454: ext2_dio_write_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0 a.out-468428 [005] 6763.937829: ext2_dio_write_endio: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0 a.out-468428 [005] 6763.937847: ext2_dio_write_end: dev 7:12 ino 0xe isize 0x1000 pos 0x1000 len 0 flags DIRECT aio 0 ret 4096 a.out-468609 [000] 6765.702878: ext2_dio_read_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0 a.out-468609 [000] 6765.703243: ext2_dio_read_end: dev 7:12 ino 0xe isize 0x1000 pos 0x1000 len 0 flags DIRECT aio 0 ret 4096 Reported-and-tested-by: Disha Goel <disgoel@linux.ibm.com> [Need to add CFLAGS_trace for fixing unable to find trace file problem] Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <b8b0897fa2b273a448d7b4ba7317357ac73c08bc.1682069716.git.ritesh.list@gmail.com>
| * ext2: Move direct-io to use iomapRitesh Harjani (IBM)2023-05-163-19/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch converts ext2 direct-io path to iomap interface. - This also takes care of DIO_SKIP_HOLES part in which we return -ENOTBLK from ext2_iomap_begin(), in case if the write is done on a hole. - This fallbacks to buffered-io in case of DIO_SKIP_HOLES or in case of a partial write or if any error is detected in ext2_iomap_end(). We try to return -ENOTBLK in such cases. - For any unaligned or extending DIO writes, we pass IOMAP_DIO_FORCE_WAIT flag to ensure synchronous writes. - For extending writes we set IOMAP_F_DIRTY in ext2_iomap_begin because otherwise with dsync writes on devices that support FUA, generic_write_sync won't be called and we might miss inode metadata updates. - Since ext2 already now uses _nolock vartiant of sync write. Hence there is no inode lock problem with iomap in this patch. - ext2_iomap_ops are now being shared by DIO, DAX & fiemap path Tested-by: Disha Goel <disgoel@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <610b672a52f2a7ff6dc550fd14d0f995806232a5.1682069716.git.ritesh.list@gmail.com>
| * ext2: Use generic_buffers_fsync() implementationRitesh Harjani (IBM)2023-05-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Next patch converts ext2 to use iomap interface for DIO. iomap layer can call generic_write_sync() -> ext2_fsync() from iomap_dio_complete while still holding the inode_lock(). Now writeback from other paths doesn't need inode_lock(). It seems there is also no need of an inode_lock() for sync_mapping_buffers(). It uses it's own mapping->private_lock for it's buffer list handling. Hence this patch is in preparation to move ext2 to iomap. This uses generic_buffers_fsync() which does not take any inode_lock() in ext2_fsync(). Tested-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <76d206a464574ff91db25bc9e43479b51ca7e307.1682069716.git.ritesh.list@gmail.com>
| * ext4: Use generic_buffers_fsync_noflush() implementationRitesh Harjani (IBM)2023-05-161-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4 when got converted to iomap for dio, it copied __generic_file_fsync implementation to avoid taking inode_lock in order to avoid any deadlock (since iomap takes an inode_lock while calling generic_write_sync()). The previous patch already added generic_buffers_fsync*() which does not take any inode_lock(). Hence kill the redundant code and use generic_buffers_fsync_noflush() function instead. Tested-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <b43d4bb4403061ed86510c9587673e30a461ba14.1682069716.git.ritesh.list@gmail.com>
| * fs/buffer.c: Add generic_buffers_fsync*() implementationRitesh Harjani (IBM)2023-05-161-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the higher layers like iomap takes inode_lock() when calling generic_write_sync(). Also writeback already happens from other paths without inode lock, so it's difficult to say that we really need sync_mapping_buffers() to take any inode locking here. Having said that, let's add generic_buffers_fsync/_noflush() implementation in buffer.c with no inode_lock/unlock() for now so that filesystems like ext2 and ext4's nojournal mode can use it. Ext4 when got converted to iomap for direct-io already copied it's own variant of __generic_file_fsync() without lock. This patch adds generic_buffers_fsync() & generic_buffers_fsync_noflush() implementations for use in filesystems like ext2 & ext4 respectively. Later we can review other filesystems as well to see if we can make generic_buffers_fsync/_noflush() which does not take any inode_lock() as the default path. Tested-by: Disha Goel <disgoel@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <d573408ac8408627d23a3d2d166e748c172c4c9e.1682069716.git.ritesh.list@gmail.com>
| * ext2/dax: Fix ext2_setsize when len is page alignedRitesh Harjani (IBM)2023-05-161-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 #139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc CC: stable@vger.kernel.org Fixes: 2aa3048e03d3 ("iomap: switch iomap_zero_range to use iomap_iter") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com>
* | Merge tag 'fsnotify_for_v6.5-rc1' of ↵Linus Torvalds2023-06-298-31/+63
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - Support for fanotify events returning file handles for filesystems not exportable via NFS - Improved error handling exportfs functions - Add missing FS_OPEN events when unusual open helpers are used * tag 'fsnotify_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: move fsnotify_open() hook into do_dentry_open() exportfs: check for error return value from exportfs_encode_*() fanotify: support reporting non-decodeable file handles exportfs: allow exporting non-decodeable file handles to userspace exportfs: add explicit flag to request non-decodeable file handles exportfs: change connectable argument to bit flags
| * | fsnotify: move fsnotify_open() hook into do_dentry_open()Amir Goldstein2023-06-123-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fsnotify_open() hook is called only from high level system calls context and not called for the very many helpers to open files. This may makes sense for many of the special file open cases, but it is inconsistent with fsnotify_close() hook that is called for every last fput() of on a file object with FMODE_OPENED. As a result, it is possible to observe ACCESS, MODIFY and CLOSE events without ever observing an OPEN event. Fix this inconsistency by replacing all the fsnotify_open() hooks with a single hook inside do_dentry_open(). If there are special cases that would like to opt-out of the possible overhead of fsnotify() call in fsnotify_open(), they would probably also want to avoid the overhead of fsnotify() call in the rest of the fsnotify hooks, so they should be opening that file with the __FMODE_NONOTIFY flag. However, in the majority of those cases, the s_fsnotify_connectors optimization in fsnotify_parent() would be sufficient to avoid the overhead of fsnotify() call anyway. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230611122429.1499617-1-amir73il@gmail.com>
| * | exportfs: check for error return value from exportfs_encode_*()Amir Goldstein2023-05-253-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The exportfs_encode_*() helpers call the filesystem ->encode_fh() method which returns a signed int. All the in-tree implementations of ->encode_fh() return a positive integer and FILEID_INVALID (255) for error. Fortify the callers for possible future ->encode_fh() implementation that will return a negative error value. name_to_handle_at() would propagate the returned error to the users if filesystem ->encode_fh() method returns an error. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/linux-fsdevel/ca02955f-1877-4fde-b453-3c1d22794740@kili.mountain/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230524154825.881414-1-amir73il@gmail.com>
| * | fanotify: support reporting non-decodeable file handlesAmir Goldstein2023-05-252-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fanotify users do not always need to decode the file handles reported with FAN_REPORT_FID. Relax the restriction that filesystem needs to support NFS export and allow reporting file handles from filesystems that only support ecoding unique file handles. Even filesystems that do not have export_operations at all can fallback to use the default FILEID_INO32_GEN encoding, but we use the existence of export_operations as an indication that the encoded file handles will be sufficiently unique and that user will be able to compare them to filesystem objects using AT_HANDLE_FID flag to name_to_handle_at(2). For filesystems that do not support NFS export, users will have to use the AT_HANDLE_FID of name_to_handle_at(2) if they want to compare the object in path to the object fid reported in an event. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-5-amir73il@gmail.com>
| * | exportfs: allow exporting non-decodeable file handles to userspaceAmir Goldstein2023-05-251-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some userspace programs use st_ino as a unique object identifier, even though inode numbers may be recycable. This issue has been addressed for NFS export long ago using the exportfs file handle API and the unique file handle identifiers are also exported to userspace via name_to_handle_at(2). fanotify also uses file handles to identify objects in events, but only for filesystems that support NFS export. Relax the requirement for NFS export support and allow more filesystems to export a unique object identifier via name_to_handle_at(2) with the flag AT_HANDLE_FID. A file handle requested with the AT_HANDLE_FID flag, may or may not be usable as an argument to open_by_handle_at(2). To allow filesystems to opt-in to supporting AT_HANDLE_FID, a struct export_operations is required, but even an empty struct is sufficient for encoding FIDs. Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-4-amir73il@gmail.com>
| * | exportfs: add explicit flag to request non-decodeable file handlesAmir Goldstein2023-05-223-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, all callers of exportfs_encode_inode_fh(), except for fsnotify's show_mark_fhandle(), check that filesystem can decode file handles, but we would like to add more callers that do not require a file handle that can be decoded. Introduce a flag to explicitly request a file handle that may not to be decoded later and a wrapper exportfs_encode_fid() that sets this flag and convert show_mark_fhandle() to use the new wrapper. This will be used to allow adding fanotify support to filesystems that do not support NFS export. Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-3-amir73il@gmail.com>
| * | exportfs: change connectable argument to bit flagsAmir Goldstein2023-05-222-4/+14
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the bool connectable arguemnt into a bit flags argument and define the EXPORT_FS_CONNECTABLE flag as a requested property of the file handle. We are going to add a flag for requesting non-decodeable file handles. Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-2-amir73il@gmail.com>
* | Merge tag 'dlm-6.5' of ↵Linus Torvalds2023-06-2913-222/+203
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "The dlm posix lock handling (for gfs2) has three notable changes: - Local pids returned from GETLK are no longer negated. A previous patch negating remote pids mistakenly changed local pids also. - SETLKW operations can now be interrupted only when the process is killed, and not from other signals. General interruption was resulting in previously acquired locks being cleared, not just the in-progress lock. Handling this correctly will require extending a cancel capability to user space (a future feature.) - If multiple threads are requesting posix locks (with SETLKW), fix incorrect matching of results to the requests. The dlm networking has several minor cleanups, and one notable change: - Avoid delaying ack messages for too long (used for message reliability), resulting in a backlog of un-acked messages. These could previously be delayed as a result of either too many or too few other messages being sent. Now an upper and lower threshold is used to determine when an ack should be sent" * tag 'dlm-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: remove filter local comms on close fs: dlm: add send ack threshold and append acks to msgs fs: dlm: handle sequence numbers as atomic fs: dlm: handle lkb wait count as atomic_t fs: dlm: filter ourself midcomms calls fs: dlm: warn about messages from left nodes fs: dlm: move dlm_purge_lkb_callbacks to user module fs: dlm: cleanup STOP_IO bitflag set when stop io fs: dlm: don't check othercon twice fs: dlm: unregister memory at the very last fs: dlm: fix missing pending to false fs: dlm: clear pending bit when queue was empty fs: dlm: revert check required context while close fs: dlm: fix mismatch of plock results from userspace fs: dlm: make F_SETLK use unkillable wait_event fs: dlm: interrupt posix locks only when process is killed fs: dlm: fix cleanup pending ops when interrupted fs: dlm: return positive pid value for F_GETLK dlm: Replace all non-returning strlcpy with strscpy
| * | fs: dlm: remove filter local comms on closeAlexander Aring2023-06-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current way how lowcomms is configured is due configfs entries. Each comms configfs entry will create a lowcomms connection. Even the local connection itself will be stored as a lowcomms connection, although most functionality for a local lowcomms connection struct is not necessary. Now in some scenarios we will see that dlm_controld reports a -EEXIST when configure a node via configfs: ... /sys/kernel/config/dlm/cluster/comms/1/addr: write failed: 17 -1 Doing a: cat /sys/kernel/config/dlm/cluster/comms/1/addr_list reported nothing. This was being seen on cluster with nodeid 1 and it's local configuration. To be sure the configfs entries are in sync with lowcomms connection structures we always call dlm_midcomms_close() to be sure the lowcomms connection gets removed when the configfs entry gets dropped. Before commit 07ee38674a0b ("fs: dlm: filter ourself midcomms calls") it was just doing this by accident and the filter by doing: if (nodeid == dlm_our_nodeid()) return 0; inside dlm_midcomms_close() was never been hit because drop_comm() sets local_comm to NULL and cause that dlm_our_nodeid() returns always the invalid nodeid 0. Fixes: 07ee38674a0b ("fs: dlm: filter ourself midcomms calls") Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: add send ack threshold and append acks to msgsAlexander Aring2023-06-142-75/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the time when we sending an ack back to tell the other side it can free some message because it is arrived on the receiver node, due random reconnects e.g. TCP resets this is handled as well on application layer to not let DLM run into a deadlock state. The current handling has the following problems: 1. We end in situations that we only send an ack back message of 16 bytes out and no other messages. Whereas DLM has logic to combine so much messages as it can in one send() socket call. This behaviour can be discovered by "trace-cmd start -e dlm_recv" and observing the ret field being 16 bytes. 2. When processing of DLM messages will never end because we receive a lot of messages, we will not send an ack back as it happens when the processing loop ends. This patch introduces a likely and unlikely threshold case. The likely case will send an ack back on a transmit path if the threshold is triggered of amount of processed upper layer protocol. This will solve issue 1 because it will be send when another normal DLM message will be sent. It solves issue 2 because it is not part of the processing loop. There is however a unlikely case, the unlikely case has a bigger threshold and will be triggered when we only receive messages and do not sent any message back. This case avoids that the sending node will keep a lot of message for a long time as we send sometimes ack backs to tell the sender to finally release messages. The atomic cmpxchg() is there to provide a atomically ack send with reset of the upper layer protocol delivery counter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: handle sequence numbers as atomicAlexander Aring2023-06-141-15/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently seq_next is only be read on the receive side which processed in an ordered way. The seq_send is being protected by locks. To being able to read the seq_next value on send side as well we convert it to an atomic_t value. The atomic_cmpxchg() is probably not necessary, however the atomic_inc() depends on a if coniditional and this should be handled in an atomic context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: handle lkb wait count as atomic_tAlexander Aring2023-06-142-19/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the lkb_wait_count is locked by the rsb lock and it should be fine to handle lkb_wait_count as non atomic_t value. However for the overall process of reducing locking this patch converts it to an atomic_t value. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: filter ourself midcomms callsAlexander Aring2023-06-144-20/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It makes no sense to call midcomms/lowcomms functionality for the local node as socket functionality is only required for remote nodes. This patch filters those calls in the upper layer of lockspace membership handling instead of doing it in midcomms/lowcomms layer as they should never be aware of local nodeid. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: warn about messages from left nodesAlexander Aring2023-06-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch warns about messages which are received from nodes who already left the lockspace resource signaled by the cluster manager. Before commit 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") there was a synchronization issue with the socket lifetime and the cluster event of leaving a lockspace and other nodes did not stop of sending messages because the cluster manager has a pending message to leave the lockspace. The reliable session layer for dlm use sequence numbers to ensure dlm message were never being dropped. If this is not corrected synchronized we have a problem, this patch will use the filter case and turn it into a WARN_ON_ONCE() so we seeing such issue on the kernel log because it should never happen now. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: move dlm_purge_lkb_callbacks to user moduleAlexander Aring2023-06-144-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the dlm_purge_lkb_callbacks() function from ast to user dlm module as it is only a function being used by dlm user implementation. I got be hinted to hold specific locks regarding the callback handling for dlm_purge_lkb_callbacks() but it was false positive. It is confusing because ast dlm implementation uses a different locking behaviour as user locks uses as DLM handles kernel and user dlm locks differently. To avoid the confusing we move this function to dlm user implementation. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: cleanup STOP_IO bitflag set when stop ioAlexander Aring2023-06-141-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There should no difference between setting the CF_IO_STOP flag before restore_callbacks() to do it before or afterwards. The restore_callbacks() will be sure that no callback is executed anymore when the bit wasn't set. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: don't check othercon twiceAlexander Aring2023-06-141-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | This patch removes an another check if con->othercon set inside the branch which already does that. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: unregister memory at the very lastAlexander Aring2023-06-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The dlm modules midcomms, debugfs, lockspace, uses kmem caches. We ensure that the kmem caches getting deallocated after those modules exited. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: fix missing pending to falseAlexander Aring2023-06-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch sets the process_dlm_messages_pending boolean to false when there was no message to process. It is a case which should not happen but if we are prepared to recover from this situation by setting pending boolean to false. Cc: stable@vger.kernel.org Fixes: dbb751ffab0b ("fs: dlm: parallelize lowcomms socket handling") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: clear pending bit when queue was emptyAlexander Aring2023-06-141-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch clears the DLM_IFL_CB_PENDING_BIT flag which will be set when there is callback work queued when there was no callback to dequeue. It is a buggy case and should never happen, that's why there is a WARN_ON(). However if the case happens we are prepared to somehow recover from it. Cc: stable@vger.kernel.org Fixes: 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: revert check required context while closeAlexander Aring2023-06-143-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reverts commit 2c3fa6ae4d52 ("dlm: check required context while close"). The function dlm_midcomms_close(), which will call later dlm_lowcomms_close(), is called when the cluster manager tells the node got fenced which means on midcomms/lowcomms layer to disconnect the node from the cluster communication. The node can rejoin the cluster later. This patch was ensuring no new message were able to be triggered when we are in the close() function context. This was done by checking if the lockspace has been stopped. However there is a missing check that we only need to check specific lockspaces where the fenced node is member of. This is currently complicated because there is no way to easily check if a node is part of a specific lockspace without stopping the recovery. For now we just revert this commit as it is just a check to finding possible leaks of stopping lockspaces before close() is called. Cc: stable@vger.kernel.org Fixes: 2c3fa6ae4d52 ("dlm: check required context while close") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: fix mismatch of plock results from userspaceAlexander Aring2023-05-241-13/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a waiting plock request (F_SETLKW) is sent to userspace for processing (dlm_controld), the result is returned at a later time. That result could be incorrectly matched to a different waiting request in cases where the owner field is the same (e.g. different threads in a process.) This is fixed by comparing all the properties in the request and reply. The results for non-waiting plock requests are now matched based on list order because the results are returned in the same order they were sent. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: make F_SETLK use unkillable wait_eventAlexander Aring2023-05-231-17/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While a non-waiting posix lock request (F_SETLK) is waiting for user space processing (in dlm_controld), wait for that processing to complete with an unkillable wait_event(). This makes F_SETLK behave the same way for F_RDLCK, F_WRLCK and F_UNLCK. F_SETLKW continues to use wait_event_killable(). Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: interrupt posix locks only when process is killedAlexander Aring2023-05-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a posix lock request is waiting for a result from user space (dlm_controld), do not let it be interrupted unless the process is killed. This reverts commit a6b1533e9a57 ("dlm: make posix locks interruptible"). The problem with the interruptible change is that all locks were cleared on any signal interrupt. If a signal was received that did not terminate the process, the process could continue running after all its dlm posix locks had been cleared. A future patch will add cancelation to allow proper interruption. Cc: stable@vger.kernel.org Fixes: a6b1533e9a57 ("dlm: make posix locks interruptible") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: fix cleanup pending ops when interruptedAlexander Aring2023-05-221-19/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Immediately clean up a posix lock request if it is interrupted while waiting for a result from user space (dlm_controld.) This largely reverts the recent commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling"). That previous commit attempted to defer lock cleanup to the point in time when a result from user space arrived. The deferred approach was not reliable because some dlm plock ops may not receive replies. Cc: stable@vger.kernel.org Fixes: b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | fs: dlm: return positive pid value for F_GETLKAlexander Aring2023-05-221-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GETLK pid values have all been negated since commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks"). Revert this for local pids, and leave in place negative pids for remote owners. Cc: stable@vger.kernel.org Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | dlm: Replace all non-returning strlcpy with strscpyAzeem Shaikh2023-05-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | | Merge tag 'xfs-6.5-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2023-06-295-14/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs updates from Darrick Wong: "There's not much going on this cycle -- the large extent counts feature graduated, so now users can create more extremely fragmented files! :P The rest are bug fixes; and I'll be sending more next week. - Fix a problem where shrink would blow out the space reserve by declining to shrink the filesystem - Drop the EXPERIMENTAL tag for the large extent counts feature - Set FMODE_CAN_ODIRECT and get rid of an address space op - Fix an AG count overflow bug in growfs if the new device size is redonkulously large" * tag 'xfs-6.5-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix ag count overflow during growfs xfs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method xfs: drop EXPERIMENTAL tag for large extent counts xfs: don't deplete the reserve pool when trying to shrink the fs
| * | | xfs: fix ag count overflow during growfsLong Li2023-06-132-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found a corruption during growfs: XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: <TASK> dump_stack_lvl+0x50/0x70 xfs_corruption_error+0x134/0x150 __xfs_free_extent+0x2c1/0x3c0 xfs_ag_extend_space+0x291/0x3e0 xfs_growfs_data+0xd72/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: <TASK> dump_stack_lvl+0x50/0x70 xfs_error_report+0x93/0xc0 xfs_trans_cancel+0x2c0/0x350 xfs_growfs_data+0x691/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f2d86706577 The bug can be reproduced with the following sequence: # truncate -s 1073741824 xfs_test.img # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img # truncate -s 2305843009213693952 xfs_test.img # mount -o loop xfs_test.img /mnt/test # xfs_growfs -D 1125899907891200 /mnt/test The root cause is that during growfs, user space passed in a large value of newblcoks to xfs_growfs_data_private(), due to current sb_agblocks is too small, new AG count will exceed UINT_MAX. Because of AG number type is unsigned int and it would overflow, that caused nagcount much smaller than the actual value. During AG extent space, delta blocks in xfs_resizefs_init_new_ags() will much larger than the actual value due to incorrect nagcount, even exceed UINT_MAX. This will cause corruption and be detected in __xfs_free_extent. Fix it by growing the filesystem to up to the maximally allowed AGs and not return EINVAL when new AG count overflow. Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
| * | | xfs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO methodChristoph Hellwig2023-06-122-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Do that for xfs so that noop_direct_IO can eventually be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>