summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* afs: Avoid endless loop if file is larger than expectedMarc Dionne2023-05-021-0/+4
| | | | | | | | | | | | | | | | | afs_read_dir fetches an amount of data that's based on what the inode size is thought to be. If the file on the server is larger than what was fetched, the code rechecks i_size and retries. If the local i_size was not properly updated, this can lead to an endless loop of fetching i_size from the server and noticing each time that the size is larger on the server. If it is known that the remote size is larger than i_size, bump up the fetch size to that size. Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
* afs: Fix getattr to report server i_size on dirs, not local sizeDavid Howells2023-05-021-1/+8
| | | | | | | | | | | | | | | Fix afs_getattr() to report the server's idea of the file size of a directory rather than the local size. The local size may differ as we edit the local copy to avoid having to redownload it and we may end up with a differently structured blob of a different size. However, if the directory is discarded from the pagecache we then download it again and the user may see the directory file size apparently change. Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
* afs: Fix updating of i_size with dv jump from serverMarc Dionne2023-05-021-0/+1
| | | | | | | | | | | | | | If the data version returned from the server is larger than expected, the local data is invalidated, but we may still want to note the remote file size. Since we're setting change_size, we have to also set data_changed for the i_size to get updated. Fixes: 3f4aa9818163 ("afs: Fix EOF corruption") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
* Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2023-05-014-36/+56
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some ext4 regression and bug fixes" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: clean up error handling in __ext4_fill_super() ext4: reflect error codes from ext4_multi_mount_protect() to its callers ext4: fix lost error code reporting in __ext4_fill_super() ext4: fix unused iterator variable warnings ext4: fix use-after-free read in ext4_find_extent for bigalloc + inline ext4: fix i_disksize exceeding i_size problem in paritally written case
| * ext4: clean up error handling in __ext4_fill_super()Theodore Ts'o2023-04-281-22/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were two ways to return an error code; one was via setting the 'err' variable, and the second, if err was zero, was via the 'ret' variable. This was both confusing and fragile, and when code was factored out of __ext4_fill_super(), some of the error codes returned by the original code was replaced by -EINVAL, and in one case, the error code was placed by 0, triggering a kernel null pointer dereference. Clean this up by removing the 'ret' variable, leaving only one way to set the error code to be returned, and restore the errno codes that were returned via the the mount system call as they were before we started refactoring __ext4_fill_super(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jason Yan <yanaijie@huawei.com>
| * ext4: reflect error codes from ext4_multi_mount_protect() to its callersTheodore Ts'o2023-04-282-8/+17
| | | | | | | | | | | | | | | | This will allow more fine-grained errno codes to be returned by the mount system call. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix lost error code reporting in __ext4_fill_super()Theodore Ts'o2023-04-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When code was factored out of __ext4_fill_super() into ext4_percpu_param_init() the error return was discarded. This meant that it was possible for __ext4_fill_super() to return zero, indicating success, without the struct super getting completely filled in, leading to a potential NULL pointer dereference. Reported-by: syzbot+bbf0f9a213c94f283a5c@syzkaller.appspotmail.com Fixes: 1f79467c8a6b ("ext4: factor out ext4_percpu_param_init() ...") Link: https://syzkaller.appspot.com/bug?id=6dac47d5e58af770c0055f680369586ec32e144c Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jason Yan <yanaijie@huawei.com>
| * ext4: fix unused iterator variable warningsNathan Chancellor2023-04-281-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_QUOTA is disabled, there are warnings around unused iterator variables: fs/ext4/super.c: In function 'ext4_put_super': fs/ext4/super.c:1262:13: error: unused variable 'i' [-Werror=unused-variable] 1262 | int i, err; | ^ fs/ext4/super.c: In function '__ext4_fill_super': fs/ext4/super.c:5200:22: error: unused variable 'i' [-Werror=unused-variable] 5200 | unsigned int i; | ^ cc1: all warnings being treated as errors The kernel has updated to GNU11, allowing the variables to be declared within the for loop. Do so to clear up the warnings. Fixes: dcbf87589d90 ("ext4: factor out ext4_flex_groups_free()") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20230420-ext4-unused-variables-super-c-v1-1-138b6db6c21c@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix use-after-free read in ext4_find_extent for bigalloc + inlineYe Bin2023-04-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzbot found the following issue: loop0: detected capacity change from 0 to 2048 EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none. ================================================================== BUG: KASAN: use-after-free in ext4_ext_binsearch_idx fs/ext4/extents.c:768 [inline] BUG: KASAN: use-after-free in ext4_find_extent+0x76e/0xd90 fs/ext4/extents.c:931 Read of size 4 at addr ffff888073644750 by task syz-executor420/5067 CPU: 0 PID: 5067 Comm: syz-executor420 Not tainted 6.2.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:306 print_report+0x107/0x1f0 mm/kasan/report.c:417 kasan_report+0xcd/0x100 mm/kasan/report.c:517 ext4_ext_binsearch_idx fs/ext4/extents.c:768 [inline] ext4_find_extent+0x76e/0xd90 fs/ext4/extents.c:931 ext4_clu_mapped+0x117/0x970 fs/ext4/extents.c:5809 ext4_insert_delayed_block fs/ext4/inode.c:1696 [inline] ext4_da_map_blocks fs/ext4/inode.c:1806 [inline] ext4_da_get_block_prep+0x9e8/0x13c0 fs/ext4/inode.c:1870 ext4_block_write_begin+0x6a8/0x2290 fs/ext4/inode.c:1098 ext4_da_write_begin+0x539/0x760 fs/ext4/inode.c:3082 generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772 ext4_buffered_write_iter+0x122/0x3a0 fs/ext4/file.c:285 ext4_file_write_iter+0x1d0/0x18f0 call_write_iter include/linux/fs.h:2186 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f4b7a9737b9 RSP: 002b:00007ffc5cac3668 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4b7a9737b9 RDX: 00000000175d9003 RSI: 0000000020000200 RDI: 0000000000000004 RBP: 00007f4b7a933050 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000079f R11: 0000000000000246 R12: 00007f4b7a9330e0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Above issue is happens when enable bigalloc and inline data feature. As commit 131294c35ed6 fixed delayed allocation bug in ext4_clu_mapped for bigalloc + inline. But it only resolved issue when has inline data, if inline data has been converted to extent(ext4_da_convert_inline_data_to_extent) before writepages, there is no EXT4_STATE_MAY_INLINE_DATA flag. However i_data is still store inline data in this scene. Then will trigger UAF when find extent. To resolve above issue, there is need to add judge "ext4_has_inline_data(inode)" in ext4_clu_mapped(). Fixes: 131294c35ed6 ("ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline") Reported-by: syzbot+bf4bb7731ef73b83a3b4@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Tested-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://lore.kernel.org/r/20230406111627.1916759-1-tudor.ambarus@linaro.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix i_disksize exceeding i_size problem in paritally written caseZhihao Cheng2023-04-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible for i_disksize can exceed i_size, triggering a warning. generic_perform_write copied = iov_iter_copy_from_user_atomic(len) // copied < len ext4_da_write_end | ext4_update_i_disksize | new_i_size = pos + copied; | WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize | generic_write_end | copied = block_write_end(copied, len) // copied = 0 | if (unlikely(copied < len)) | if (!PageUptodate(page)) | copied = 0; | if (pos + copied > inode->i_size) // return false if (unlikely(copied == 0)) goto again; if (unlikely(iov_iter_fault_in_readable(i, bytes))) { status = -EFAULT; break; } We get i_disksize greater than i_size here, which could trigger WARNING check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio: ext4_dio_write_iter iomap_dio_rw __iomap_dio_rw // return err, length is not aligned to 512 ext4_handle_inode_extension WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2 RIP: 0010:ext4_file_write_iter+0xbc7 Call Trace: vfs_write+0x3b1 ksys_write+0x77 do_syscall_64+0x39 Fix it by updating 'copied' value before updating i_disksize just like ext4_write_inline_data_end() does. A reproducer can be found in the buganizer link below. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209 Fixes: 64769240bd07 ("ext4: Add delayed allocation support in data=writeback mode") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230321013721.89818-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | Merge tag '6.4-rc-smb3-client-fixes-part1' of ↵Linus Torvalds2023-05-018-151/+109
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: - deferred close fix for an important case when cached file should be closed immediately - two fixes for missing locks - eight minor cleanup * tag '6.4-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number for cifs.ko smb3: move some common open context structs to smbfs_common smb3: make query_on_disk_id open context consistent and move to common code SMB3.1.1: add new tree connect ShareFlags cifs: missing lock when updating session status SMB3: Close deferred file handles in case of handle lease break SMB3: Add missing locks to protect deferred close file list cifs: Avoid a cast in add_lease_context() cifs: Simplify SMB2_open_init() cifs: Simplify SMB2_open_init() cifs: Simplify SMB2_open_init()
| * | cifs: update internal module version number for cifs.koSteve French2023-04-281-2/+2
| | | | | | | | | | | | | | | | | | From 2.42 to 2.43 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | smb3: move some common open context structs to smbfs_commonSteve French2023-04-283-36/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | create durable and create durable reconnect context and the maximal access create context struct definitions can be put in common code in smbfs_common Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | smb3: make query_on_disk_id open context consistent and move to common codeSteve French2023-04-284-18/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | cifs and ksmbd were using a slightly different version of the query_on_disk_id struct which could be confusing. Use the ksmbd version of this struct, and move it to common code. Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | SMB3.1.1: add new tree connect ShareFlagsSteve French2023-04-281-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also update these flag names in a few places to match the simpler easier to understand names now used in the protocol documentation (see MS-SMB2 section 2.2.10) Acked-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: missing lock when updating session statusSteve French2023-04-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity noted a place where we were not grabbing the ses_lock when setting (and checking) ses_status. Addresses-Coverity: 1536833 ("Data race condition (MISSING_LOCK)") Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | SMB3: Close deferred file handles in case of handle lease breakBharath SM2023-04-272-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should not cache deferred file handles if we dont have handle lease on a file. And we should immediately close all deferred handles in case of handle lease break. Fixes: 9e31678fb403 ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | SMB3: Add missing locks to protect deferred close file listBharath SM2023-04-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cifs_del_deferred_close function has a critical section which modifies the deferred close file list. We must acquire deferred_lock before calling cifs_del_deferred_close function. Fixes: ca08d0eac020 ("cifs: Fix memory leak on the deferred close") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: Avoid a cast in add_lease_context()Volker Lendecke2023-04-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We have the correctly-typed struct smb2_create_req * available in the caller. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: Simplify SMB2_open_init()Volker Lendecke2023-04-231-17/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce code duplication by calculating req->CreateContextsLength in one place. This is the last reference to "req" in the add_*_context functions, remove that parameter. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: Simplify SMB2_open_init()Volker Lendecke2023-04-231-33/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce code duplication by stitching together create contexts in one place. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: Simplify SMB2_open_init()Volker Lendecke2023-04-231-32/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | We can point to the create contexts in just one place, we don't have to do this in every add_*_context routine. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | Merge tag '6.4-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds2023-04-296-443/+348
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ksmbd server updates from Steve French: - SMB3.1.1 negotiate context fixes and cleanup - new lock_rename_child VFS helper - ksmbd fix to avoid unlink race and to use the new VFS helper to avoid rename race * tag '6.4-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix racy issue from using ->d_parent and ->d_name ksmbd: remove unused compression negotiate ctx packing ksmbd: avoid duplicate negotiate ctx offset increments ksmbd: set NegotiateContextCount once instead of every inc fs: introduce lock_rename_child() helper ksmbd: remove internal.h include
| * | | ksmbd: fix racy issue from using ->d_parent and ->d_nameNamjae Jeon2023-04-245-386/+277
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child() to lock stable parent while underlying rename racy. Introduce vfs_path_parent_lookup helper to avoid out of share access and export vfs functions like the following ones to use vfs_path_parent_lookup(). - rename __lookup_hash() to lookup_one_qstr_excl(). - export lookup_one_qstr_excl(). - export getname_kernel() and putname(). vfs_path_parent_lookup() is used for parent lookup of destination file using absolute pathname given from FILE_RENAME_INFORMATION request. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | ksmbd: remove unused compression negotiate ctx packingDavid Disseldorp2023-04-241-24/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | build_compression_ctxt() is currently unreachable due to conn.compress_algorithm remaining zero (SMB3_COMPRESS_NONE). It appears to have been broken in a couple of subtle ways over the years: - prior to d6c9ad23b421 ("ksmbd: use the common definitions for NEGOTIATE_PROTOCOL") smb2_compression_ctx.DataLength was set to 8, which didn't account for the single CompressionAlgorithms flexible array member. - post d6c9ad23b421 smb2_compression_capabilities_context CompressionAlgorithms is a three member array, while CompressionAlgorithmCount is set to indicate only one member. assemble_neg_contexts() ctxt_size is also incorrectly incremented by sizeof(struct smb2_compression_capabilities_context) + 2, which assumes one flexible array member. Signed-off-by: David Disseldorp <ddiss@suse.de> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | ksmbd: avoid duplicate negotiate ctx offset incrementsDavid Disseldorp2023-04-241-16/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both pneg_ctxt and ctxt_size change in unison, with each adding the length of the previously added context, rounded up to an eight byte boundary. Drop pneg_ctxt increments and instead use the ctxt_size offset when passing output pointers to per-context helper functions. This slightly simplifies offset tracking and shaves off a few text bytes. Before (x86-64 gcc 7.5): text data bss dec hex filename 213234 8677 672 222583 36577 ksmbd.ko After: text data bss dec hex filename 213218 8677 672 222567 36567 ksmbd.ko Signed-off-by: David Disseldorp <ddiss@suse.de> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | ksmbd: set NegotiateContextCount once instead of every incDavid Disseldorp2023-04-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no early returns, so marshalling the incremented NegotiateContextCount with every context is unnecessary. Signed-off-by: David Disseldorp <ddiss@suse.de> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | Merge tag 'pull-lock_rename_child' of ↵Steve French2023-04-243-15/+57
| |\ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into ksmbd-for-next lock_rename_child() (for ksmbd folks) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | fs: introduce lock_rename_child() helperAl Viro2023-04-201-11/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the dentry of a source file and the dentry of a destination directory to lock parent inodes for rename. As soon as this function returns, ->d_parent of the source file dentry is stable and inodes are properly locked for calling vfs-rename. This helper is needed for ksmbd server. rename request of SMB protocol has to rename an opened file, no matter which directory it's in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | ksmbd: remove internal.h includeNamjae Jeon2023-04-202-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since vfs_path_lookup is exported, It should not be internal. Move vfs_path_lookup prototype in internal.h to linux/namei.h. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds2023-04-2915-336/+450
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Chuck Lever: "The big ticket item for this release is that support for RPC-with-TLS [RFC 9289] has been added to the Linux NFS server. The goal is to provide a simple-to-deploy, low-overhead in-transit confidentiality and peer authentication mechanism. It can supplement NFS Kerberos and it can protect the use of legacy non-cryptographic user authentication flavors such as AUTH_SYS. The TLS Record protocol is handled entirely by kTLS, meaning it can use either software encryption or offload encryption to smart NICs. Aside from that, work continues on improving NFSD's open file cache. Among the many clean-ups in that area is a patch to convert the rhashtable to use the list-hashing version of that data structure" * tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) NFSD: Handle new xprtsec= export option SUNRPC: Support TLS handshake in the server-side TCP socket code NFSD: Clean up xattr memory allocation flags NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop SUNRPC: Clear rq_xid when receiving a new RPC Call SUNRPC: Recognize control messages in server-side TCP socket code SUNRPC: Be even lazier about releasing pages SUNRPC: Convert svc_xprt_release() to the release_pages() API SUNRPC: Relocate svc_free_res_pages() nfsd: simplify the delayed disposal list code SUNRPC: Ignore return value of ->xpo_sendto SUNRPC: Ensure server-side sockets have a sock->file NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor() sunrpc: simplify two-level sysctl registration for svcrdma_parm_table SUNRPC: return proper error from get_expiry() lockd: add some client-side tracepoints nfs: move nfs_fhandle_hash to common include file lockd: server should unlock lock if client rejects the grant lockd: fix races in client GRANTED_MSG wait logic lockd: move struct nlm_wait to lockd.h ...
| * | | | NFSD: Handle new xprtsec= export optionChuck Lever2023-04-272-3/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable administrators to require clients to use transport layer security when accessing particular exports. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFSD: Clean up xattr memory allocation flagsChuck Lever2023-04-271-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tetsuo Handa points out: > Since GFP_KERNEL is "GFP_NOFS | __GFP_FS", usage like > "GFP_KERNEL | GFP_NOFS" does not make sense. The original intent was to hold the inode lock while estimating the buffer requirements for the requested information. Frank van der Linden, the author of NFSD's xattr code, says: > ... you need inode_lock to get an atomic view of an xattr. Since > both nfsd_getxattr and nfsd_listxattr to the standard trick of > querying the xattr length with a NULL buf argument (just getting > the length back), allocating the right buffer size, and then > querying again, they need to hold the inode lock to avoid having > the xattr changed from under them while doing that. > > From that then flows the requirement that GFP_FS could cause > problems while holding i_rwsem, so I added GFP_NOFS. However, Dave Chinner states: > You can do GFP_KERNEL allocations holding the i_rwsem just fine. > All that it requires is the caller holds a reference to the > inode ... Since these code paths acquire a dentry, they do indeed hold a reference. It is therefore safe to use GFP_KERNEL for these memory allocations. In particular, that's what this code is already doing; but now the C source code looks sane too. At a later time we can revisit in order to remove the inode lock in favor of simply retrying if the estimated buffer size is too small. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loopDai Ngo2023-04-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following request sequence to the same file causes the NFS client and server getting into an infinite loop with COMMIT and NFS4ERR_DELAY: OPEN REMOVE WRITE COMMIT Problem reported by recall11, recall12, recall14, recall20, recall22, recall40, recall42, recall48, recall50 of nfstest suite. This patch restores the handling of race condition in nfsd_file_do_acquire with unlink to that prior of the regression. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: simplify the delayed disposal list codeJeff Layton2023-04-261-42/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When queueing a dispose list to the appropriate "freeme" lists, it pointlessly queues the objects one at a time to an intermediate list. Remove a few helpers and just open code a list_move to make it more clear and efficient. Better document the resulting functions with kerneldoc comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor()Chuck Lever2023-04-261-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There have been several bugs over the years where the NFSD splice actor has attempted to write outside the rq_pages array. This is a "should never happen" condition, but if for some reason the pipe splice actor should attempt to walk past the end of rq_pages, it needs to terminate the READ operation to prevent corruption of the pointer addresses in the fields just beyond the array. A server crash is thus prevented. Since the code is not behaving, the READ operation returns -EIO to the client. None of the READ payload data can be trusted if the splice actor isn't operating as expected. Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
| * | | | SUNRPC: return proper error from get_expiry()NeilBrown2023-04-262-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The get_expiry() function currently returns a timestamp, and uses the special return value of 0 to indicate an error. Unfortunately this causes a problem when 0 is the correct return value. On a system with no RTC it is possible that the boot time will be seen to be "3". When exportfs probes to see if a particular filesystem supports NFS export it tries to cache information with an expiry time of "3". The intention is for this to be "long in the past". Even with no RTC it will not be far in the future (at most a second or two) so this is harmless. But if the boot time happens to have been calculated to be "3", then get_expiry will fail incorrectly as it converts the number to "seconds since bootime" - 0. To avoid this problem we change get_expiry() to report the error quite separately from the expiry time. The error is now the return value. The expiry time is reported through a by-reference parameter. Reported-by: Jerry Zhang <jerry@skydio.com> Tested-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | lockd: add some client-side tracepointsJeff Layton2023-04-265-2/+131
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfs: move nfs_fhandle_hash to common include fileJeff Layton2023-04-261-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockd needs to be able to hash filehandles for tracepoints. Move the nfs_fhandle_hash() helper to a common nfs include file. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | lockd: server should unlock lock if client rejects the grantJeff Layton2023-04-261-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently lockd just dequeues the block and ignores it if the client sends a GRANT_RES with a status of nlm_lck_denied. That status is an indicator that the client has rejected the lock, so the right thing to do is to unlock the lock we were trying to grant. Reported-by: Yongcheng Yang <yoyang@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | lockd: fix races in client GRANTED_MSG wait logicJeff Layton2023-04-262-31/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the wait for a grant is done (for whatever reason), nlmclnt_block updates the status of the nlm_rqst with the status of the block. At the point it does this, however, the block is still queued its status could change at any time. This is particularly a problem when the waiting task is signaled during the wait. We can end up giving up on the lock just before the GRANTED_MSG callback comes in, and accept it even though the lock request gets back an error, leaving a dangling lock on the server. Since the nlm_wait never lives beyond the end of nlmclnt_lock, put it on the stack and add functions to allow us to enqueue and dequeue the block. Enqueue it just before the lock/wait loop, and dequeue it just after we exit the loop instead of waiting until the end of the function. Also, scrape the status at the time that we dequeue it to ensure that it's final. Reported-by: Yongcheng Yang <yoyang@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | lockd: move struct nlm_wait to lockd.hJeff Layton2023-04-261-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch needs struct nlm_wait in fs/lockd/clntproc.c, so move the definition to a shared header file. As an added clean-up, drop the unused b_reclaim field. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | lockd: purge resources held on behalf of nlm clients when shutting downJeff Layton2023-04-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's easily possible for the server to have an outstanding lock when we go to shut down. When that happens, we often get a warning like this in the kernel log: lockd: couldn't shutdown host module for net f0000000! This is because the shutdown procedures skip removing any hosts that still have outstanding resources (locks). Eventually, things seem to get cleaned up anyway, but the log message is unsettling, and server shutdown doesn't seem to be working the way it was intended. Ensure that we tear down any resources held on behalf of a client when tearing one down for server shutdown. Reported-by: Yongcheng Yang <yoyang@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFSD: Convert filecache to rhltableChuck Lever2023-04-262-187/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we were converting the nfs4_file hashtable to use the kernel's resizable hashtable data structure, Neil Brown observed that the list variant (rhltable) would be better for managing nfsd_file items as well. The nfsd_file hash table will contain multiple entries for the same inode -- these should be kept together on a list. And, it could be possible for exotic or malicious client behavior to cause the hash table to resize itself on every insertion. A nice simplification is that rhltable_lookup() can return a list that contains only nfsd_file items that match a given inode, which enables us to eliminate specialized hash table helper functions and use the default functions provided by the rhashtable implementation). Since we are now storing nfsd_file items for the same inode on a single list, that effectively reduces the number of hash entries that have to be tracked in the hash table. The mininum bucket count is therefore lowered. Light testing with fstests generic/531 show no regressions. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: allow reaping files still under writebackJeff Layton2023-04-262-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On most filesystems, there is no reason to delay reaping an nfsd_file just because its underlying inode is still under writeback. nfsd just relies on client activity or the local flusher threads to do writeback. The main exception is NFS, which flushes all of its dirty data on last close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to signal that they do this, and only skip closing files under writeback on such filesystems. Also, remove a redundant NULL file pointer check in nfsd_file_check_writeback, and clean up nfs's export op flag definitions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: update comment over __nfsd_file_cache_purgeJeff Layton2023-04-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: don't take/put an extra reference when putting a fileJeff Layton2023-04-261-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last thing that filp_close does is an fput, so don't bother taking and putting the extra reference. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: add some comments to nfsd_file_do_acquireJeff Layton2023-04-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Howells mentioned that he found this bit of code confusing, so sprinkle in some comments to clarify. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: don't kill nfsd_files because of lease break errorJeff Layton2023-04-261-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An error from break_lease is non-fatal, so we needn't destroy the nfsd_file in that case. Just put the reference like we normally would and return the error. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparatorJeff Layton2023-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test_bit returns bool, so we can just compare the result of that to the key->gc value without the "!!". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>