path: root/fs
Commit message (Author, Age, Files, Lines)
* Merge tag '6.7-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 (Linus Torvalds, 2023-11-11, 16 files, -88/+491)
    Pull smb client fixes from Steve French:

     - ctime caching fix (for setxattr)
     - encryption fix
     - DNS resolver mount fix
     - debugging improvements
     - multichannel fixes including cases where server stops or starts supporting multichannel after mount
     - reconnect fix
     - minor cleanups

    * tag '6.7-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
      cifs: update internal module version number for cifs.ko
      cifs: handle when server stops supporting multichannel
      cifs: handle when server starts supporting multichannel
      Missing field not being returned in ioctl CIFS_IOC_GET_MNT_INFO
      smb3: allow dumping session and tcon id to improve stats analysis and debugging
      smb: client: fix mount when dns_resolver key is not available
      smb3: fix caching of ctime on setxattr
      smb3: minor cleanup of session handling code
      cifs: reconnect work should have reference on server struct
      cifs: do not pass cifs_sb when trying to add channels
      cifs: account for primary channel in the interface list
      cifs: distribute channels across interfaces based on speed
      cifs: handle cases where a channel is closed
      smb3: more minor cleanups for session handling routines
      smb3: minor RDMA cleanup
      cifs: Fix encryption of cleared, but unset rq_iter data buffers
| * cifs: update internal module version number for cifs.ko (Steve French, 2023-11-10, 1 file, -2/+2)
    From 2.45 to 2.46

    Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: handle when server stops supporting multichannel (Shyam Prasad N, 2023-11-10, 6 files, -10/+145)
    When a server stops supporting multichannel, we currently keep attempting reconnects to the secondary channels. Avoid this by freeing extra channels when negotiate returns no multichannel support.

    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: handle when server starts supporting multichannel (Shyam Prasad N, 2023-11-10, 3 files, -2/+34)
    When the user mounts with the multichannel option but the server does not support it, the server may start supporting it at some point in the future. This change handles that case.

    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
| * Missing field not being returned in ioctl CIFS_IOC_GET_MNT_INFO (Steve French, 2023-11-10, 1 file, -0/+1)
    The tcon_flags field was always being set to zero in the information about the mount returned by the ioctl CIFS_IOC_GET_MNT_INFO instead of being set to the value of the Flags field in the tree connection structure as intended.

    Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: allow dumping session and tcon id to improve stats analysis and debugging (Steve French, 2023-11-10, 2 files, -0/+31)
    When multiple mounts are to the same share from the same client, it was not possible to determine which section of /proc/fs/cifs/Stats (and DebugData) corresponds to a given mount. In some recent examples this turned out to be a significant problem when trying to analyze performance data, since in many cases, unless we know the tree id and session id, we can't figure out which stats (e.g. number of SMB3.1.1 requests by type, the total time they take, which is slowest, how many fail, etc.) apply to which mount. The only existing, loosely related ioctl, CIFS_IOC_GET_MNT_INFO, does not return the information needed to uniquely identify which tcon is which mount, although it does return various flags and device info.

    Add a cifs.ko ioctl CIFS_IOC_GET_TCON_INFO (0x800ccf0c) to return tid, session id, tree connect count.

    Cc: stable@vger.kernel.org
    Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
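The ioctl number quoted above can be unpacked by hand. Below is a minimal sketch in C. It assumes the generic Linux `_IOC` layout from asm-generic/ioctl.h: an 8-bit number, an 8-bit type, a 14-bit size and a 2-bit direction. A few architectures, such as powerpc and mips, use different widths. `struct ioctl_fields` and `decode_ioctl()` are illustrative names, not part of cifs.ko.

```c
#include <stdint.h>

/* Field widths of the generic Linux ioctl encoding (asm-generic/ioctl.h);
 * some architectures use different widths. */
#define IOC_NR_BITS   8
#define IOC_TYPE_BITS 8
#define IOC_SIZE_BITS 14

/* Illustrative container for the decoded fields (not a kernel type). */
struct ioctl_fields {
    unsigned int dir;   /* 0 = none, 1 = write, 2 = read, 3 = read/write */
    unsigned int type;  /* "magic" byte that identifies the driver */
    unsigned int nr;    /* command number within that driver */
    unsigned int size;  /* size in bytes of the argument structure */
};

static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;

    f.nr   = cmd & ((1u << IOC_NR_BITS) - 1);
    f.type = (cmd >> IOC_NR_BITS) & ((1u << IOC_TYPE_BITS) - 1);
    f.size = (cmd >> (IOC_NR_BITS + IOC_TYPE_BITS)) & ((1u << IOC_SIZE_BITS) - 1);
    f.dir  = cmd >> (IOC_NR_BITS + IOC_TYPE_BITS + IOC_SIZE_BITS);
    return f;
}
```

Under that assumption, `decode_ioctl(0x800ccf0c)` yields dir 2 (read), type 0xcf, nr 12 and size 12. In other words, this is a read-direction ioctl that copies a 12-byte structure back to user space.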
| * smb: client: fix mount when dns_resolver key is not available (Paulo Alcantara, 2023-11-09, 3 files, -7/+29)
    There was a wrong assumption that with CONFIG_CIFS_DFS_UPCALL=y there would always be a dns_resolver key set up, so we could unconditionally upcall to resolve the UNC hostname rather than using the value provided by mount(2).

    Only require it when performing automount of junctions within a DFS share, so users that don't have a dns_resolver key can still mount their regular shares with the server hostname resolved by mount.cifs(8).

    Fixes: 348a04a8d113 ("smb: client: get rid of dfs code dep in namespace.c")
    Cc: stable@vger.kernel.org
    Tested-by: Eduard Bachmakov <e.bachmakov@gmail.com>
    Reported-by: Eduard Bachmakov <e.bachmakov@gmail.com>
    Closes: https://lore.kernel.org/all/CADCRUiNvZuiUZ0VGZZO9HRyPyw6x92kiA7o7Q4tsX5FkZqUkKg@mail.gmail.com/
    Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: fix caching of ctime on setxattr (Steve French, 2023-11-09, 1 file, -1/+4)
    Fixes xfstest generic/728, which had been failing due to incorrect ctime after setxattr and removexattr.

    Update ctime on successful set of xattr.

    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: minor cleanup of session handling code (Steve French, 2023-11-09, 1 file, -6/+12)
    Minor cleanup of style issues found by checkpatch

    Reviewed-by: Bharath SM <bharathsm@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: reconnect work should have reference on server struct (Shyam Prasad N, 2023-11-09, 2 files, -16/+34)
    The delayed work for reconnect takes the server struct as a parameter, but it does so without holding a reference to it. Normally this does not cause a problem, because the reconnect work is only cancelled on umount. However, we now plan to support scaling down of channels, and the scale-down can happen from the reconnect work itself, so we need to fix it.

    This change takes a reference on the server struct before it is passed to the delayed work. The reference is dropped by the delayed work itself or, if the delayed work is successfully cancelled, by the process that cancels it.

    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
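The reference rule above can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (`struct mini_server`, `queue_reconnect()`, `reconnect_work()`, `cancel_reconnect()`), not the cifs.ko code. In the kernel, the boolean return values of `queue_delayed_work()` and `cancel_delayed_work()` report whether the work was newly queued or was cancelled while still pending. Here a plain flag stands in for both.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the pattern; not the cifs.ko code. */
struct mini_server {
    int refcount;
    bool reconnect_pending;  /* stands in for "delayed work is queued" */
};

static void server_get(struct mini_server *s) { s->refcount++; }
static void server_put(struct mini_server *s) { s->refcount--; /* real code frees at 0 */ }

/* Take a reference before handing the server to the delayed work. */
static void queue_reconnect(struct mini_server *s)
{
    if (s->reconnect_pending)
        return;                  /* already queued: that work already holds a ref */
    server_get(s);
    s->reconnect_pending = true;
}

/* The work item drops the reference it was given when it finishes. */
static void reconnect_work(struct mini_server *s)
{
    s->reconnect_pending = false;
    /* ... reconnect, possibly scaling channels down ... */
    server_put(s);
}

/* Whoever successfully cancels the pending work drops its reference instead. */
static bool cancel_reconnect(struct mini_server *s)
{
    if (!s->reconnect_pending)
        return false;            /* nothing pending: the work (if any) owns the put */
    s->reconnect_pending = false;
    server_put(s);
    return true;
}
```

The invariant is that exactly one party drops the reference taken at queue time. That party is the work function if it runs, or the canceller if cancellation succeeds.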
| * cifs: do not pass cifs_sb when trying to add channels (Shyam Prasad N, 2023-11-09, 3 files, -8/+8)
    The only reason why cifs_sb gets passed today to cifs_try_adding_channels is to pass the local_nls field for the new channels and binding session. However, the ses struct already has a local_nls field that is set up during the first cifs_setup_session. So there is no need to pass cifs_sb.

    This change removes cifs_sb from the arg list for this and the functions that it calls and uses ses->local_nls instead.

    Cc: stable@vger.kernel.org
    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: account for primary channel in the interface list (Shyam Prasad N, 2023-11-09, 2 files, -0/+34)
    The refcounting of server interfaces should account for the primary channel too. Although this is not strictly necessary, doing so will account for the primary channel in DebugData.

    Cc: stable@vger.kernel.org
    Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: distribute channels across interfaces based on speed (Shyam Prasad N, 2023-11-09, 3 files, -14/+88)
    Today, if the server interfaces are RSS capable, we simply choose the fastest interface to set up a channel. This is not a scalable approach, and it makes little attempt to distribute the connections.

    This change does a weighted distribution of channels across all the available server interfaces, where the weight is a function of the advertised interface speed.

    Also make sure that we don't mix rdma and non-rdma for channels.

    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
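Below is a hedged sketch of one way to weight channel placement by interface speed. It is not the cifs.ko algorithm. It assumes a greedy rule that places the next channel on the interface minimising (channels + 1) / speed. It also skips interfaces whose RDMA capability differs from the requested kind. `struct iface` and `pick_iface()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one server interface. */
struct iface {
    uint64_t speed;     /* advertised link speed, any consistent unit */
    bool rdma;          /* interface is RDMA-capable */
    unsigned channels;  /* channels already placed on this interface */
};

/*
 * Return the index of the interface that should carry the next channel, or -1
 * if none qualifies. Interfaces whose RDMA capability differs from want_rdma
 * are skipped so RDMA and non-RDMA channels are not mixed. Among the rest,
 * pick the one minimising (channels + 1) / speed, so that each interface
 * receives channels roughly in proportion to its speed.
 */
static int pick_iface(const struct iface *ifs, size_t n, bool want_rdma)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (ifs[i].rdma != want_rdma || ifs[i].speed == 0)
            continue;
        if (best < 0) {
            best = (int)i;
            continue;
        }
        /* (c_i + 1) / s_i < (c_b + 1) / s_b, cross-multiplied to avoid
         * division; assumes the products fit in 64 bits. */
        if ((uint64_t)(ifs[i].channels + 1) * ifs[best].speed <
            (uint64_t)(ifs[best].channels + 1) * ifs[i].speed)
            best = (int)i;
    }
    return best;
}
```

Consider a 10000-unit interface and a 1000-unit interface. Placing eleven channels one at a time with this rule puts ten on the faster interface and one on the slower. Ties go to the earlier interface in the list.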
| * cifs: handle cases where a channel is closed (Shyam Prasad N, 2023-11-09, 6 files, -7/+43)
    So far, SMB multichannel could only scale up, not scale down, the number of channels. In this patch series, we now allow the client to deal with the case of multichannel being disabled on the server when the share is mounted. With that change, we now need the ability to scale down the channels.

    This change allows the client to deal with cases of missing channels more gracefully.

    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: more minor cleanups for session handling routines (Steve French, 2023-11-09, 1 file, -10/+15)
    Some trivial cleanup pointed out by checkpatch

    Reviewed-by: Bharath SM <bharathsm@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: minor RDMA cleanup (Steve French, 2023-11-09, 1 file, -2/+2)
    Some minor smbdirect debug cleanup spotted by checkpatch

    Cc: Long Li <longli@microsoft.com>
    Reviewed-by: Bharath SM <bharathsm@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix encryption of cleared, but unset rq_iter data buffers (David Howells, 2023-11-08, 1 file, -3/+9)
    Each smb_rqst struct contains two things: an array of kvecs (rq_iov) that contains the protocol data for an RPC op and an iterator (rq_iter) that contains the data payload of an RPC op. When an smb_rqst is allocated, rq_iter is always cleared, but we don't set it up unless we're going to use it.

    The function that determines the size of the ciphertext buffer that will be needed to encrypt a request, cifs_get_num_sgs(), assumes that rq_iter is always initialised, and employs user_backed_iter() to check that the iterator isn't user-backed. This used to work incidentally, because ->user_backed was set to false because the iterator had never been initialised. But with commit f1b4cb650b9a0eeba206d8f069fcdc532bfbcd74 [1], which changes user_backed_iter() to determine this based on the iterator type instead, a warning is now emitted:

        WARNING: CPU: 7 PID: 4584 at fs/smb/client/cifsglob.h:2165 smb2_get_aead_req+0x3fc/0x420 [cifs]
        ...
        RIP: 0010:smb2_get_aead_req+0x3fc/0x420 [cifs]
        ...
        crypt_message+0x33e/0x550 [cifs]
        smb3_init_transform_rq+0x27d/0x3f0 [cifs]
        smb_send_rqst+0xc7/0x160 [cifs]
        compound_send_recv+0x3ca/0x9f0 [cifs]
        cifs_send_recv+0x25/0x30 [cifs]
        SMB2_tcon+0x38a/0x820 [cifs]
        cifs_get_smb_ses+0x69c/0xee0 [cifs]
        cifs_mount_get_session+0x76/0x1d0 [cifs]
        dfs_mount_share+0x74/0x9d0 [cifs]
        cifs_mount+0x6e/0x2e0 [cifs]
        cifs_smb3_do_mount+0x143/0x300 [cifs]
        smb3_get_tree+0x15e/0x290 [cifs]
        vfs_get_tree+0x2d/0xe0
        do_new_mount+0x124/0x340
        __se_sys_mount+0x143/0x1a0

    The problem is that rq_iter was never set, so the type is 0 (i.e. ITER_UBUF), which causes user_backed_iter() to return true. The code doesn't malfunction because it checks the size of the iterator, which is 0.

    Fix cifs_get_num_sgs() to ignore rq_iter if its count is 0, thereby bypassing the warnings.

    It might be better to explicitly initialise rq_iter to a zero-length ITER_BVEC, say, as it can always be reinitialised later.

    Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
    Reported-by: Damian Tometzki <damian@riscv-rocks.de>
    Closes: https://lore.kernel.org/r/ZUfQo47uo0p2ZsYg@fedora.fritz.box/
    Tested-by: Damian Tometzki <damian@riscv-rocks.de>
    Cc: stable@vger.kernel.org
    cc: Eric Biggers <ebiggers@kernel.org>
    cc: linux-cifs@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f1b4cb650b9a0eeba206d8f069fcdc532bfbcd74 [1]
    Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
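The following hypothetical miniature shows why testing the count first sidesteps the warning. `struct mini_iter`, `struct mini_rqst`, `num_sgs()` and the `payload_segs` parameter are made up for illustration. Only the underlying idea comes from the message above: a zeroed iterator has type 0 (ITER_UBUF) and a count of 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of an iov_iter: only the fields the check needs.
 * Type 0 is the user-buffer type, as in the message above. */
enum mini_iter_type { MINI_ITER_UBUF = 0, MINI_ITER_BVEC, MINI_ITER_KVEC };

struct mini_iter { enum mini_iter_type type; size_t count; };
struct mini_rqst { size_t nvec; struct mini_iter iter; };

static bool mini_user_backed(const struct mini_iter *it)
{
    return it->type == MINI_ITER_UBUF;
}

/*
 * Count scatterlist entries for a request: one per protocol kvec plus
 * payload_segs for the payload. Returns -1 for a user-backed payload (the
 * case that triggers the warning). An empty iterator is skipped entirely, so
 * a zeroed, never-initialised rq_iter no longer looks like a user buffer.
 */
static int num_sgs(const struct mini_rqst *rq, size_t payload_segs)
{
    int n = (int)rq->nvec;

    if (rq->iter.count > 0) {
        if (mini_user_backed(&rq->iter))
            return -1;
        n += (int)payload_segs;
    }
    return n;
}
```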
* | Merge tag '6.7-rc-smb3-server-part2' of git://git.samba.org/ksmbd (Linus Torvalds, 2023-11-10, 3 files, -6/+41)
    Pull smb server fixes from Steve French:

     - slab out of bounds fix in ACL handling
     - fix malformed request oops
     - minor doc fix

    * tag '6.7-rc-smb3-server-part2' of git://git.samba.org/ksmbd:
      ksmbd: handle malformed smb1 message
      ksmbd: fix kernel-doc comment of ksmbd_vfs_kern_path_locked()
      ksmbd: fix slab out of bounds write in smb_inherit_dacl()
| * | ksmbd: handle malformed smb1 message (Namjae Jeon, 2023-11-07, 1 file, -0/+11)
    If set_smb1_rsp_status() is not implemented, a NULL pointer dereference occurs when a client sends a malformed smb1 message. This patch adds set_smb1_rsp_status() to ignore malformed smb1 messages.

    Cc: stable@vger.kernel.org
    Reported-by: Robert Morris <rtm@csail.mit.edu>
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * | ksmbd: fix kernel-doc comment of ksmbd_vfs_kern_path_locked() (Namjae Jeon, 2023-11-07, 1 file, -3/+4)
    Fix the argument list in the kernel-doc comment of ksmbd_vfs_kern_path_locked() so that it matches what the kdoc format and script verify:

        fs/smb/server/vfs.c:1207: warning: Function parameter or member 'parent_path' not described in 'ksmbd_vfs_kern_path_locked'

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
| * | ksmbd: fix slab out of bounds write in smb_inherit_dacl() (Namjae Jeon, 2023-11-07, 1 file, -3/+26)
    A slab out-of-bounds write occurs when the offsets are bigger than the pntsd allocation size. This patch adds a check that validates the three offsets against the allocation size.

    Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-22271
    Cc: stable@vger.kernel.org
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
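Below is a minimal sketch of this kind of bounds check. It assumes three offset/length pairs (for example owner, group and DACL) that all point into a single allocation. `range_fits()` and `offsets_valid()` are hypothetical helpers, not ksmbd functions, and the real check in smb_inherit_dacl() may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: does an object of obj_size bytes at offset off fit inside a
 * buffer of alloc_size bytes? Written so that off + obj_size cannot overflow. */
static bool range_fits(size_t alloc_size, uint32_t off, size_t obj_size)
{
    return off <= alloc_size && obj_size <= alloc_size - off;
}

/* Hypothetical: validate three offset/length pairs against one allocation
 * before anything is written through them. */
static bool offsets_valid(size_t alloc_size,
                          uint32_t owner_off, size_t owner_len,
                          uint32_t group_off, size_t group_len,
                          uint32_t dacl_off, size_t dacl_len)
{
    return range_fits(alloc_size, owner_off, owner_len) &&
           range_fits(alloc_size, group_off, group_len) &&
           range_fits(alloc_size, dacl_off, dacl_len);
}
```

Comparing `obj_size` with `alloc_size - off`, rather than computing `off + obj_size`, keeps the check correct even for attacker-controlled offsets close to the type's maximum.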
* | Merge tag 'ceph-for-6.7-rc1' of https://github.com/ceph/ceph-client (Linus Torvalds, 2023-11-10, 23 files, -1462/+2082)
    Pull ceph updates from Ilya Dryomov:

     - support for idmapped mounts in CephFS (Christian Brauner, Alexander Mikhalitsyn). The series was originally developed by Christian and later picked up and brought over the finish line by Alexander, who also contributed an enabler on the MDS side (separate owner_{u,g}id fields on the wire). The required exports for mnt_idmap_{get,put}() in VFS have been acked by Christian and received no objection from Christoph.

     - a churny change in CephFS logging to include cluster and client identifiers in log and debug messages (Xiubo Li). This would help in scenarios with dozens of CephFS mounts on the same node which are getting increasingly common, especially in the Kubernetes world.

    * tag 'ceph-for-6.7-rc1' of https://github.com/ceph/ceph-client:
      ceph: allow idmapped mounts
      ceph: allow idmapped atomic_open inode op
      ceph: allow idmapped set_acl inode op
      ceph: allow idmapped setattr inode op
      ceph: pass idmap to __ceph_setattr
      ceph: allow idmapped permission inode op
      ceph: allow idmapped getattr inode op
      ceph: pass an idmapping to mknod/symlink/mkdir
      ceph: add enable_unsafe_idmap module parameter
      ceph: handle idmapped mounts in create_request_message()
      ceph: stash idmapping in mdsc request
      fs: export mnt_idmap_get/mnt_idmap_put
      libceph, ceph: move mdsmap.h to fs/ceph
      ceph: print cluster fsid and client global_id in all debug logs
      ceph: rename _to_client() to _to_fs_client()
      ceph: pass the mdsc to several helpers
      libceph: add doutc and *_client debug macros support
| * | ceph: allow idmapped mounts (Christian Brauner, 2023-11-03, 1 file, -1/+1)
    Now that we have converted cephfs internally to account for idmapped mounts, allow the creation of idmapped mounts by setting the FS_ALLOW_IDMAP flag.

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: allow idmapped atomic_open inode op (Christian Brauner, 2023-11-03, 1 file, -2/+9)
    Enable ceph_atomic_open() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping.

    [ aleksandr.mikhalitsyn: adapted to 5fadbd9929 ("ceph: rely on vfs for setgid stripping") ]

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: allow idmapped set_acl inode op (Christian Brauner, 2023-11-03, 1 file, -1/+1)
    Enable ceph_set_acl() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping.

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: allow idmapped setattr inode op (Christian Brauner, 2023-11-03, 1 file, -8/+12)
    Enable __ceph_setattr() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping.

    [ aleksandr.mikhalitsyn: adapted to b27c82e12965 ("attr: port attribute changes to new types") ]

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: pass idmap to __ceph_setattr (Alexander Mikhalitsyn, 2023-11-03, 4 files, -8/+8)
    Just pass down the mount's idmapping to __ceph_setattr, because we will need it later.

    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: allow idmapped permission inode op (Christian Brauner, 2023-11-03, 1 file, -1/+1)
    Enable ceph_permission() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping.

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: allow idmapped getattr inode op (Christian Brauner, 2023-11-03, 1 file, -1/+1)
    Enable ceph_getattr() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping.

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: pass an idmapping to mknod/symlink/mkdir (Christian Brauner, 2023-11-03, 1 file, -0/+4)
    Enable mknod/symlink/mkdir iops to handle idmapped mounts. This is just a matter of passing down the mount's idmapping.

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: add enable_unsafe_idmap module parameter (Alexander Mikhalitsyn, 2023-11-03, 3 files, -7/+28)
    This parameter is used to decide whether to allow IO on an idmapped mount when the MDS lacks support for the CEPHFS_FEATURE_HAS_OWNER_UIDGID feature.

    In this case we can't properly handle MDS permission checks, and if UID/GID-based restrictions are enabled on the MDS side, IO requests that go through an idmapped mount may fail with -EACCES/-EPERM. Fortunately, for most users this is not the case and everything should work fine. But we put the word "unsafe" in the module parameter name to warn users about possible problems with this feature and to encourage them to update the cephfs MDS.

    Suggested-by: Stéphane Graber <stgraber@ubuntu.com>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
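The rule described above can be written as a small decision function. This is a hedged sketch: `idmap_io_allowed()` is a hypothetical helper, and its three flags stand in for "the request goes through an idmapped mount", "the MDS advertises CEPHFS_FEATURE_HAS_OWNER_UIDGID" and "the enable_unsafe_idmap parameter is set".

```c
#include <stdbool.h>

/* Hypothetical decision helper mirroring the rule above; not ceph.ko code. */
static bool idmap_io_allowed(bool mount_is_idmapped,
                             bool mds_has_owner_uidgid,
                             bool enable_unsafe_idmap)
{
    if (!mount_is_idmapped)
        return true;             /* non-idmapped mounts are unaffected */
    if (mds_has_owner_uidgid)
        return true;             /* MDS can take the mapped owner fields */
    return enable_unsafe_idmap;  /* older MDS: only if the admin opted in */
}
```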
| * | ceph: handle idmapped mounts in create_request_message() (Christian Brauner, 2023-11-03, 2 files, -5/+56)
    Inode operations that create a new filesystem object such as ->mknod, ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. Instead, the caller's fs{g,u}id is used for the {g,u}id of the new filesystem object.

    In order to ensure that the correct {g,u}id is used, map the caller's fs{g,u}id for creation requests. This doesn't require complex changes. It suffices to pass in the relevant idmapping recorded in the request message. If this request message was triggered from an inode operation that creates filesystem objects, it will have passed down the relevant idmapping. If this is a request message that was triggered from an inode operation that doesn't need to take idmappings into account, the initial idmapping is passed down, which is an identity mapping.

    This change uses a new cephfs protocol extension, CEPHFS_FEATURE_HAS_OWNER_UIDGID, which adds two new fields (owner_{u,g}id) to the request head structure. So we need to ensure that the MDS supports it; otherwise we need to fail any IO that comes through an idmapped mount, because we can't process it properly. An MDS server without this extension will use the caller_{u,g}id fields to set the new inode's owner UID/GID, which is incorrect because the caller_{u,g}id values are unmapped. At the same time, we can't map these fields with an idmapping, as that could break the UID/GID-based permission check logic on the MDS side. This problem was described in detail at [1], [2].

    [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
    [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/

    Link: https://github.com/ceph/ceph/pull/52575
    Link: https://tracker.ceph.com/issues/62217
    Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
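The following hedged sketch uses a toy range-based idmapping to make the caller_{u,g}id versus owner_{u,g}id split concrete. The extent format, `map_id()`, `ID_UNMAPPED` and `struct mini_req_head` are assumptions for illustration only. The kernel's idmapping helpers and the CephFS wire structures differ, and the sketch shows only one mapping direction and only the uid half.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_UNMAPPED UINT32_MAX  /* illustrative "no mapping" marker */

/* Toy extent: ids [first, first + count) map to [lower_first, lower_first + count). */
struct id_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

static uint32_t map_id(const struct id_extent *ext, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        /* Unsigned wrap-around makes id < first fail the range test too. */
        if (id - ext[i].first < ext[i].count)
            return ext[i].lower_first + (id - ext[i].first);
    }
    return ID_UNMAPPED;
}

/* Toy request head carrying both the unmapped caller and the mapped owner. */
struct mini_req_head {
    uint32_t caller_uid;  /* left unmapped so MDS permission checks still work */
    uint32_t owner_uid;   /* owner to record on the newly created inode */
};

static void fill_owner(struct mini_req_head *h, uint32_t fsuid,
                       const struct id_extent *map, size_t n)
{
    h->caller_uid = fsuid;
    h->owner_uid = map_id(map, n, fsuid);
}
```

Passing an identity extent such as `{ 0, 0, UINT32_MAX }` models the initial idmapping: `owner_uid` then equals `caller_uid`, matching the identity case described in the message.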
| * | ceph: stash idmapping in mdsc request (Christian Brauner, 2023-11-03, 2 files, -0/+6)
    When sending a mds request cephfs will send relevant data for the requested operation. For creation requests the caller's fs{g,u}id is used to set the ownership of the newly created filesystem object. For setattr requests the caller can pass in arbitrary {g,u}id values to which the relevant filesystem object is supposed to be changed.

    If the caller is performing the relevant operation via an idmapped mount cephfs simply needs to take the idmapping into account when it sends the relevant mds request.

    In order to support idmapped mounts for cephfs we stash the idmapping whenever they are relevant for the operation for the duration of the request. Since mds requests can be queued and performed asynchronously we make sure to keep the idmapping around and release it once the request has finished.

    In follow-up patches we will use this to send correct ownership information over the wire. This patch just adds the basic infrastructure to keep the idmapping around. The actual conversion patches are all fairly minimal.

    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | fs: export mnt_idmap_get/mnt_idmap_put (Alexander Mikhalitsyn, 2023-11-03, 1 file, -0/+2)
    These helpers are required to support idmapped mounts in CephFS.

    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | libceph, ceph: move mdsmap.h to fs/ceph (Xiubo Li, 2023-11-03, 3 files, -2/+77)
    The mdsmap.h is only used by CephFS, so move it to fs/ceph.

    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: print cluster fsid and client global_id in all debug logs (Xiubo Li, 2023-11-03, 19 files, -1312/+1747)
    Multiple CephFS mounts on a host are increasingly common, so disambiguating messages like this is necessary and will make it easier to debug issues.

    At the same time, this improves the debug logs to make issues easier to troubleshoot, for example by printing the ino# instead of only the memory addresses of the corresponding inodes, and by printing the dentry names instead of the memory addresses of the corresponding dentries, etc.

    Link: https://tracker.ceph.com/issues/61590
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: rename _to_client() to _to_fs_client() (Xiubo Li, 2023-11-03, 14 files, -95/+95)
    We need to convert the inode to ceph_client in the following commit and will add a new helper for that, so here we rename the old helper to _to_fs_client().

    Link: https://tracker.ceph.com/issues/61590
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: pass the mdsc to several helpers (Xiubo Li, 2023-11-03, 9 files, -36/+51)
    We will use the 'mdsc' to get the global_id in the following commits.

    Link: https://tracker.ceph.com/issues/61590
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | Merge tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs (Linus Torvalds, 2023-11-08, 12 files, -39/+93)
    Pull NFS client updates from Trond Myklebust:
     "Bugfixes:

       - SUNRPC:
          - re-probe the target RPC port after an ECONNRESET error
          - handle allocation errors from rpcb_call_async()
          - fix a use-after-free condition in rpc_pipefs
          - fix up various checks for timeouts

       - NFSv4.1:
          - Handle NFS4ERR_DELAY errors during session trunking
          - fix SP4_MACH_CRED protection for pnfs IO

       - NFSv4:
          - Ensure that we test all delegations when the server notifies us that it may have revoked some of them

      Features:

       - Allow knfsd processes to break out of NFS4ERR_DELAY loops when re-exporting NFSv4.x by setting appropriate values for the 'delay_retrans' module parameter

       - nfs: Convert nfs_symlink() to use a folio"

    * tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
      nfs: Convert nfs_symlink() to use a folio
      SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
      NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO
      SUNRPC: Add an IS_ERR() check back to where it was
      NFSv4.1: fix handling NFS4ERR_DELAY when testing for session trunking
      nfs41: drop dependency between flexfiles layout driver and NFSv3 modules
      NFSv4: fairly test all delegations on a SEQ4_ revocation
      SUNRPC: SOFTCONN tasks should time out when on the sending list
      SUNRPC: Force close the socket when a hard error is reported
      SUNRPC: Don't skip timeout checks in call_connect_status()
      SUNRPC: ECONNRESET might require a rebind
      NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
      NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY
| * | | nfs: Convert nfs_symlink() to use a folio (Matthew Wilcox (Oracle), 2023-11-01, 4 files, -22/+20)
    Use the folio APIs, saving about four calls to compound_head(). Convert back to a page in each of the individual protocol implementations.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO (Olga Kornievskaia, 2023-11-01, 1 file, -2/+3)
    If the client is doing pnfs IO and Kerberos is configured and EXCHANGEID successfully negotiated SP4_MACH_CRED and WRITE/COMMIT are on the list of state protected operations, then we need to make sure to choose the DS's rpc_client structure instead of the MDS's one.

    Fixes: fb91fb0ee7b2 ("NFS: Move call to nfs4_state_protect_write() to nfs4_write_setup()")
    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | NFSv4.1: fix handling NFS4ERR_DELAY when testing for session trunking (Olga Kornievskaia, 2023-11-01, 1 file, -1/+6)
    Currently, when the client sends an EXCHANGE_ID for a possible trunked connection, the trunk is thrown out for any error that happens. However, NFS4ERR_DELAY is a transient error that should be retried instead.

    Fixes: e818bd085baf ("NFSv4.1 remove xprt from xprt_switch if session trunking test fails")
    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | nfs41: drop dependency between flexfiles layout driver and NFSv3 modules (Mkrtchyan, Tigran, 2023-11-01, 1 file, -1/+1)
    The flexfiles layout driver depends on the NFSv3 module, as data servers might be configured to provide nfsv3 only. Disabling the nfsv3 protocol completely disables the flexfiles layout driver; however, the data server still might support the v4.1 protocol. Thus the strong coupling between the flexfiles and nfsv3 modules should be relaxed, as the layout driver will return UNSUPPORTED if no matching protocol is found.

    Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | NFSv4: fairly test all delegations on a SEQ4_ revocation (Benjamin Coddington, 2023-11-01, 2 files, -1/+7)
| | | |   When the client is required to use TEST_STATEID to discover which delegation(s) have been revoked, it may continually test delegations at the head of the list if the server continues to be unsatisfied and send SEQ4_STATUS_RECALLABLE_STATE_REVOKED. For a large number of delegations this behavior is prone to live-lock because the client may never be able to test and free revoked state at the end of the list since the SEQ4_STATUS_RECALLABLE_STATE_REVOKED will cause us to flag delegations at the head of the list to be tested. This problem is further exacerbated by the state manager's willingness to be scheduled out on a busy system while testing the list of delegations.
| | | |
| | | |   Keep a generation counter for each attempt to test all delegations, and skip delegations that have already been tested in the current pass.
| | | |
| | | |   Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
| | | |   Tested-by: Torkil Svensgaard <torkil@drcmr.dk>
| | | |   Tested-by: Ruben Vestergaard <rubenv@drcmr.dk>
| | | |   Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
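A minimal C sketch of the generation-counter idea, under simplifying assumptions: the structures, `mark_all_for_test()`, `test_some()` and the stand-in `send_test_stateid()` are hypothetical, and the `budget` argument models the state manager being scheduled out partway through the list. It is not the kernel's data layout. Re-flagging by the server no longer sends the walk back to the head, because anything already tested in the current pass is skipped until the pass closes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified delegation record. */
struct delegation {
    bool needs_test;            /* flagged for TEST_STATEID */
    unsigned int test_gen;      /* pass in which it was last tested; 0 = never */
    unsigned int times_tested;  /* demo bookkeeping only */
    struct delegation *next;
};

struct deleg_list {
    struct delegation *head;
    unsigned int test_gen;      /* current pass; starts at 1 */
};

/* Stand-in for sending TEST_STATEID to the server (hypothetical). */
static void send_test_stateid(struct delegation *d)
{
    d->times_tested++;
}

/* Server sent SEQ4_STATUS_RECALLABLE_STATE_REVOKED: flag everything.
 * The pass generation is deliberately left unchanged. */
static void mark_all_for_test(struct deleg_list *l)
{
    for (struct delegation *d = l->head; d; d = d->next)
        d->needs_test = true;
}

/* Test at most `budget` flagged delegations. Entries already tested in
 * the current pass are skipped even if flagged again, so repeated calls
 * make progress toward the tail. Reaching the end of the list with
 * budget left over closes the pass and starts a new generation. */
static size_t test_some(struct deleg_list *l, size_t budget)
{
    size_t tested = 0;

    for (struct delegation *d = l->head; d && tested < budget; d = d->next) {
        if (!d->needs_test || d->test_gen == l->test_gen)
            continue;
        d->needs_test = false;
        d->test_gen = l->test_gen;
        send_test_stateid(d);
        tested++;
    }
    if (tested < budget)
        l->test_gen++;          /* whole list walked: new pass */
    return tested;
}
```

Without the `test_gen` check, every `mark_all_for_test()` between two short `test_some()` calls would send the walk back to the first entries, which is the live-lock the commit describes.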
| * | | NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts (Trond Myklebust, 2023-10-22, 4 files, -11/+22)
| | | |   If we're using the 'softerr' mount option, we may want to allow layoutget to return EAGAIN to allow knfsd server threads to return a JUKEBOX/DELAY error to the client instead of busy waiting.
| | | |
| | | |   Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY (Trond Myklebust, 2023-10-22, 3 files, -1/+34)
| | | |   When using a 'softerr' mount, the NFSv4 client can get stuck waiting forever while the server just returns NFS4ERR_DELAY. Among other things, this causes the knfsd server threads to busy wait.
| | | |
| | | |   Add a parameter that tells the NFSv4 client how many times to retry before giving up.
| | | |
| | | |   Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
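A bounded retry after NFS4ERR_DELAY can be sketched as below. This is a hypothetical illustration: the variable `max_delay_retries`, its "negative means retry forever" convention, `call_with_delay_limit()`, the demo `always_delay()` server and the choice of `-EAGAIN` as the give-up result are assumptions made for the example, not the name or semantics of the actual kernel parameter.

```c
#include <errno.h>

#define NFS4ERR_DELAY 10008   /* transient "try again later" status */

/* Hypothetical tunable: how many times to resend after NFS4ERR_DELAY
 * before giving up; a negative value means retry forever. */
static int max_delay_retries = 3;

typedef int (*nfs4_call_fn)(void *ctx);

/* Issue a call, resending while the server answers NFS4ERR_DELAY.
 * Returns the final status, or -EAGAIN once the retry budget is spent. */
static int call_with_delay_limit(nfs4_call_fn call, void *ctx)
{
    for (int retries = 0;; retries++) {
        int status = call(ctx);

        if (status != NFS4ERR_DELAY)
            return status;      /* success or a non-transient error */
        if (max_delay_retries >= 0 && retries >= max_delay_retries)
            return -EAGAIN;     /* give up instead of waiting forever */
        /* A real client would sleep with backoff before resending. */
    }
}

/* Demo "server" that always answers NFS4ERR_DELAY; ctx counts calls. */
static int always_delay(void *ctx)
{
    ++*(int *)ctx;
    return NFS4ERR_DELAY;
}
```

With a limit of 3, a server that never stops answering NFS4ERR_DELAY sees four calls (the original plus three retries) before the caller gets an error it can propagate, rather than a thread that waits indefinitely.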
* | | | Merge tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat (Linus Torvalds, 2023-11-08, 2 files, -2/+3)
|\ \ \ \
| | | | |   Pull exfat updates from Namjae Jeon:
| | | | |
| | | | |    - Fix an issue where exfat timestamps are not updated, caused by the new timestamp accessor function patch
| | | | |
| | | | |   * tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
| | | | |     exfat: fix ctime is not updated
| | | | |     exfat: fix setting uninitialized time to ctime/atime
| * | | | exfat: fix ctime is not updated (Yuezhang Mo, 2023-11-03, 1 file, -0/+1)
| | | | |   Commit 4c72a36edd54 ("exfat: convert to new timestamp accessors") removed attr_copy() from exfat_set_attr(). This causes xfstests generic/221 to fail.
| | | | |
| | | | |   xfstests generic/221 checks that ctime is updated even if futimens() updates only atime. Without attr_copy(), ctime is not updated in this case.
| | | | |
| | | | |   attr_copy() may also update other attributes, and removing it may cause other bugs, so this commit restores the call to attr_copy() in exfat_set_attr().
| | | | |
| | | | |   Fixes: 4c72a36edd54 ("exfat: convert to new timestamp accessors")
| | | | |   Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
| | | | |   Reviewed-by: Andy Wu <Andy.Wu@sony.com>
| | | | |   Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
| | | | |   Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
| | | | |   Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
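The generic/221 expectation can be probed from userspace with a small C program. This is a sketch under stated assumptions, not the xfstests source: the helper name `atime_only_update_bumps_ctime()` is hypothetical, it assumes a POSIX.1-2008 system (for `futimens()`, `UTIME_NOW`, `UTIME_OMIT` and `st_ctim`), and it sleeps just over a second so that coarse timestamp granularity cannot hide the change. POSIX specifies that a successful `futimens()` marks the file's last status change time for update, which is why ctime must move even though only atime is being set.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if ctime advanced after an atime-only futimens(), 0 if it
 * did not, -1 on error. `path_template` must end in "XXXXXX". */
static int atime_only_update_bumps_ctime(char *path_template)
{
    struct stat before, after;
    struct timespec pause = { .tv_sec = 1, .tv_nsec = 100000000L };
    /* times[0] is atime, times[1] is mtime: set atime to now, keep mtime. */
    struct timespec times[2] = {
        { .tv_sec = 0, .tv_nsec = UTIME_NOW },
        { .tv_sec = 0, .tv_nsec = UTIME_OMIT },
    };
    int result = -1;
    int fd = mkstemp(path_template);

    if (fd < 0)
        return -1;
    /* Sleep past coarse timestamp granularity so a change is visible. */
    if (fstat(fd, &before) == 0 && nanosleep(&pause, NULL) == 0 &&
        futimens(fd, times) == 0 && fstat(fd, &after) == 0) {
        result = after.st_ctim.tv_sec > before.st_ctim.tv_sec ||
                 (after.st_ctim.tv_sec == before.st_ctim.tv_sec &&
                  after.st_ctim.tv_nsec > before.st_ctim.tv_nsec);
    }
    close(fd);
    unlink(path_template);
    return result;
}
```

Pointing `path_template` at a directory on an exfat mount would exercise the behavior the fix restores; on a kernel with the regression the helper would be expected to return 0.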
| * | | | exfat: fix setting uninitialized time to ctime/atime (Yuezhang Mo, 2023-11-03, 1 file, -2/+2)
| | | | |   An uninitialized time is set to ctime/atime in __exfat_write_inode(). This causes xfstests generic/003 and generic/192 to fail.
| | | | |
| | | | |   Since there is a time gap between setting ctime/atime on the inode and writing back the inode, ctime/atime should not be set again when writing back the inode.
| | | | |
| | | | |   Fixes: 4c72a36edd54 ("exfat: convert to new timestamp accessors")
| | | | |   Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
| | | | |   Reviewed-by: Andy Wu <Andy.Wu@sony.com>
| | | | |   Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
| | | | |   Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
| | | | |   Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
* | | | | Merge tag 'xfs-6.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux (Linus Torvalds, 2023-11-08, 31 files, -950/+1433)
|\ \ \ \ \
| | | | | |   Pull xfs updates from Chandan Babu:
| | | | | |
| | | | | |    - Realtime device subsystem:
| | | | | |       - Clean up usage of xfs_rtblock_t and xfs_fsblock_t data types
| | | | | |       - Replace open coded conversions between rt blocks and rt extents with calls to static inline helpers
| | | | | |       - Replace open coded realtime geometry computation and macros with helper functions
| | | | | |       - CPU usage optimizations for realtime allocator
| | | | | |       - Misc bug fixes associated with Realtime device
| | | | | |
| | | | | |    - Allow read operations to execute while an FICLONE ioctl is being serviced
| | | | | |
| | | | | |    - Misc bug fixes:
| | | | | |       - Alert user when xfs_droplink() encounters an inode with a link count of zero
| | | | | |       - Handle the case where the allocator could return zero extents when servicing an fallocate request
| | | | | |
| | | | | |   * tag 'xfs-6.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (40 commits)
| | | | | |     xfs: allow read IO and FICLONE to run concurrently
| | | | | |     xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
| | | | | |     xfs: introduce protection for drop nlink
| | | | | |     xfs: don't look for end of extent further than necessary in xfs_rtallocate_extent_near()
| | | | | |     xfs: don't try redundant allocations in xfs_rtallocate_extent_near()
| | | | | |     xfs: limit maxlen based on available space in xfs_rtallocate_extent_near()
| | | | | |     xfs: return maximum free size from xfs_rtany_summary()
| | | | | |     xfs: invert the realtime summary cache
| | | | | |     xfs: simplify rt bitmap/summary block accessor functions
| | | | | |     xfs: simplify xfs_rtbuf_get calling conventions
| | | | | |     xfs: cache last bitmap block in realtime allocator
| | | | | |     xfs: use accessor functions for summary info words
| | | | | |     xfs: consolidate realtime allocation arguments
| | | | | |     xfs: create helpers for rtsummary block/wordcount computations
| | | | | |     xfs: use accessor functions for bitmap words
| | | | | |     xfs: create helpers for rtbitmap block/wordcount computations
| | | | | |     xfs: create a helper to handle logging parts of rt bitmap/summary blocks
| | | | | |     xfs: convert rt summary macros to helpers
| | | | | |     xfs: convert open-coded xfs_rtword_t pointer accesses to helper
| | | | | |     xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros
| | | | | |     ...
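The pull summary mentions replacing open coded conversions between rt blocks and rt extents with static inline helpers. The sketch below shows what such helpers can look like in C. All names (`struct rt_geometry`, `blklog`, `rtb_to_rtx()`, `rtb_to_rtxoff()`, `rtx_to_rtb()`) are hypothetical and simplified; the actual XFS helpers differ in naming, types and where the geometry is stored. Each conversion uses a shift or mask when the extent size is a power of two and falls back to division otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the realtime geometry computed at mount. */
struct rt_geometry {
    uint32_t rextsize;  /* realtime extent size in filesystem blocks (> 0) */
    int      blklog;    /* log2(rextsize) if a power of two, else -1 */
};

/* rt block number -> number of the rt extent containing it */
static inline uint64_t rtb_to_rtx(const struct rt_geometry *g, uint64_t rtbno)
{
    assert(g->rextsize > 0);
    return g->blklog >= 0 ? rtbno >> g->blklog : rtbno / g->rextsize;
}

/* offset of an rt block within its rt extent */
static inline uint32_t rtb_to_rtxoff(const struct rt_geometry *g, uint64_t rtbno)
{
    assert(g->rextsize > 0);
    return (uint32_t)(g->blklog >= 0 ? rtbno & (g->rextsize - 1)
                                     : rtbno % g->rextsize);
}

/* rt extent number -> first rt block of that extent */
static inline uint64_t rtx_to_rtb(const struct rt_geometry *g, uint64_t rtx)
{
    return g->blklog >= 0 ? rtx << g->blklog : rtx * g->rextsize;
}
```

Keeping each conversion in one helper means the power-of-two fast path and the general division path cannot drift apart between call sites, which is the usual motivation for replacing open coded arithmetic like this.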