summaryrefslogtreecommitdiffstats
path: root/fs/gfs2/inode.c
Commit message (Collapse)AuthorAgeFilesLines
* gfs2: Switch lock order of inode and iopen glockAndreas Gruenbacher2022-02-151-22/+27
| | | | | | | | | | This patch tries to fix the continual ABBA deadlocks we keep having between the iopen and inode glocks. This switches the lock order in gfs2_inode_lookup and gfs2_create_inode so the iopen glock is always locked first. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* gfs2: gfs2_setattr_size error path fixAndreas Gruenbacher2022-02-151-1/+1
| | | | | | | | | | | | | | | | | | When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get rid of any reservations the inode may have. Instead, it should pass in the inode's write count as the second parameter to allow gfs2_rs_delete() to figure out if the inode has any writers left. In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left where we know that there can be no other users of the inode. Replace those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write count check. With that, gfs2_rs_delete() is only called with the inode's actual write count, so get rid of the second parameter. Fixes: a097dc7e24cb ("GFS2: Make rgrp reservations part of the gfs2_inode structure") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: gfs2_create_inode reworkAndreas Gruenbacher2021-12-021-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | When gfs2_lookup_by_inum() calls gfs2_inode_lookup() for an uncached inode, gfs2_inode_lookup() will place a new tentative inode into the inode cache before verifying that there is a valid inode at the given address. This can race with gfs2_create_inode() which doesn't check for duplicates inodes. gfs2_create_inode() will try to assign the new inode to the corresponding inode glock, and glock_set_object() will complain that the glock is still in use by gfs2_inode_lookup's tentative inode. We noticed this bug after adding commit 486408d690e1 ("gfs2: Cancel remote delete work asynchronously") which allowed delete_work_func() to race with gfs2_create_inode(), but the same race exists for open-by-handle. Fix that by switching from insert_inode_hash() to insert_inode_locked4(), which does check for duplicate inodes. We know we've just managed to to allocate the new inode, so an inode tentatively created by gfs2_inode_lookup() will eventually go away and insert_inode_locked4() will always succeed. In addition, don't flush the inode glock work anymore (this can now only make things worse) and clean up glock_{set,clear}_object for the inode glock somewhat. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: gfs2_inode_lookup reworkAndreas Gruenbacher2021-12-021-51/+33
| | | | | | | | | | | | | Rework gfs2_inode_lookup() to only set up the new inode's glocks after verifying that the new inode is valid. There is no need for flushing the inode glock work queue anymore now, so remove that as well. While at it, get rid of the useless wrapper around iget5_locked() and its unnecessary is_bad_inode() check. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: gfs2_inode_lookup cleanupAndreas Gruenbacher2021-12-021-7/+2
| | | | | | | | | | | In gfs2_inode_lookup, once the inode has been looked up, we check if the inode generation (no_formal_ino) is the one we're looking for. If it isn't and the inode wasn't in the inode cache, we discard the newly looked up inode. This is unnecessary, complicates the code, and makes future changes to gfs2_inode_lookup harder, so change the code to retain newly looked up inodes instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: set glock object after nqBob Peterson2021-10-251-2/+2
| | | | | | | | | | | | | | | Before this patch, function gfs2_create_inode called glock_set_object to set the gl_object for inode and iopen glocks before the glock was locked. That's wrong because other competing processes like evict may be blocked waiting for the glock and still have gl_object set before the actual eviction can take place. This patch moves the call to glock_set_object until after the glock is acquire in function gfs2_create_inode, so it waits for possibly competing evicts to finish their processing first. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Eliminate GIF_INVALID flagBob Peterson2021-10-251-1/+0
| | | | | | | | | | With the addition of the new GLF_INSTANTIATE_NEEDED flag, the GIF_INVALID flag is now redundant. This patch removes it. Since inode_instantiate is only called when instantiation is needed, the check in inode_instantiate is removed too. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: fix GL_SKIP node_scope problemsBob Peterson2021-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, when a glock was locked, the very first holder on the queue would unlock the lockref and call the go_instantiate glops function (if one existed), unless GL_SKIP was specified. When we introduced the new node-scope concept, we allowed multiple holders to lock glocks in EX mode and share the lock. But node-scope introduced a new problem: if the first holder has GL_SKIP and the next one does NOT, since it is not the first holder on the queue, the go_instantiate op was not called. Eventually the GL_SKIP holder may call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was still a window of time in which another non-GL_SKIP holder assumes the instantiate function had been called by the first holder. In the case of rgrp glocks, this led to a NULL pointer dereference on the buffer_heads. This patch tries to fix the problem by introducing two new glock flags: GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function needs to be called to "fill in" or "read in" the object before it is referenced. GLF_INSTANTIATE_IN_PROG which is used to determine when a process is in the process of reading in the object. Whenever a function needs to reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate" function. As before, the gl_lockref spin_lock is unlocked during the IO operation, which may take a relatively long amount of time to complete. While unlocked, if another process determines go_instantiate is still needed, it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared, it needs to check GLF_INSTANTIATE_NEEDED again because the other process's go_instantiate operation may not have been successful. Functions that previously called the instantiate sub-functions now call directly into gfs2_instantiate so the new bits are managed properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: dequeue iopen holder in gfs2_inode_lookup errorBob Peterson2021-10-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, if function gfs2_inode_lookup encountered an error after it had locked the iopen glock, it never unlocked it, relying on the evict code to do the cleanup. The evict code then took the inode glock while holding the iopen glock, which violates the locking order. For example, (1) node A does a gfs2_inode_lookup that fails, leaving the iopen glock locked. (2) node B calls delete_work_func -> gfs2_lookup_by_inum -> gfs2_inode_lookup. It locks the inode glock and blocks trying to lock the iopen glock, which is held by node A. (3) node A eventually calls gfs2_evict_inode -> evict_should_delete. It blocks trying to lock the inode glock, which is now held by node B. This patch introduces error handling to function gfs2_inode_lookup so it properly dequeues held iopen glocks on errors. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Switch to may_setattr in gfs2_setattrAndreas Gruenbacher2021-08-131-2/+2
| | | | | | | | | | | The permission check in gfs2_setattr is an old and outdated version of may_setattr(). Switch to the updated version. Fixes fstest generic/079. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'gfs2-for-5.13' of ↵Linus Torvalds2021-04-291-17/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix some compiler and kernel-doc warnings - Various minor cleanups and optimizations - Add a new sysfs gfs2 status file with some filesystem wide information * tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix fall-through warnings for Clang gfs2: Fix a number of kernel-doc warnings gfs2: Make gfs2_setattr_simple static gfs2: Add new sysfs file for gfs2 status gfs2: Silence possible null pointer dereference warning gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent gfs2: Add new gfs2_iomap_get helper gfs2: Remove unused variable sb_format gfs2: Fix dir.c function parameter descriptions gfs2: Eliminate gh parameter from go_xmote_bh func gfs2: don't create empty buffers for NO_CREATE
| * gfs2: Fix fall-through warnings for ClangGustavo A. R. Silva2021-04-201-0/+2
| | | | | | | | | | | | | | | | | | | | In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple goto statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Fix a number of kernel-doc warningsLee Jones2021-04-091-16/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Building the kernel with W=1 results in a number of kernel-doc warnings like incorrect function names and parameter descriptions. Fix those, mostly by adding missing parameter descriptions, removing left-over descriptions, and demoting some less important kernel-doc comments into regular comments. Originally proposed by Lee Jones; improved and combined into a single patch by Andreas. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Make gfs2_setattr_simple staticAndreas Gruenbacher2021-04-081-1/+1
| | | | | | | | | | | | This function is only used in inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | gfs2: convert to fileattrMiklos Szeredi2021-04-121-0/+4
|/ | | | | | | | Use the fileattr API to let the VFS handle locking, permission checking and conversion. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com>
* Merge branch 'work.misc' of ↵Linus Torvalds2021-02-271-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff pile - no common topic here" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: whack-a-mole: don't open-code iminor/imajor 9p: fix misuse of sscanf() in v9fs_stat2inode() audit_alloc_mark(): don't open-code ERR_CAST() fs/inode.c: make inode_init_always() initialize i_ino to 0 vfs: don't unnecessarily clone write access for writable fds
| * whack-a-mole: don't open-code iminor/imajorAl Viro2021-02-231-2/+2
| | | | | | | | | | | | several instances creeped back into the tree... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'gfs2-for-5.12' of ↵Linus Torvalds2021-02-231-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Log space and revoke accounting rework to fix some failed asserts. - Local resource group glock sharing for better local performance. - Add support for version 1802 filesystems: trusted xattr support and '-o rgrplvb' mounts by default. - Actually synchronize on the inode glock's FREEING bit during withdraw ("gfs2: fix glock confusion in function signal_our_withdraw"). - Fix parallel recovery of multiple journals ("gfs2: keep bios separate for each journal"). - Various other bug fixes. * tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits) gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush gfs2: Per-revoke accounting in transactions gfs2: Rework the log space allocation logic gfs2: Minor calc_reserved cleanup gfs2: Use resource group glock sharing gfs2: Allow node-wide exclusive glock sharing gfs2: Add local resource group locking gfs2: Add per-reservation reserved block accounting gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} gfs2: Check for active reservation in gfs2_release gfs2: Don't search for unreserved space twice gfs2: Only pass reservation down to gfs2_rbm_find gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end gfs2: Add trusted xattr support gfs2: Enable rgrplvb for sb_fs_format 1802 gfs2: Don't skip dlm unlock if glock has an lvb gfs2: Lock imbalance on error path in gfs2_recover_one gfs2: Move function gfs2_ail_empty_tr gfs2: Get rid of current_tail() ...
| * | gfs2: Use resource group glock sharingBob Peterson2021-02-171-3/+3
| |/ | | | | | | | | | | | | | | | | | | This patch takes advantage of the new glock holder sharing feature for resource groups. We have already introduced local resource group locking in a previous patch, so competing accesses of local processes are already under control. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | fs: make helpers idmap mount awareChristian Brauner2021-01-241-21/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | stat: handle idmapped mountsChristian Brauner2021-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic_fillattr() helper fills in the basic attributes associated with an inode. Enable it to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace before we store the uid and gid. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-12-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | acl: handle idmapped mountsChristian Brauner2021-01-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The posix acl permission checking helpers determine whether a caller is privileged over an inode according to the acls associated with the inode. Add helpers that make it possible to handle acls on idmapped mounts. The vfs and the filesystems targeted by this first iteration make use of posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to translate basic posix access and default permissions such as the ACL_USER and ACL_GROUP type according to the initial user namespace (or the superblock's user namespace) to and from the caller's current user namespace. Adapt these two helpers to handle idmapped mounts whereby we either map from or into the mount's user namespace depending on in which direction we're translating. Similarly, cap_convert_nscap() is used by the vfs to translate user namespace and non-user namespace aware filesystem capabilities from the superblock's user namespace to the caller's user namespace. Enable it to handle idmapped mounts by accounting for the mount's user namespace. In addition the fileystems targeted in the first iteration of this patch series make use of the posix_acl_chmod() and, posix_acl_update_mode() helpers. Both helpers perform permission checks on the target inode. Let them handle idmapped mounts. These two helpers are called when posix acls are set by the respective filesystems to handle this case we extend the ->set() method to take an additional user namespace argument to pass the mount's user namespace down. Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | attr: handle idmapped mountsChristian Brauner2021-01-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | namei: make permission helpers idmapped mount awareChristian Brauner2021-01-241-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* Revert "GFS2: Prevent delete work from occurring on glocks used for create"Andreas Gruenbacher2020-12-011-5/+1
| | | | | | | | | | | | Since commit a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"), we're cancelling any pending delete work of an iopen glock before attaching a new inode to that glock in gfs2_create_inode. This means that delete_work_func can no longer be queued or running when attaching the iopen glock to the new inode, and we can revert commit a4923865ea07 ("GFS2: Prevent delete work from occurring on glocks used for create"), which tried to achieve the same but in a racy way. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Make inode operations staticAndreas Gruenbacher2020-12-011-3/+7
| | | | | | The inode operations are not used outside inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_funcAndreas Gruenbacher2020-12-011-10/+11
| | | | | | | | | | | | In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work may block waiting for delete_work_func to complete, and delete_work_func may block trying to acquire the inode glock in gfs2_inode_lookup. Reported-by: Alexander Aring <aahringo@redhat.com> Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Upgrade shared glocks for atime updatesAndreas Gruenbacher2020-11-261-0/+21
| | | | | | | | | | | | | | | | | | Commit 20f829999c38 ("gfs2: Rework read and page fault locking") lifted the glock lock taking from the low-level ->readpage and ->readahead address space operations to the higher-level ->read_iter file and ->fault vm operations. The glocks are still taken in LM_ST_SHARED mode only. On filesystems mounted without the noatime option, ->read_iter sometimes needs to update the atime as well, though. Right now, this leads to a failed locking mode assertion in gfs2_dirty_inode. Fix that by introducing a new update_time inode operation. There, if the glock is held non-exclusively, upgrade it to an exclusive lock. Reported-by: Alexander Aring <aahringo@redhat.com> Fixes: 20f829999c38 ("gfs2: Rework read and page fault locking") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Don't call cancel_delayed_work_sync from within delete work functionAndreas Gruenbacher2020-11-021-1/+2
| | | | | | | | | | | | | | Right now, we can end up calling cancel_delayed_work_sync from within delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup -> gfs2_cancel_delete_work. When that happens, it will result in a deadlock. Instead, gfs2_inode_lookup should skip the call to gfs2_cancel_delete_work when called from delete_work_func (blktype == GFS2_BLKST_UNLINKED). Reported-by: Alexander Ahring Oder Aring <aahringo@redhat.com> Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Don't return NULL from gfs2_inode_lookupAndreas Gruenbacher2020-06-301-1/+2
| | | | | | | | | | Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error). Commit b66648ad6dcf caused it to return NULL instead of ERR_PTR(-ESTALE) in some cases. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: b66648ad6dcf ("gfs2: Move inode generation number check into gfs2_inode_lookup") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* Merge tag 'gfs2-for-5.8' of ↵Linus Torvalds2020-06-081-11/+36
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - An iopen glock locking scheme rework that speeds up deletes of inodes accessed from multiple nodes - Various bug fixes and debugging improvements - Convert gfs2-glocks.txt to ReST * tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: fix use-after-free on transaction ail lists gfs2: new slab for transactions gfs2: initialize transaction tr_ailX_lists earlier gfs2: Smarter iopen glock waiting gfs2: Wake up when setting GLF_DEMOTE gfs2: Check inode generation number in delete_work_func gfs2: Move inode generation number check into gfs2_inode_lookup gfs2: Minor gfs2_lookup_by_inum cleanup gfs2: Try harder to delete inodes locally gfs2: Give up the iopen glock on contention gfs2: Turn gl_delete into a delayed work gfs2: Keep track of deleted inode generations in LVBs gfs2: Allow ASPACE glocks to also have an lvb gfs2: instrumentation wrt log_flush stuck gfs2: introduce new gfs2_glock_assert_withdraw gfs2: print mapping->nrpages in glock dump for address space glocks gfs2: Only do glock put in gfs2_create_inode for free inodes gfs2: Allow lock_nolock mount to specify jid=X gfs2: Don't ignore inode write errors during inode_go_sync docs: filesystems: convert gfs2-glocks.txt to ReST
| * Merge branch 'gfs2-iopen' into for-nextAndreas Gruenbacher2020-06-051-10/+34
| |\
| | * gfs2: Move inode generation number check into gfs2_inode_lookupAndreas Gruenbacher2020-06-051-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Move the inode generation number check from gfs2_lookup_by_inum into gfs2_inode_lookup: gfs2_inode_lookup may be able to decide that an inode with the given inode generation number cannot exist without having to verify the block type or reading the inode from disk. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * gfs2: Minor gfs2_lookup_by_inum cleanupAndreas Gruenbacher2020-06-051-2/+9
| | | | | | | | | | | | | | | | | | | | | Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode generation number will qualify: a valid inode never has a zero no_formal_ino. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * gfs2: Turn gl_delete into a delayed workAndreas Gruenbacher2020-06-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This requires flushing delayed work items in gfs2_make_fs_ro (which is called before unmounting a filesystem). When inodes are deleted and then recreated, pending gl_delete work items would have no effect because the inode generations will have changed, so we can cancel any pending gl_delete works before reusing iopen glocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Only do glock put in gfs2_create_inode for free inodesBob Peterson2020-06-021-1/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, the error path of function gfs2_create_inode would always calls gfs2_glock_put for the inode glock. That's good for inodes that are free. But after they've been added to the vfs inodes, errors will cause the inode to be evicted, and the evict will do the glock put for us. If we do a glock put again, we can try to free the glock while there are still references to it, e.g. revokes pending for the transaction that created it. This patch adds a check: if (free_vfs_inode) before the put, thus solving the problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | Merge tag 'ext4_for_linus' of ↵Linus Torvalds2020-06-051-0/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of bug fixes and cleanups for ext4, including: - Fix performance problems found in dioread_nolock now that it is the default, caused by transaction leaks. - Clean up fiemap handling in ext4 - Clean up and refactor multiple block allocator (mballoc) code - Fix a problem with mballoc with a smaller file systems running out of blocks because they couldn't properly use blocks that had been reserved by inode preallocation. - Fixed a race in ext4_sync_parent() versus rename() - Simplify the error handling in the extent manipulation code - Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers. - Avoid passing an error pointer to brelse in ext4_xattr_set() - Fix race which could result to freeing an inode on the dirty last in data=journal mode. - Fix refcount handling if ext4_iget() fails - Fix a crash in generic/019 caused by a corrupted extent node" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: avoid unnecessary transaction starts during writeback ext4: don't block for O_DIRECT if IOCB_NOWAIT is set ext4: remove the access_ok() check in ext4_ioctl_get_es_cache fs: remove the access_ok() check in ioctl_fiemap fs: handle FIEMAP_FLAG_SYNC in fiemap_prep fs: move fiemap range validation into the file systems instances iomap: fix the iomap_fiemap prototype fs: move the fiemap definitions out of fs.h fs: mark __generic_block_fiemap static ext4: remove the call to fiemap_check_flags in ext4_fiemap ext4: split _ext4_fiemap ext4: fix fiemap size checks for bitmap files ext4: fix EXT4_MAX_LOGICAL_BLOCK macro add comment for ext4_dir_entry_2 file_type member jbd2: avoid leaking transaction credits when unreserving handle ext4: drop ext4_journal_free_reserved() ext4: mballoc: use lock for checking free blocks while retrying ext4: mballoc: refactor ext4_mb_good_group() ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling ext4: mballoc: refactor ext4_mb_discard_preallocations() ...
| * fs: move the fiemap definitions out of fs.hChristoph Hellwig2020-06-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | gfs2: Fix problems regarding gfs2_qa_get and _putBob Peterson2020-05-081-3/+4
|/ | | | | | | | | | | | | | | This patch fixes a couple of places in which gfs2_qa_get and gfs2_qa_put are not balanced: we now keep references around whenever a file is open for writing (see gfs2_open_common and gfs2_release), so we need to put all references we grab in function gfs2_create_inode. This was broken in the successful case and on one error path. This also means that we don't have a reference to put in gfs2_evict_inode. In addition, gfs2_qa_put was called for the wrong inode in gfs2_link. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* Merge tag 'gfs2-for-5.7' of ↵Linus Torvalds2020-03-311-26/+27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got a lot of patches (39) for this merge window. Most of these patches are related to corruption that occurs when journals are replayed. For example: 1. A node fails while writing to the file system. 2. Other nodes use the metadata that was once used by the failed node. 3. When the node returns to the cluster, its journal is replayed, but the older metadata blocks overwrite the changes from step 2. Summary: - Fixed the recovery sequence to prevent corruption during journal replay. - Many bug fixes found during recovery testing. - New improved file system withdraw sequence. - Fixed how resource group buffers are managed. - Fixed how metadata revokes are tracked and written. - Improve processing of IO errors hit by daemons like logd and quotad. - Improved error checking in metadata writes. - Fixed how qadata quota data structures are managed" * tag 'gfs2-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (39 commits) gfs2: Fix oversight in gfs2_ail1_flush gfs2: change from write to read lock for sd_log_flush_lock in journal replay gfs2: instrumentation wrt ail1 stuck gfs2: don't lock sd_log_flush_lock in try_rgrp_unlink gfs2: Remove unnecessary gfs2_qa_{get,put} pairs gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put gfs2: Change inode qa_data to allow multiple users gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc gfs2: Switch to list_{first,last}_entry gfs2: Clean up inode initialization and teardown gfs2: Additional information when gfs2_ail1_flush withdraws gfs2: leaf_dealloc needs to allocate one more revoke gfs2: allow journal replay to hold sd_log_flush_lock gfs2: don't allow releasepage to free bd still used for revokes gfs2: flesh out delayed withdraw for gfs2_log_flush gfs2: Do proper error checking for go_sync family of glops functions gfs2: Don't demote a glock until its revokes are written gfs2: drain the ail2 list after io errors gfs2: Withdraw in gfs2_ail1_flush if write_cache_pages fails gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty ...
| * gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_putAndreas Gruenbacher2020-03-271-1/+2
| | | | | | | | | | | | | | Keeping reservations and quotas separate helps reviewing the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * gfs2: Change inode qa_data to allow multiple usersBob Peterson2020-03-271-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, multiple users called gfs2_qa_alloc which allocated a qadata structure to the inode, if quotas are turned on. Later, in file close or evict, the structure was deleted with gfs2_qa_delete. But there can be several competing processes who need access to the structure. There were races between file close (release) and the others. Thus, a release could delete the structure out from under a process that relied upon its existence. For example, chown. This patch changes the management of the qadata structures to be a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get and if the structure is allocated, the count essentially starts out at 1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the last guy to decrement the count to 0 frees the memory. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_allocBob Peterson2020-03-271-6/+6
| | | | | | | | | | | | | | | | | | | | Before this patch, multiple callers called gfs2_rsqa_alloc to force the existence of a reservations structure and a quota data structure if needed. However, now the reservations are handled separately, so the quota data is only the quota data. So we eliminate the one in favor of just calling gfs2_qa_alloc directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * gfs2: Clean up inode initialization and teardownAndreas Gruenbacher2020-03-271-12/+6
| | | | | | | | | | | | | | | | | | | | | | When allocating a new inode, mark the iopen glock holder as uninitialized to make sure gfs2_evict_inode won't fail after an incomplete create or lookup. In gfs2_evict_inode, allow the inode glock to be NULL and remove the duplicate iopen glock teardown code. In gfs2_inode_lookup, don't tear down things that gfs2_evict_inode will already tear down. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* | gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcacheAl Viro2020-03-121-1/+1
|/ | | | | | | | | | | with the way fs/namei.c:do_last() had been done, ->atomic_open() instances needed to recognize the case when existing file got found with O_EXCL|O_CREAT, either by falling back to finish_no_open() or failing themselves. gfs2 one didn't. Fixes: 6d4ade986f9c (GFS2: Add atomic_open support) Cc: stable@kernel.org # v3.11 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* gfs2: Avoid access time thrashing in gfs2_inode_lookupAndreas Gruenbacher2020-01-151-5/+5
| | | | | | | | | | | | | In gfs2_inode_lookup, we initialize inode->i_atime to the lowest possibly value after gfs2_inode_refresh may already have been called. This should be the other way around, but we didn't notice because usually the inode type is known from the directory entry and so gfs2_inode_lookup won't call gfs2_inode_refresh. In addition, only initialize ip->i_no_formal_ino from no_formal_ino when actually needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Remove duplicate call from gfs2_create_inodeAndreas Gruenbacher2019-11-211-1/+0
| | | | | | | In gfs2_create_inode, gfs2_set_inode_blocks is called twice for no good reason. Remove the unnecessary call. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: clean up iopen glock mess in gfs2_create_inodeBob Peterson2019-11-191-6/+7
| | | | | | | | | | | | | | | | | | | Before this patch, gfs2_create_inode had a use-after-free for the iopen glock in some error paths because it did this: gfs2_glock_put(io_gl); fail_gunlock2: if (io_gl) clear_bit(GLF_INODE_CREATING, &io_gl->gl_flags); In some cases, the io_gl was used for create and only had one reference, so the glock might be freed before the clear_bit(). This patch tries to straighten it out by only jumping to the error paths where iopen is properly set, and moving the gfs2_glock_put after the clear_bit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: removed unnecessary semicolonAliasgar Surti2019-10-301-1/+1
| | | | | | | | | There is use of unnecessary semicolon after switch case. Removed the semicolon. Signed-off-by: Aliasgar Surti <aliasgar.surti500@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Use async glocks for renameBob Peterson2019-09-041-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can reverse the roles of which directories are "old" and which are "new" for the purposes of rename. This can cause deadlocks where two nodes end up waiting for each other. There can be several layers of directory dependencies across many nodes. This patch fixes the problem by acquiring all gfs2_rename's inode glocks asychronously and waiting for all glocks to be acquired. That way all inodes are locked regardless of the order. The timeout value for multiple asynchronous glocks is calculated to be the total of the individual wait times for each glock times two. Since gfs2_exchange is very similar to gfs2_rename, both functions are patched in the same way. A new async glock wait queue, sd_async_glock_wait, keeps a list of waiters for these events. If gfs2's holder_wake function detects an async holder, it wakes up any waiters for the event. The waiter only tests whether any of its requests are still pending. Since the glocks are sent to dlm asychronously, the wait function needs to check to see which glocks, if any, were granted. If a glock is granted by dlm (and therefore held), its minimum hold time is checked and adjusted as necessary, as other glock grants do. If the event times out, all glocks held thus far must be dequeued to resolve any existing deadlocks. Then, if there are any outstanding locking requests, we need to loop around and wait for dlm to respond to those requests too. After we release all requests, we return -ESTALE to the caller (vfs rename) which loops around and retries the request. Node1 Node2 --------- --------- 1. Enqueue A Enqueue B 2. Enqueue B Enqueue A 3. A granted 6. B granted 7. Wait for B 8. Wait for A 9. A times out (since Node 1 holds A) 10. Dequeue B (since it was granted) 11. Wait for all requests from DLM 12. B Granted (since Node2 released it in step 10) 13. Rename 14. Dequeue A 15. DLM Grants A 16. Dequeue A (due to the timeout and since we no longer have B held for our task). 17. Dequeue B 18. Return -ESTALE to vfs 19. VFS retries the operation, goto step 1. This release-all-locks / acquire-all-locks may slow rename / exchange down as both nodes struggle in the same way and do the same thing. However, this will only happen when there is contention for the same inodes, which ought to be rare. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>