summaryrefslogtreecommitdiffstats
path: root/fs/gfs2
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'gfs2-for-5.12' of ↵Linus Torvalds2021-02-2325-660/+964
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Log space and revoke accounting rework to fix some failed asserts. - Local resource group glock sharing for better local performance. - Add support for version 1802 filesystems: trusted xattr support and '-o rgrplvb' mounts by default. - Actually synchronize on the inode glock's FREEING bit during withdraw ("gfs2: fix glock confusion in function signal_our_withdraw"). - Fix parallel recovery of multiple journals ("gfs2: keep bios separate for each journal"). - Various other bug fixes. * tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits) gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush gfs2: Per-revoke accounting in transactions gfs2: Rework the log space allocation logic gfs2: Minor calc_reserved cleanup gfs2: Use resource group glock sharing gfs2: Allow node-wide exclusive glock sharing gfs2: Add local resource group locking gfs2: Add per-reservation reserved block accounting gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} gfs2: Check for active reservation in gfs2_release gfs2: Don't search for unreserved space twice gfs2: Only pass reservation down to gfs2_rbm_find gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end gfs2: Add trusted xattr support gfs2: Enable rgrplvb for sb_fs_format 1802 gfs2: Don't skip dlm unlock if glock has an lvb gfs2: Lock imbalance on error path in gfs2_recover_one gfs2: Move function gfs2_ail_empty_tr gfs2: Get rid of current_tail() ...
| * gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flushBob Peterson2021-02-231-2/+7
| | | | | | | | | | | | | | | | | | | | | | In gfs2_ail1_flush, we're using I/O plugging to give the block layer a better chance of merging I/O requests. If we're too aggressive here, we can end up waiting on I/O to complete while still plugged. Fix that in a way similar to writeback_sb_inodes, except that we can't use blk_flush_plug because blk_flush_plug_list is not exported. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| *-. Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from ↵Andreas Gruenbacher2021-02-2319-549/+777
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git Merge the resource group glock sharing feature and the revoke accounting rework. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Per-revoke accounting in transactionsAndreas Gruenbacher2021-02-227-42/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the log, revokes are stored as a revoke descriptor (struct gfs2_log_descriptor), followed by zero or more additional revoke blocks (struct gfs2_meta_header). On filesystems with a blocksize of 4k, the revoke descriptor contains up to 503 revokes, and the metadata blocks contain up to 509 revokes each. We've so far been reserving space for revokes in transactions in block granularity, so a lot more space than necessary was being allocated and then released again. This patch switches to assigning revokes to transactions individually instead. Initially, space for the revoke descriptor is reserved and handed out to transactions. When more revokes than that are reserved, additional revoke blocks are added. When the log is flushed, the space for the additional revoke blocks is released, but we keep the space for the revoke descriptor block allocated. Transactions may still reserve more revokes than they will actually need in the end, but now we won't overshoot the target as much, and by only returning the space for excess revokes at log flush time, we further reduce the amount of contention between processes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Rework the log space allocation logicAndreas Gruenbacher2021-02-223-69/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current log space allocation logic is hard to understand or extend. The principle it that when the log is flushed, we may or may not have a transaction active that has space allocated in the log. To deal with that, we set aside a magical number of blocks to be used in case we don't have an active transaction. It isn't clear that the pool will always be big enough. In addition, we can't return unused log space at the end of a transaction, so the number of blocks allocated must exactly match the number of blocks used. Simplify this as follows: * When transactions are allocated or merged, always reserve enough blocks to flush the transaction (err on the safe side). * In gfs2_log_flush, return any allocated blocks that haven't been used. * Maintain a pool of spare blocks big enough to do one log flush, as before. * In gfs2_log_flush, when we have no active transaction, allocate a suitable number of blocks. For that, use the spare pool when called from logd, and leave the pool alone otherwise. This means that when the log is almost full, logd will still be able to do one more log flush, which will result in more log space becoming available. This will make the log space allocator code easier to work with in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Minor calc_reserved cleanupAndreas Gruenbacher2021-02-221-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Move function gfs2_ail_empty_trAndreas Gruenbacher2021-02-031-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | Move this function further up in log.c so that we can use it in the next patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Get rid of current_tail()Andreas Gruenbacher2021-02-033-37/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep the current value of the updated log tail in the super block as sb_log_flush_tail instead of computing it on the fly. This avoids unnecessary sd_ail_lock taking and cleans up the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Use a tighter bound in gfs2_trans_beginAndreas Gruenbacher2021-02-031-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a tighter bound for the number of blocks required by transactions in gfs2_trans_begin: in the worst case, we'll have mixed data and metadata, so we'll need a log desciptor for each type. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Clean up gfs2_log_reserveAndreas Gruenbacher2021-02-032-35/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wake up log waiters in gfs2_log_release when log space has actually become available. This is a much better place for the wakeup than gfs2_logd. Check if enough log space is immeditely available before anything else. If there isn't, use io_wait_event to wait instead of open-coding it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Don't wait for journal flush in clean_journalAndreas Gruenbacher2021-02-031-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 588bff95c94e added gfs2_write_log_header() and started using it in clean_journal(), with an additional call to log_flush_wait() at the end of gfs2_write_log_header() which is unnecessary for clean_journal(). Move that call out of gfs2_write_log_header() to restore the previous behavior. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Move lock flush locking to gfs2_trans_{begin,end}Andreas Gruenbacher2021-02-033-33/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the read locking of sd_log_flush_lock from gfs2_log_reserve to gfs2_trans_begin, and its unlocking from gfs2_log_release to gfs2_trans_end. Use gfs2_log_release in two places in which it was open coded before. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Get rid of sd_reserving_logAndreas Gruenbacher2021-02-036-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This counter and the associated wait queue are only used so that gfs2_make_fs_ro can efficiently wait for all pending log space allocations to fail after setting the filesystem to read-only. This comes at the cost of waking up that wait queue very frequently. Instead, when gfs2_log_reserve fails because the filesystem has become read-only, Wake up sd_log_waitq. In gfs2_make_fs_ro, set the file system read-only and then wait until all the log space has been released. Give up and report the problem after a while. With that, sd_reserving_log and sd_reserving_log_wait can be removed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Clean up on-stack transactionsAndreas Gruenbacher2021-02-035-42/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only needs to be set in the exceptional case of on-stack transactions. Split off __gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded version in gfs2_ail_empty_gl. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Use sb_start_intwrite in gfs2_ail_empty_glAndreas Gruenbacher2021-02-032-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2e60d7683c8d ("GFS2: update freeze code to use freeze/thaw_super on all nodes") optimized away the sb_start_intwrite ... sb_end_intwrite protection for the on-stack transactions in gfs2_ail_empty_gl with no explanation. I can't think of a valid reason for doing that, so revert that change. This simplifies the next commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Clean up ail2_emptyAndreas Gruenbacher2021-01-191-17/+21
| | | | | | | | | | | | | | | | | | | | | | | | Clean up the logic in ail2_empty (no functional change). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Rename gfs2_{write => flush}_revokesAndreas Gruenbacher2021-01-193-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Function gfs2_write_revokes doesn't actually write any revokes; instead, it adds revokes to the system transaction during a flush. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Minor debugging improvementAndreas Gruenbacher2021-01-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Split the assert in gfs2_trans_end into two parts. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Some documentation updatesAndreas Gruenbacher2021-01-191-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The calc_reserved description claims that buf_limit is 502 (on 4k filesystems), but it is actually 503. Fix / clarify the entire description. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Minor gfs2_write_revokes cleanupsAndreas Gruenbacher2021-01-191-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | Clean up the computations in gfs2_write_revokes (no change in functionality). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Simplify the buf_limit and databuf_limit definitionsAndreas Gruenbacher2021-01-191-15/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The BUF_OFFSET and DATABUF_OFFSET definitions are only used in buf_limit and databuf_limit, respectively, and the rounding done in those definitions is immediately wiped out by dividing by the element size. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * gfs2: Un-obfuscate function jdesc_find_iAndreas Gruenbacher2021-01-191-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | Clean up this function to show that it is trivial. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Use resource group glock sharingBob Peterson2021-02-175-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch takes advantage of the new glock holder sharing feature for resource groups. We have already introduced local resource group locking in a previous patch, so competing accesses of local processes are already under control. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Allow node-wide exclusive glock sharingBob Peterson2021-02-172-3/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag set, the exclusive lock is shared among all local processes who are holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set. From the point of view of other nodes, the lock is still held exclusively. A future patch will start using this flag to improve performance with rgrp sharing. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Add local resource group lockingAndreas Gruenbacher2021-02-174-7/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prepare for treating resource group glocks as exclusive among nodes but shared among all tasks running on a node: introduce another layer of node-specific locking that the local tasks can use to coordinate their accesses. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Add per-reservation reserved block accountingAndreas Gruenbacher2021-02-175-28/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a rs_reserved field to struct gfs2_blkreserv to keep track of the number of blocks reserved by this particular reservation, and a rd_reserved field to struct gfs2_rgrpd to keep track of the total number of reserved blocks in the resource group. Those blocks are exclusively reserved, as opposed to the rs_requested / rd_requested blocks which are tracked in the reservation tree (rd_rstree) and which can be stolen if necessary. When making a reservation with gfs2_inplace_reserve, rs_reserved is set to somewhere between ap->min_target and ap->target depending on the number of free blocks in the resource group. When allocating blocks with gfs2_alloc_blocks, rs_reserved is decremented accordingly. Eventually, any reserved but not consumed blocks are returned to the resource group by gfs2_inplace_release. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested}Andreas Gruenbacher2021-02-173-33/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We keep track of what we've so far been referring to as reservations in rd_rstree: the nodes in that tree indicate where in a resource group we'd like to allocate the next couple of blocks for a particular inode. Local processes take those as hints, but they may still "steal" blocks from those extents, so when actually allocating a block, we must double check in the bitmap whether that block is actually still free. Likewise, other cluster nodes may "steal" such blocks as well. One of the following patches introduces resource group glock sharing, i.e., sharing of an exclusively locked resource group glock among local processes to speed up allocations. To make that work, we'll need to keep track of how many blocks we've actually reserved for each inode, so we end up with two different kinds of reservations. Distinguish these two kinds by referring to blocks which are reserved but may still be "stolen" as "requested". This rename also makes it more obvious that rs_requested and rd_requested are strongly related. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Check for active reservation in gfs2_releaseAndreas Gruenbacher2021-02-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In gfs2_release, check if the inode has an active reservation to avoid unnecessary lock taking. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Don't search for unreserved space twiceAndreas Gruenbacher2021-02-171-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If gfs2_inplace_reserve has chosen a resource group but it couldn't make a reservation there, there are too many other reservations in that resource group. In that case, don't even try to respect existing reservations in gfs2_alloc_blocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Only pass reservation down to gfs2_rbm_findAndreas Gruenbacher2021-02-172-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Only pass the current reservation down to gfs2_rbm_find rather than the entire inode; we don't need any of the other information. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Also reflect single-block allocations in rgd->rd_extfail_ptAndreas Gruenbacher2021-02-171-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass a non-NULL minext to gfs2_rbm_find even for single-block allocations. In gfs2_rbm_find, also set rgd->rd_extfail_pt when a single-block allocation fails in a resource group: there is no reason for treating that case differently. In gfs2_reservation_check_and_update, only check how many free blocks we have if more than one block is requested; we already know there's at least one free block. In addition, when allocating N blocks fails in gfs2_rbm_find, we need to set rd_extfail_pt to N - 1 rather than N: rd_extfail_pt defines the biggest allocation that might still succeed. Finally, reset rd_extfail_pt when updating the resource group statistics in update_rgrp_lvb, as we already do in gfs2_rgrp_bh_get. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Set GBF_FULL flags when reading resource groupAndreas Gruenbacher2021-01-181-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading a resource group from disk or when receiving the resource group statistics from a Lock Value Block (LVB), set/clear the GBF_FULL flags of all bitmaps in that resource group according to whether or not the resource group is full. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Don't clear GBF_FULL flags in rs_deltreeAndreas Gruenbacher2021-01-181-24/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removing a reservation doesn't make any actual space available, so don't clear the GBF_FULL flags in that case. Otherwise, we'll only spend more time scanning the bitmaps unnecessarily. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks reserved"Andreas Gruenbacher2021-01-181-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit e79e0e1428188b24c3b57309ffa54a33c4ae40c4. It turns out that we're only setting the GBF_FULL flag of a bitmap if we've been scanning from the beginning of the bitmap until the end and we haven't found a single free block, and we're not skipping reservations in that process, either. This means that in gfs2_rbm_find, we can always skip bitmaps with the GBF_FULL flag set. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Minor gfs2_inplace_reserve cleanupAndreas Gruenbacher2021-01-181-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the reservation size computation logic in gfs2_inplace_reserve a little. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Get rid of unnecessary variable in gfs2_alloc_blocksAndreas Gruenbacher2021-01-181-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Variable ndata is only used inside "if (!dinode)", so it can be replaced entirely with *nblocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Only use struct gfs2_rbm for bitmap manipulationsAndreas Gruenbacher2021-01-185-97/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GFS2 uses struct gfs2_rbm to represent a filesystem block number as a bit position within a resource group. This representation is used in the bitmap manipulation code to prevent excessive conversions between block numbers and bit positions, but also in struct gfs2_blkreserv which is part of struct gfs2_inode, to mark the start of a reservation. In the inode, the bit position representation makes less sense: first, the start position is used as a block number about as often as a bit position; second, the bit position representation makes the code unnecessarily complicated and difficult to read. Therefore, change struct gfs2_blkreserv to represent the start of a reservation as a block number instead of a bit position. (This requires keeping track of the resource group in gfs2_blkreserv separately.) With that change, various things can be slightly simplified, and struct gfs2_rbm can be moved to rgrp.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * | gfs2: Turn gfs2_rbm_incr into gfs2_rbm_addAndreas Gruenbacher2021-01-181-22/+33
| | |/ | | | | | | | | | | | | | | | | | | | | | Change gfs2_rbm_incr to advance an rbm by a given number of blocks. Use that in gfs2_reservation_check_and_update to save a gfs2_rbm_to_block -> gfs2_rbm_from_block round trip. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Recursive gfs2_quota_hold in gfs2_iomap_endAndreas Gruenbacher2021-02-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When starting an iomap write, gfs2_quota_lock_check -> gfs2_quota_lock -> gfs2_quota_hold is called from gfs2_iomap_begin. At the end of the write, before unlocking the quotas, punch_hole -> gfs2_quota_hold can be called again in gfs2_iomap_end, which is incorrect and leads to a failed assertion. Instead, move the call to gfs2_quota_unlock before the call to punch_hole to fix that. Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Add trusted xattr supportAndreas Gruenbacher2021-02-083-6/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for an additional filesystem version (sb_fs_format = 1802). When a filesystem with the new version is mounted, the filesystem supports "trusted.*" xattrs. In addition, version 1802 filesystems implement a form of forward compatibility for xattrs: when xattrs with an unknown prefix (ea_type) are found on a version 1802 filesystem, those attributes are not shown by listxattr, and they are not accessible by getxattr, setxattr, or removexattr. This mechanism might turn out to be what we need in the future, but if not, we can always bump the filesystem version and break compatibility instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Andrew Price <anprice@redhat.com>
| * | gfs2: Enable rgrplvb for sb_fs_format 1802Andrew Price2021-02-083-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Turn on rgrplvb by default for sb_fs_format > 1801. Mount options still have to override this so a new args field to differentiate between 'off' and 'not specified' is added, and the new default is applied only when it's not specified. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Don't skip dlm unlock if glock has an lvbBob Peterson2021-02-051-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch fb6791d100d1 was designed to allow gfs2 to unmount quicker by skipping the step where it tells dlm to unlock glocks in EX with lvbs. This was done because when gfs2 unmounts a file system, it destroys the dlm lockspace shortly after it destroys the glocks so it doesn't need to unlock them all: the unlock is implied when the lockspace is destroyed by dlm. However, that patch introduced a use-after-free in dlm: as part of its normal dlm_recoverd process, it can call ls_recovery to recover dead locks. In so doing, it can call recover_rsbs which calls recover_lvb for any mastered rsbs. Func recover_lvb runs through the list of lkbs queued to the given rsb (if the glock is cached but unlocked, it will still be queued to the lkb, but in NL--Unlocked--mode) and if it has an lvb, copies it to the rsb, thus trying to preserve the lkb. However, when gfs2 skips the dlm unlock step, it frees the glock and its lvb, which means dlm's function recover_lvb references the now freed lvb pointer, copying the freed lvb memory to the rsb. This patch changes the check in gdlm_put_lock so that it calls dlm_unlock for all glocks that contain an lvb pointer. Fixes: fb6791d100d1 ("GFS2: skip dlm_unlock calls in unmount") Cc: stable@vger.kernel.org # v3.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Lock imbalance on error path in gfs2_recover_oneAndreas Gruenbacher2021-02-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | In gfs2_recover_one, fix a sd_log_flush_lock imbalance when a recovery pass fails. Fixes: c9ebc4b73799 ("gfs2: allow journal replay to hold sd_log_flush_lock") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: keep bios separate for each journalBob Peterson2021-01-255-13/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | The recovery func can recover multiple journals, but they were all using the same bio. This resulted in use-after-free related to sdp->sd_log_bio. This patch moves the variable to the journal descriptor, jd, so that every recovery can operate on its own bio. And hopefully we never run out. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: fix glock confusion in function signal_our_withdrawBob Peterson2021-01-251-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If go_free is defined, function signal_our_withdraw is supposed to synchronize on the GLF_FREEING flag of the inode glock, but it accidentally does that on the live glock. Fix that and disambiguate the glock variables. Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | Revert "GFS2: Re-add a call to log_flush_wait when flushing the journal"Bob Peterson2021-01-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 428fd95d859b24fea448380fa21ad6d841b34241. Patch 428fd95d85b2 added a call to log_flush_wait to function gfs2_log_flush. Then gfs2_log_flush calls log_write_header which submits a write request with the REQ_PREFLUSH flag which also forces it to wait. This patch removes the unnecessary call to log_flush_wait. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: Fix invalid block size messageAndrew Price2021-01-251-1/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: amend SLAB_RECLAIM_ACCOUNT on gfs2 related slab cacheZhaoyang Huang2021-01-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As gfs2_quotad_cachep and gfs2_glock_cachep have registered shrinkers, amending SLAB_RECLAIM_ACCOUNT when creating them, which improves slab accounting. Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: make gfs2_log_write_page staticBob Peterson2020-12-312-2/+1
| | | | | | | | | | | | | | | | | | | | | Function gfs2_log_write_page is only used in lops.c, so make it static. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: move freeze glock outside the make_fs_rw and _ro functionsBob Peterson2020-12-233-39/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, sister functions gfs2_make_fs_rw and gfs2_make_fs_ro locked (held) the freeze glock by calling gfs2_freeze_lock and gfs2_freeze_unlock. The problem is, not all the callers of gfs2_make_fs_ro should be doing this. The three callers of gfs2_make_fs_ro are: remount (gfs2_reconfigure), signal_our_withdraw, and unmount (gfs2_put_super). But when unmounting the file system we can get into the following circular lock dependency: deactivate_super down_write(&s->s_umount); <-------------------------------------- s_umount deactivate_locked_super gfs2_kill_sb kill_block_super generic_shutdown_super gfs2_put_super gfs2_make_fs_ro gfs2_glock_nq_init sd_freeze_gl freeze_go_sync if (freeze glock in SH) freeze_super (vfs) down_write(&sb->s_umount); <------- s_umount This patch moves the hold of the freeze glock outside the two sister rw/ro functions to their callers, but it doesn't request the glock from gfs2_put_super, thus eliminating the circular dependency. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>