| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
Now that xfs_attr_set_iter() has initial states for removing
attributes, switch the pure attribute removal code over to using it.
This requires attrs being removed to always be marked as INCOMPLETE
before we start the removal due to the fact we look up the attr to
remove again in xfs_attr_node_remove_attr().
Note: this drops the fillstate/refillstate optimisations from
the remove path that avoid having to look up the path again after
setting the incomplete flag and removing remote attrs. Restoring
that optimisation to this path is future Dave's problem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
We need to merge the add and remove code paths to enable safe
recovery of replace operations. Hoist the initial remove states from
xfs_attr_remove_iter into xfs_attr_set_iter. We will make use of
them in the next patches.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Now that the full xfs_attr_set_iter() state machine always
terminates with either the state being XFS_DAS_DONE on success or
an error on failure, we can get rid of the need for it to return
-EAGAIN whenever it needs to roll the transaction before running
the next state.
That is, we don't need to spray -EAGAIN return states everywhere;
the caller just checks the state machine state for completion to
determine what action should be taken next. This greatly simplifies
the code within the state machine implementation as it now only has
to handle 0 for success or -errno for error and it doesn't need to
tell the caller to retry.
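A minimal sketch of the calling convention described above, assuming a
toy three-state machine. The toy_* names are hypothetical and are not
the kernel's; the point is only that the iterator returns 0 or -errno
and the caller decides whether to roll by looking at the state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; the real XFS_DAS_* enum has many more. */
enum toy_das { TOY_DAS_ADD, TOY_DAS_SET_RMT, TOY_DAS_DONE };

struct toy_attr {
	enum toy_das	state;
	int		rmt_blocks_left; /* remote value blocks still to set */
	int		rolls;		 /* transaction rolls done by the caller */
};

/* Run one step. Returns 0 on success or -errno, never -EAGAIN. */
static int toy_set_iter(struct toy_attr *a)
{
	switch (a->state) {
	case TOY_DAS_ADD:
		if (a->rmt_blocks_left < 0)
			return -EINVAL;
		a->state = a->rmt_blocks_left ? TOY_DAS_SET_RMT : TOY_DAS_DONE;
		return 0;
	case TOY_DAS_SET_RMT:
		if (--a->rmt_blocks_left == 0)
			a->state = TOY_DAS_DONE;
		return 0;
	case TOY_DAS_DONE:
		break;
	}
	assert(0 && "iterated a finished state machine");
	return -EINVAL;
}

/* The caller checks the state, not the return value, for completion. */
static int toy_attr_set(struct toy_attr *a)
{
	int error;

	for (;;) {
		error = toy_set_iter(a);
		if (error)
			return error;
		if (a->state == TOY_DAS_DONE)
			return 0;
		a->rolls++;	/* stand-in for rolling the transaction */
	}
}
```

An inline attribute finishes without any roll in this sketch, while a
remote value of N blocks takes N rolls before the state reaches done.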
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Clean up the final leaf/node states in xfs_attr_set_iter() to
further simplify the high level state machine and to set the
completion state correctly. As we are adding a separate state
for node format removal, we need to ensure that node formats
are collapsed back to shortform or empty correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
We may not have a remote value for the old xattr we have to remove,
so skip over the remote value removal states and go straight to
the xattr name removal in the leaf/node block.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
We can skip the REPLACE state when LARP is enabled, but that means
the XFS_DAS_FLIP_LFLAG state is now poorly named - it indicates
something that has been done rather than what the state is going to
do. Rename it to "REMOVE_OLD" to indicate that we are now going to
perform removal of the old attr.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
When we set a new xattr, we have three exit paths:
1. nothing else to do
2. allocate and set the remote xattr value
3. perform the rest of a replace operation
Currently we push both 2 and 3 into the same state, regardless of
whether we just set a remote attribute or not. Once we've set the
remote xattr, we have two exit states:
1. nothing else to do
2. perform the rest of a replace operation
Hence we can split the remote xattr allocation and setting into
their own states and factor it out of xfs_attr_set_iter() to further
clean up the state machine and the implementation of the state
machine.
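The two sets of exit paths listed above can be sketched as a pair of
next-step functions. This is an illustration only; the enum and
function names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the work that follows each step. */
enum toy_next { TOY_NEXT_DONE, TOY_NEXT_SET_RMT, TOY_NEXT_REPLACE };

/* After the new xattr name is set there are three possible exits. */
static enum toy_next toy_after_name_set(bool has_rmt_value, bool is_replace)
{
	if (has_rmt_value)
		return TOY_NEXT_SET_RMT;
	if (is_replace)
		return TOY_NEXT_REPLACE;
	return TOY_NEXT_DONE;
}

/* After the remote value is set only two exits remain. */
static enum toy_next toy_after_rmt_set(bool is_replace)
{
	return is_replace ? TOY_NEXT_REPLACE : TOY_NEXT_DONE;
}
```

Keeping the remote value step in its own state means a replace of an
inline attribute never passes through it.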
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
The operations performed from XFS_DAS_FOUND_LBLK through to
XFS_DAS_RM_LBLK are now identical to XFS_DAS_FOUND_NBLK through to
XFS_DAS_RM_NBLK. We can collapse these down into a single set of
code.
To do this, define the states that leaf and node run through as
separate sets of sequential states. Then as we move to the next
state, we can use increments rather than specific state assignments
to move through the states. This means the state progression is set
by the initial state that enters the series and we don't need to
duplicate the code anymore.
At the exit point of the series we need to select the correct leaf
or node state, but that can also be done by state increment rather
than assignment.
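A rough sketch of the increment scheme, assuming a cut-down set of
states. The TOY_* names are hypothetical; what matters is that the leaf
and node series are declared in the same order, so the shared code
advances with state++ and lands on the matching format-specific state
at the exit.

```c
#include <assert.h>

/* Two runs of states in identical order, one per format (hypothetical). */
enum toy_das {
	TOY_LEAF_SET_RMT,
	TOY_LEAF_REPLACE,
	TOY_LEAF_REMOVE_OLD,
	TOY_LEAF_REMOVE_ATTR,	/* leaf-specific exit state */
	TOY_NODE_SET_RMT,
	TOY_NODE_REPLACE,
	TOY_NODE_REMOVE_OLD,
	TOY_NODE_REMOVE_ATTR,	/* node-specific exit state */
	TOY_DONE,
};

struct toy_ctx {
	enum toy_das	state;
	int		shared_steps;	/* steps run by the common code */
	int		leaf_final;
	int		node_final;
};

static void toy_iter(struct toy_ctx *c)
{
	switch (c->state) {
	case TOY_LEAF_SET_RMT:
	case TOY_NODE_SET_RMT:
	case TOY_LEAF_REPLACE:
	case TOY_NODE_REPLACE:
	case TOY_LEAF_REMOVE_OLD:
	case TOY_NODE_REMOVE_OLD:
		/* One body for both formats; the entry state picks the series. */
		c->shared_steps++;
		c->state++;
		break;
	case TOY_LEAF_REMOVE_ATTR:
		c->leaf_final++;
		c->state = TOY_DONE;
		break;
	case TOY_NODE_REMOVE_ATTR:
		c->node_final++;
		c->state = TOY_DONE;
		break;
	case TOY_DONE:
		break;
	}
}

static void toy_run(struct toy_ctx *c)
{
	while (c->state != TOY_DONE)
		toy_iter(c);
}
```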
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
We re-enter the XFS_DAS_FOUND_LBLK state when we have to allocate
multiple extents for a remote xattr. We currently have a flag
called XFS_DAC_LEAF_ADDNAME_INIT to avoid running the remote attr
hole finding code more than once.
However, for the node format tree, we have a separate state for this
so we never reenter the state machine at XFS_DAS_FOUND_NBLK and so
it does not need a special flag to skip over the remote attr hole
finding code.
Convert the leaf block code to use the same state machine as the
node blocks and kill the XFS_DAC_LEAF_ADDNAME_INIT flag.
This further points out that this "ALLOC" state is only traversed
if we have remote xattrs or we are doing a rename operation. Rename
both the leaf and node alloc states to _ALLOC_RMT to indicate they
are iterating to do allocation of remote xattr blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
We currently use XFS_DAS_UNINIT for several steps in the attr_set
state machine. We use it for setting shortform xattrs, converting
from shortform to leaf, leaf add, leaf-to-node and node add. All of
these things are essentially known before we start the state machine
iterating, so we really should separate them out:
XFS_DAS_SF_ADD:
- tries to do a shortform add
- on success -> done
- on ENOSPC converts to leaf, -> XFS_DAS_LEAF_ADD
- on error, dies.
XFS_DAS_LEAF_ADD:
- tries to do leaf add
- on success:
- inline attr -> done
- remote xattr || REPLACE -> XFS_DAS_FOUND_LBLK
- on ENOSPC converts to node, -> XFS_DAS_NODE_ADD
- on error, dies
XFS_DAS_NODE_ADD:
- tries to do node add
- on success:
- inline attr -> done
- remote xattr || REPLACE -> XFS_DAS_FOUND_NBLK
- on error, dies
This makes it easier to understand how the state machine starts
up and sets us up on the path to further state machine
simplifications.
This also converts the DAS state tracepoints to use strings rather
than numbers, as converting between enums and numbers requires
manual counting rather than just reading the name.
This also introduces a XFS_DAS_DONE state so that we can trace
successful operation completions easily.
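The start-up transitions listed above might look roughly like this.
This is a sketch under stated assumptions: the TOY_* names, the
toy_req structure and toy_try_add() are hypothetical stand-ins, and
the name table only mirrors the idea of tracing states as strings.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum toy_das {
	TOY_SF_ADD, TOY_LEAF_ADD, TOY_NODE_ADD,
	TOY_FOUND_LBLK, TOY_FOUND_NBLK, TOY_DONE,
};

/* Names for tracing, so nobody has to count enum values by hand. */
static const char *toy_das_name(enum toy_das s)
{
	switch (s) {
	case TOY_SF_ADD:	return "TOY_SF_ADD";
	case TOY_LEAF_ADD:	return "TOY_LEAF_ADD";
	case TOY_NODE_ADD:	return "TOY_NODE_ADD";
	case TOY_FOUND_LBLK:	return "TOY_FOUND_LBLK";
	case TOY_FOUND_NBLK:	return "TOY_FOUND_NBLK";
	case TOY_DONE:		return "TOY_DONE";
	}
	return "unknown";
}

struct toy_req {
	int sf_full;	/* shortform fork has no room */
	int leaf_full;	/* leaf block has no room */
	int needs_more;	/* remote value or replace still to do */
};

/* Stand-in for the real add attempts: only 0 or -ENOSPC. */
static int toy_try_add(int full)
{
	return full ? -ENOSPC : 0;
}

static int toy_add_step(enum toy_das *state, const struct toy_req *r)
{
	switch (*state) {
	case TOY_SF_ADD:
		/* -ENOSPC means convert to leaf form and retry there. */
		*state = toy_try_add(r->sf_full) == -ENOSPC ?
				TOY_LEAF_ADD : TOY_DONE;
		return 0;
	case TOY_LEAF_ADD:
		if (toy_try_add(r->leaf_full) == -ENOSPC)
			*state = TOY_NODE_ADD;
		else
			*state = r->needs_more ? TOY_FOUND_LBLK : TOY_DONE;
		return 0;
	case TOY_NODE_ADD:
		*state = r->needs_more ? TOY_FOUND_NBLK : TOY_DONE;
		return 0;
	default:
		return -EINVAL;	/* not an initial add state */
	}
}
```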
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Logged attribute intents only have set and remove types - there is
no separate intent type for a replace operation. We should have a
separate type for a replace operation, as it needs to perform
operations that neither SET nor REMOVE can perform.
Add this type to the intent items and rearrange the deferred
operation setup to reflect the different operations we are
performing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
We currently set and hold the leaf buffer when converting from short
to leaf form, then release it only to immediately look it back up
again to do the leaf insert.
Do a bit of refactoring to xfs_attr_leaf_try_add() to avoid this
messy handling of the newly allocated leaf buffer.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
generic/642 triggered a reproducible assert failure in
xlog_cil_commit() that resulted from a xfs_attr_set() committing
an empty but dirty transaction. When the CIL is empty and this
occurs, xlog_cil_commit() tries a background push and this triggers
a "pushing an empty CIL" assert.
XFS: Assertion failed: !list_empty(&cil->xc_cil), file: fs/xfs/xfs_log_cil.c, line: 1274
Call Trace:
<TASK>
xlog_cil_commit+0xa5a/0xad0
__xfs_trans_commit+0xb8/0x330
xfs_trans_commit+0x10/0x20
xfs_attr_set+0x3e2/0x4c0
xfs_xattr_set+0x8d/0xe0
__vfs_setxattr+0x6b/0x90
__vfs_setxattr_noperm+0x76/0x220
__vfs_setxattr_locked+0xdf/0x100
vfs_setxattr+0x94/0x170
setxattr+0x110/0x200
path_setxattr+0xbf/0xe0
__x64_sys_setxattr+0x2b/0x30
do_syscall_64+0x35/0x80
The problem is related to the breakdown of attribute addition in
xfs_attr_set_iter() and how it is called from deferred operations.
When we have a pure leaf xattr insert, we add the xattr to the leaf
and set the next state to XFS_DAS_FOUND_LBLK and return -EAGAIN.
This requeues the xattr deferred work, rolls the transaction and
runs xfs_attr_set_iter() again. This then checks the xattr for
being remote (it's not) and whether a replace op is being done (this
is a create op) and if neither is true it returns without having
done anything.
xfs_xattri_finish_update() then unconditionally sets the transaction
dirty, and the deferops finishes and returns to __xfs_trans_commit()
which sees the transaction dirty and tries to commit it by calling
xlog_cil_commit(). The transaction is empty, and then the assert
fires if this happens when the CIL is empty.
This patch addresses the structure of xfs_attr_set_iter() that
requires re-entry on leaf add even when nothing will be done. This
gets rid of the trailing empty transaction and so doesn't trigger
the XFS_TRANS_DIRTY assignment in xfs_xattri_finish_update()
incorrectly. Addressing that is for a different patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Add an error tag on xfs_attr3_leaf_to_node to test log attribute
recovery and replay.
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Add an error tag on xfs_da3_split to test log attribute recovery
and replay.
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Quick helper function to collapse duplicate code to initialize
transactions for attributes.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
This patch adds a helper function xfs_attr_leaf_addname. While this
does help to break down xfs_attr_set_iter, it does also hoist out some
of the state management. This patch has been moved to the end of the
clean up series for further discussion.
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
This is a clean up patch that merges xfs_delattr_context into
xfs_attr_item. Now that the refactoring is complete and the delayed
operation infrastructure is in place, we can combine these to eliminate
the extra struct.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
This patch adds a debug option to enable log attribute replay. Eventually
this can be removed when delayed attrs become permanent.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
This patch adds an error tag that we can use to test log attribute
recovery and replay.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Remove xfs_attr_set_args, xfs_attr_remove_args, and xfs_attr_trans_roll.
These high level loops are now driven by the delayed operations code,
and can be removed.
Additionally collapse in the leaf_bp parameter of xfs_attr_set_iter
since we only have one caller that passes dac->leaf_bp.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
These routines set up and queue new deferred attribute operations.
These functions are meant to be called by any routine needing to
initiate a deferred attribute operation as opposed to the existing
inline operations. New helper function xfs_attr_item_init also added.
Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
This is a clean up patch that skips the flip flag logic for delayed attr
renames. Since the log replay keeps the inode locked, we do not need to
worry about race windows with attr lookups. So we can skip over
flipping the flag and the extra transaction roll for it.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
This patch adds the needed routines to create, log and recover logged
extended attribute intents.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Currently attributes are modified directly across one or more
transactions. But they are not logged or replayed in the event of an
error. The goal of log attr replay is to enable logging and replaying
of attribute operations using the existing delayed operations
infrastructure. This will later enable the attributes to become part of
larger multi-part operations that also must first be recorded to the
log. This is mostly of interest in the scheme of parent pointers which
would need to maintain an attribute containing parent inode information
any time an inode is moved, created, or removed. Parent pointers would
then be of interest to any feature that would need to quickly derive an
inode path from the mount point. Online scrub, NFS lookups and fs grow
or shrink operations are all features that could take advantage of this.
This patch adds two new log item types for setting or removing
attributes as deferred operations. The xfs_attri_log_item will log an
intent to set or remove an attribute. The corresponding
xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
freed once the transaction is done. Both log items use a generic
xfs_attr_log_format structure that contains the attribute name, value,
flags, inode, and an op_flag that indicates if the operation is a set
or remove.
[dchinner: added extra little bits needed for intent whiteouts]
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
During an attr rename operation, blocks are saved for later removal
as rmtblkno2. The rmtblkno is used in the case of needing to alloc
more blocks if not enough were available. However, in the case
that no further blocks need to be added or removed, we can return as soon
as xfs_attr_node_addname completes, rather than rolling the transaction
with an -EAGAIN return. This extra loop does not hurt anything right
now, but it will be a problem later when we get into log items because
we end up with an empty log transaction. So, add a simple check to
cut out the unneeded iteration.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|/
The new deferred attr patch set uncovered a double unlock in the
recent port of the defer ops capture and continue code. During log
recovery, we're allowed to hold buffers to a transaction that's being
used to replay an intent item. When we capture the resources as part
of scheduling a continuation of an intent chain, we call xfs_buf_hold
to retain our reference to the buffer beyond the transaction commit,
but we do /not/ call xfs_trans_bhold to maintain the buffer lock.
This means that xfs_defer_ops_continue needs to relock the buffers
before xfs_defer_restore_resources joins them to the new transaction.
Additionally, the buffers should not be passed back via the dres
structure since they need to remain locked unlike the inodes. So
simply set dr_bufs to zero after populating the dres structure.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|\ |
We don't check that the v4 feature flags that v5 requires to be set
are actually set anywhere. Do this check when we see that the
filesystem is a v5 filesystem.
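A minimal sketch of such a check, assuming made-up bit values. The
TOY_* flags and the required mask are hypothetical and do not match
the on-disk XFS_SB_VERSION_* constants; only the mask test itself is
the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical v4 feature bits. */
#define TOY_V4_ATTR	(1u << 0)
#define TOY_V4_NLINK	(1u << 1)
#define TOY_V4_DIRV2	(1u << 2)
#define TOY_V4_QUOTA	(1u << 3)	/* optional even on v5 */

/* Bits a v5 superblock is assumed to always carry in this sketch. */
#define TOY_V5_REQUIRED	(TOY_V4_ATTR | TOY_V4_NLINK | TOY_V4_DIRV2)

/* Only v5 superblocks are held to the required set. */
static bool toy_sb_v4_bits_ok(int version, uint32_t v4_features)
{
	if (version != 5)
		return true;
	return (v4_features & TOY_V5_REQUIRED) == TOY_V5_REQUIRED;
}
```

A v5 superblock missing any one required bit fails the check, while
extra optional bits are ignored.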
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
While xfs_has_nlink() is not used in kernel, it is used in userspace
(e.g. by xfs_db) so we need to set the XFS_FEAT_NLINK flag correctly
in xfs_sb_version_to_features().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
xfs_repair catches fork size/format mismatches, but the in-kernel
verifier doesn't, leading to null pointer failures when attempting
to perform operations on the fork. This can occur in
xfs_dir_is_empty(), where the in-memory fork format does not match
the size and so the fork data pointer is accessed incorrectly.
Note: this causes new failures in xfs/348 which is testing mode vs
ftype mismatches. We now detect a regular file that has been changed
to a directory or symlink mode as being corrupt because the data
fork for a symlink or directory should be in local form when
there are only 3 bytes of data in the data fork. Hence the inode
verifier for the regular file now fires w/ -EFSCORRUPTED because
the inode fork format does not match the format the corrupted mode
says it should be in.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
To catch the obvious graph cycle problem and hence potential endless
looping.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|\ \
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.19-for-next
xfs: fix reflink inefficiencies
As Dave Chinner has complained about on IRC, there are a couple of
things about reflink that are very inefficient. First of all, we
limited the size of all bunmapi operations to avoid flooding the log
with defer ops in the worst case, but recent changes to the defer
ops code have solved that problem, so get rid of the bunmapi length
clamp.
Second, the log reservations for reflink operations are far far
larger than they need to be. Shrink them to exactly what we need to
handle each deferred RUI and CUI log item, and no more. Also reduce
logcount because we don't need 8 rolls per operation. Introduce a
transaction reservation compatibility layer to avoid changing the
minimum log size calculations.
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
These functions return the maximum number of blocks that could be logged
in a particular transaction. "log count" is confusing since there's a
separate concept of a log (operation) count in the reservation code, so
let's change it to "block count" to be less confusing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
Prior to the introduction of deferred refcount operations, reflink
would try to cram refcount btree updates into the same transaction as an
allocation or a free event. Mainline XFS has never actually done that,
but we never refactored the transaction reservations to reflect that we
now do all refcount updates in separate transactions. Fix this to
reduce the transaction reservation size even further, so that between
this patch and the previous one, we reduce the tr_write and tr_itruncate
sizes by 66%.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
Back in the early days of reflink and rmap development I set the
transaction reservation sizes to be overly generous for rmap+reflink
filesystems, and a little under-generous for rmap-only filesystems.
Since we don't need *eight* transaction rolls to handle three new log
intent items, decrease the logcounts to what we actually need, and amend
the shadow reservation computation function to reflect what we used to
do so that the minimum log size doesn't change.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
Move the tracepoint that computes the size of the transaction used to
compute the minimum log size into xfs_log_get_max_trans_res so that we
only have to compute this stuff once.
Leave xfs_log_get_max_trans_res as a non-static function so that xfs_db
can call it to report the results of the userspace computation of the
same value to diagnose mkfs/kernel misinteractions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
Every time someone changes the transaction reservation sizes, they
introduce potential compatibility problems if the changes affect the
minimum log size that we validate at mount time. If the minimum log
size gets larger (which should be avoided because doing so presents a
serious risk of log livelock), filesystems created with old mkfs will
not mount on a newer kernel; if the minimum size shrinks, filesystems
created with newer mkfs will not mount on older kernels.
Therefore, enable the creation of a shadow log reservation structure
where we can "undo" the effects of tweaks when computing minimum log
sizes. These shadow reservations should never be used in practice, but
they insulate us from perturbations in minimum log size.
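A rough sketch of the shadow idea, assuming a reservation is just a
(logres, logcount) pair and that the minimum log size follows the
largest logres * logcount product. All toy_* names are hypothetical,
and the size formula is a simplification of whatever the real
computation does.

```c
#include <assert.h>

#define TOY_OLD_LOGCOUNT 8	/* roll count assumed before the tweak */

struct toy_res {
	unsigned int logres;	/* bytes per transaction */
	unsigned int logcount;	/* number of rolls reserved */
};

struct toy_resv {
	struct toy_res write;
	struct toy_res itruncate;
	struct toy_res rename;
};

static unsigned int toy_res_space(const struct toy_res *r)
{
	return r->logres * r->logcount;
}

/* Copy the live reservations, then undo the tweaks in the copy only. */
static void toy_calc_minlog_resv(const struct toy_resv *live,
				 struct toy_resv *shadow)
{
	*shadow = *live;
	shadow->write.logcount = TOY_OLD_LOGCOUNT;
	shadow->itruncate.logcount = TOY_OLD_LOGCOUNT;
}

/* The minimum log size is derived from the shadow copy, never from live. */
static unsigned int toy_min_log_space(const struct toy_resv *live)
{
	struct toy_resv shadow;
	unsigned int max, space;

	toy_calc_minlog_resv(live, &shadow);
	max = toy_res_space(&shadow.write);
	space = toy_res_space(&shadow.itruncate);
	if (space > max)
		max = space;
	space = toy_res_space(&shadow.rename);
	if (space > max)
		max = space;
	return max;
}
```

Changing a live logcount that the shadow overrides leaves the computed
minimum untouched, which is the insulation the text describes.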
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
In commit e1a4e37cc7b6, we clamped the length of bunmapi calls on the
data forks of shared files to avoid two failure scenarios: one where the
extent being unmapped is so sparsely shared that we exceed the
transaction reservation with the sheer number of refcount btree updates
and EFI intent items; and the other where we attach so many deferred
updates to the transaction that we pin the log tail and later the log
head meets the tail, causing the log to livelock.
We avoid triggering the first problem by tracking the number of ops in
the refcount btree cursor and forcing a requeue of the refcount intent
item any time we think that we might be close to overflowing. This has
been baked into XFS since before the original e1a4 patch.
A recent patchset fixed the second problem by changing the deferred ops
code to finish all the work items created by each round of trying to
complete a refcount intent item, which eliminates the long chains of
deferred items (27dad); and causing long-running transactions to relog
their intent log items when space in the log gets low (74f4d).
Because this clamp affects /any/ unmapping request regardless of the
sharing factors of the component blocks, it degrades the performance of
all large unmapping requests -- whereas with an unshared file we can
unmap millions of blocks in one go, shared files are limited to
unmapping a few thousand blocks at a time, which causes the upper level
code to spin in a bunmapi loop even if it wasn't needed.
This also eliminates one more place where log recovery behavior can
differ from online behavior, because bunmapi operations no longer need
to requeue. The fstest generic/447 was created to test the old fix, and
it still passes with this applied.
Partial-revert-of: e1a4e37cc7b6 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent")
Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops are finished")
Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
A long time ago, I added to XFS the ability to use deferred reference
count operations as part of a transaction chain. This enabled us to
avoid blowing out the transaction reservation when the blocks in a
physical extent all had different reference counts because we could ask
the deferred operation manager for a continuation, which would get us a
clean transaction.
The refcount code asks for a continuation when the number of refcount
record updates reaches the point where we think that the transaction has
logged enough full btree blocks due to refcount (and free space) btree
shape changes and refcount record updates that we're in danger of
overflowing the transaction.
We did not previously count the EFIs logged to the refcount update
transaction because the clamps on the length of a bunmap operation were
sufficient to avoid overflowing the transaction reservation even in the
worst case situation where every other block of the unmapped extent is
shared.
Unfortunately, the restrictions on bunmap length avoid failure in the
worst case by imposing a maximum unmap length of ~3000 blocks, even for
non-pathological cases. This seriously limits performance when freeing
large extents.
Therefore, track EFIs with the same counter as refcount record updates,
and use that information as input into when we should ask for a
continuation. This enables the next patch to drop the clumsy bunmap
limitation.
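The shared counter can be sketched as follows. This is an assumption
about the shape of the accounting, not the kernel code: toy_cur,
TOY_OP_OVERHEAD and the per-block cost model are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_OP_OVERHEAD 32UL	/* assumed log bytes per counted op */

struct toy_cur {
	unsigned long nr_ops;	/* record updates and EFIs, one counter */
	unsigned long log_res;	/* bytes reserved by the transaction */
};

static bool toy_still_have_space(const struct toy_cur *c)
{
	return c->nr_ops * TOY_OP_OVERHEAD < c->log_res;
}

/*
 * Adjust up to nblocks blocks. Each block costs one record update and,
 * if it is being freed, one EFI as well. Returns how many blocks were
 * handled before a continuation has to be requested.
 */
static unsigned int toy_adjust(struct toy_cur *c, unsigned int nblocks,
			       bool frees)
{
	unsigned int done = 0;

	while (done < nblocks && toy_still_have_space(c)) {
		c->nr_ops++;		/* refcount record update */
		if (frees)
			c->nr_ops++;	/* EFI logged for the freed block */
		done++;
	}
	return done;
}
```

With EFIs counted, a freeing pass hits the limit after half as many
blocks in this model, so the continuation is requested earlier instead
of relying on a fixed unmap length clamp.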
Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops are finished")
Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
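A minimal sketch of the accounting described above, assuming a single per-transaction counter that is charged for both refcount record updates and EFIs. The struct, the function names, and the limit value are hypothetical illustrations, not the kernel's actual code.

```c
#include <stdbool.h>

/* Hypothetical per-transaction budget; the real limit is derived elsewhere. */
struct update_budget {
	unsigned int used;	/* refcount updates + EFIs charged so far */
	unsigned int limit;	/* charges allowed before asking for a continuation */
};

/* Charge one refcount record update against the budget. */
static inline void budget_charge_refcount_update(struct update_budget *b)
{
	b->used++;
}

/* Charge one extent-free intent (EFI) against the same budget. */
static inline void budget_charge_efi(struct update_budget *b)
{
	b->used++;
}

/* True when the caller should stop and request a clean transaction. */
static inline bool budget_want_continuation(const struct update_budget *b)
{
	return b->used >= b->limit;
}

/* Reset once the deferred operation manager has supplied a new transaction. */
static inline void budget_reset(struct update_budget *b)
{
	b->used = 0;
}
```

Because EFIs draw from the same counter as refcount updates, a long unmap that frees many extents reaches the continuation point on its own, without needing a separate clamp on the unmap length.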
|
|\| |
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.19-for-next
xfs: fix rmap inefficiencies
Reduce the performance impact of the reverse mapping btree when
reflink is enabled by using the much faster non-overlapped btree
lookup functions when we're searching the rmap index with a fully
specified key. If we find the exact record we're looking for,
great! We don't have to perform the full overlapped scan. For
filesystems with high sharing factors this reduces the xfs_scrub
runtime by a good 15%.
This has been shown to reduce the fstests runtime for realtime rmap
configurations by 30%, since the lack of AGs severely limits
scalability.
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Reverse mapping on a reflink-capable filesystem has some pretty high
overhead when performing file operations. This is because the rmap
records for logically and physically adjacent extents might not be
adjacent in the rmap index due to data block sharing. As a result, we
use an expensive overlapped-interval btree search, which walks every
record that overlaps with the supplied key in the hopes of finding the record.
However, profiling data shows that when the index contains a record that
is an exact match for a query key, the non-overlapped btree search
function can find the record much faster than the overlapped version.
Try the non-overlapped lookup first when we're trying to find the left
neighbor rmap record for a given file mapping, which makes unwritten
extent conversion and remap operations run faster if data block sharing
is minimal in this part of the filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
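The "try the exact lookup first, then fall back" pattern can be sketched as follows. This is a simplified model, not the rmap btree code: a sorted array stands in for the btree, a linear scan stands in for the overlapped-interval search, and all names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for an rmap record covering [start, start + len). */
struct rec {
	unsigned long start;
	unsigned long len;
	unsigned long owner;
};

/*
 * Fast path: binary search for a record whose (start, owner) equals the key.
 * The array is assumed to be sorted by (start, owner).
 */
static inline const struct rec *
lookup_exact(const struct rec *recs, size_t n,
	     unsigned long start, unsigned long owner)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct rec *r = &recs[mid];

		if (r->start < start ||
		    (r->start == start && r->owner < owner))
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < n && recs[lo].start == start && recs[lo].owner == owner)
		return &recs[lo];
	return NULL;
}

/* Slow path: examine every record and return one overlapping block `start`. */
static inline const struct rec *
lookup_overlapped(const struct rec *recs, size_t n,
		  unsigned long start, unsigned long owner)
{
	for (size_t i = 0; i < n; i++) {
		const struct rec *r = &recs[i];

		if (r->owner == owner && r->start <= start &&
		    start < r->start + r->len)
			return r;
	}
	return NULL;
}

/* Try the cheap exact match first; pay for the full scan only on a miss. */
static inline const struct rec *
lookup(const struct rec *recs, size_t n,
       unsigned long start, unsigned long owner)
{
	const struct rec *r = lookup_exact(recs, n, start, owner);

	return r ? r : lookup_overlapped(recs, n, start, owner);
}
```

The fast path only pays off when the key is fully specified and an exact record exists; otherwise the cost is one extra cheap lookup before the full scan.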
|
Reverse mapping on a reflink-capable filesystem has some pretty high
overhead when performing file operations. This is because the rmap
records for logically and physically adjacent extents might not be
adjacent in the rmap index due to data block sharing. As a result, we
use an expensive overlapped-interval btree search, which walks every
record that overlaps with the supplied key in the hopes of finding the record.
However, profiling data shows that when the index contains a record that
is an exact match for a query key, the non-overlapped btree search
function can find the record much faster than the overlapped version.
Try the non-overlapped lookup first, which will make scrub run much
faster.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| |/
Most callers of xfs_rmap_lookup_le will retrieve the btree record
immediately if the lookup succeeds. The overlapped version of this
function (xfs_rmap_lookup_le_range) will return the record if the lookup
succeeds, so make the regular version do it too. Get rid of the useless
len argument, since it's not part of the lookup key.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
Intent whiteouts will require extra work to be done during
transaction commit if the transaction contains an intent done item.
To determine if a transaction contains an intent done item, we want
to avoid having to walk all the items in the transaction to check if
they are intent done items. Hence when we add an intent done item to
a transaction, tag the transaction to indicate that it contains such
an item.
We don't tag the transaction when the defer ops is relogging an
intent to move it forward in the log. Whiteouts will never apply to
these cases, so we don't need to bother looking for them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
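The tagging idea can be sketched as below: set a flag when an intent done item is added, so that commit can test one bit instead of walking the item list. The flag name, the structs, and the `relogging` parameter are hypothetical and only model the behaviour described in the message.

```c
#include <stdbool.h>

#define TX_HAS_INTENT_DONE	(1u << 0)	/* hypothetical flag name */

struct item {
	bool is_intent_done;
};

struct tx {
	unsigned int flags;
	unsigned int nr_items;
};

/*
 * Add an item to the transaction. Intent done items tag the transaction,
 * except when the caller is only relogging an intent to move it forward.
 */
static inline void tx_add_item(struct tx *tp, const struct item *it,
			       bool relogging)
{
	tp->nr_items++;
	if (it->is_intent_done && !relogging)
		tp->flags |= TX_HAS_INTENT_DONE;
}

/* Commit-time check: a single flag test replaces a walk over every item. */
static inline bool tx_needs_whiteout_pass(const struct tx *tp)
{
	return (tp->flags & TX_HAS_INTENT_DONE) != 0;
}
```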
|
If the first operation in a string of defer ops has no intents,
then there is no reason to commit it before running the first call
to xfs_defer_finish_one(). This allows the defer ops to be used
effectively for non-intent based operations without requiring an
unnecessary extra transaction commit when first called.
This fixes a regression in per-attribute modification transaction
count when delayed attributes are not being used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Callers currently have to round out the size of buffers to match the
alignment constraints of log iovecs and xlog_write(). They should not
need to know this detail, so introduce a new function to calculate
the iovec length (for use in ->iop_size implementations). Also
modify xlog_finish_iovec() to round up the length to the correct
alignment so the callers don't need to do this, either.
Convert the only user - inode forks - of this alignment rounding to
use the new interface.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
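A rough sketch of moving the rounding into the helpers, assuming a 4-byte alignment requirement. The struct and function names are hypothetical stand-ins, not the kernel's interfaces.

```c
#include <stddef.h>

#define LOG_IOVEC_ALIGN	4u	/* assumed alignment, in bytes */

/* Hypothetical log vector: a buffer plus the length recorded for the log. */
struct log_iovec {
	void	*buf;
	size_t	len;
};

/* Length an iovec occupies in the log once padded to the alignment. */
static inline size_t log_iovec_len(size_t len)
{
	return (len + LOG_IOVEC_ALIGN - 1) & ~(size_t)(LOG_IOVEC_ALIGN - 1);
}

/* Record the payload; the caller passes the raw, unrounded data length. */
static inline void log_iovec_finish(struct log_iovec *vec, void *buf,
				    size_t data_len)
{
	vec->buf = buf;
	vec->len = log_iovec_len(data_len);
}
```

With this shape, size calculations and iovec completion share one rounding rule, so callers never open-code the alignment. The buffer passed in must already be large enough to cover the rounded length.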
|
|/
When we first allocate or resize an inline inode fork, we round up
the allocation to 4 byte alignment to meet journal alignment
constraints. We don't clear the unused bytes, so we can copy up to
three uninitialised bytes into the journal. Zero those bytes so we
only ever copy zeros into the journal.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
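The fix can be illustrated in userspace as follows. This is a sketch using `realloc()` rather than the kernel allocator, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4. */
static inline size_t round_up4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Resize an inline data buffer to hold new_size bytes. The allocation is
 * rounded up to 4 bytes and the (at most three) padding bytes are zeroed,
 * so copying the rounded length never exposes uninitialised memory.
 * Returns NULL on allocation failure, leaving the old buffer untouched.
 */
static inline char *inline_fork_resize(char *data, size_t new_size)
{
	size_t alloc = round_up4(new_size);
	char *p;

	assert(new_size > 0);
	p = realloc(data, alloc);
	if (!p)
		return NULL;
	memset(p + new_size, 0, alloc - new_size);
	return p;
}
```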
|
|\
into xfs-5.19-for-next
xfs: Large extent counters
The commit "xfs: fix inode fork extent count overflow"
(3f8a4f1d876d3e3e49e50b0396eaffcc4ba71b08) mentions that 10 billion
data fork extents should be possible to create. However the
corresponding on-disk field has a signed 32-bit type. Hence this
patchset extends the per-inode data fork extent counter to 64 bits
(out of which 48 bits are used to store the extent count).
Also, XFS has an attribute fork extent counter which is 16 bits
wide. A workload that,
1. Creates 1 million 255-byte sized xattrs,
2. Deletes 50% of these xattrs in an alternating manner,
3. Tries to insert 400,000 new 255-byte sized xattrs
causes the xattr extent counter to overflow.
Dave tells me that there are instances where a single file has more
than 100 million hardlinks. With parent pointers being stored in
xattrs, we will overflow the signed 16-bit wide attribute extent
counter when a large number of hardlinks are created. Hence this
patchset extends the on-disk field to 32 bits.
The following changes are made to accomplish this,
1. A 64-bit inode field is carved out of existing di_pad and
di_flushiter fields to hold the 64-bit data fork extent counter.
2. The existing 32-bit inode data fork extent counter will be used to
hold the attribute fork extent counter.
3. A new incompat superblock flag to prevent older kernels from mounting
the filesystem.
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
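The counter widths above translate into the following limits. This is an illustrative calculation only: the helper names are hypothetical, and it assumes every usable bit of each field can hold part of the count (31 and 15 bits for the old signed fields, 48 and 32 bits for the new ones).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest count representable in `bits` usable bits (bits must be < 64). */
static inline uint64_t max_extcnt(unsigned int bits)
{
	assert(bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/* Does an extent count fit in a counter with `bits` usable bits? */
static inline bool extcnt_fits(uint64_t nextents, unsigned int bits)
{
	return nextents <= max_extcnt(bits);
}
```

For example, 10 billion data fork extents do not fit in 31 bits but fit easily in 48, and 100 million attribute fork extents overflow 15 bits but fit in 32.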
|
This commit enables the XFS module to work with filesystem instances that
have 64-bit per-inode extent counters by adding the
XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the list of supported incompat feature flags.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|