author | Dave Chinner <dchinner@redhat.com> | 2020-06-29 14:48:46 -0700 |
---|---|---|
committer | Darrick J. Wong <darrick.wong@oracle.com> | 2020-07-06 10:46:58 -0700 |
commit | 1319ebefd6ed7a9988b7b4bc9317fbcf61a28bfc (patch) | |
tree | f55ab46d6a6476679a7dd7f12bed8c3580821dd9 /fs/xfs/xfs_inode_item.h | |
parent | 1dfde687a65fec73e6914c184ecf8e9e54ccfe74 (diff) | |
xfs: add an inode item lock
The inode log item is kind of special in that it can be aggregating
new changes in memory at the same time that existing changes are
being written back to disk. This means there are fields in the log
item that are accessed concurrently from contexts that don't share
any locking at all.
e.g. ili_last_fields is updated under the ILOCK_EXCL and flush lock
at flush time, updated under the flush lock at IO
completion time, and is read under the ILOCK_EXCL when the inode is
logged. Hence there is no actual serialisation between reading the
field during logging of the inode in transactions vs clearing the
field in IO completion.
We currently get away with this because we are only
clearing fields in IO completion, and nothing bad happens if we
accidentally log more of the inode than we actually modify. Worst
case is we consume a tiny bit more memory and log bandwidth.
However, if we want to do more complex state manipulations on the
log item that require updates at all three of these potential
locations, we need to have some mechanism of serialising those
operations. To do this, introduce a spinlock into the log item to
serialise internal state.
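
As a rough illustration of the idea (not the kernel code from this patch), the sketch below is a userspace analogue in which a pthread mutex stands in for the new spinlock. The struct name, the function names and the exact field handling are hypothetical; only the notion of one item-local lock covering the logging, flush and IO completion contexts comes from the description above.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the log item's flush state. */
struct demo_log_item {
	pthread_mutex_t	lock;		/* plays the role of ili_lock */
	unsigned int	last_fields;	/* fields captured by the in-flight flush */
	unsigned int	fields;		/* fields dirtied and not yet flushed */
};

static void demo_item_init(struct demo_log_item *it)
{
	pthread_mutex_init(&it->lock, NULL);
	it->last_fields = 0;
	it->fields = 0;
}

/* Logging context: record newly dirtied fields. */
static void demo_log(struct demo_log_item *it, unsigned int flags)
{
	pthread_mutex_lock(&it->lock);
	it->fields |= flags;
	pthread_mutex_unlock(&it->lock);
}

/* Flush context: move the dirty set to the in-flight set in one step. */
static unsigned int demo_flush(struct demo_log_item *it)
{
	unsigned int flushing;

	pthread_mutex_lock(&it->lock);
	flushing = it->fields;
	it->last_fields = flushing;
	it->fields = 0;
	pthread_mutex_unlock(&it->lock);
	return flushing;
}

/* IO completion context: the in-flight set is now on disk. */
static void demo_flush_done(struct demo_log_item *it)
{
	pthread_mutex_lock(&it->lock);
	it->last_fields = 0;
	pthread_mutex_unlock(&it->lock);
}
```

Because every context takes the same item-local lock, a two-field update such as the one in demo_flush() can never be observed half done, regardless of which inode locks the caller happens to hold.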
This could be done via the xfs_inode i_flags_lock, but this then
leads to potential lock inversion issues where inode flag updates
need to occur inside locks that best nest inside the inode log item
locks (e.g. marking inodes stale during inode cluster freeing).
Using a separate spinlock avoids these sorts of problems and
simplifies future code.
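
The nesting problem can be sketched in userspace as follows. This is a deliberately simplified, single-threaded illustration with hypothetical names: the real inversion involves intermediate locks and multiple contexts, whereas here the conflict is reduced to a flag update that must take the flags lock while the item state lock is still held. pthread_mutex_trylock() on a default mutex is used so that the conflicting case reports EBUSY instead of deadlocking.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with a flags lock and an item lock. */
struct demo_inode {
	pthread_mutex_t	flags_lock;	/* plays the role of i_flags_lock */
	pthread_mutex_t	item_lock;	/* plays the role of a separate ili_lock */
	unsigned int	flags;
	unsigned int	item_state;
};

static void demo_inode_init(struct demo_inode *ip)
{
	pthread_mutex_init(&ip->flags_lock, NULL);
	pthread_mutex_init(&ip->item_lock, NULL);
	ip->flags = 0;
	ip->item_state = 0;
}

/* Flag update: needs flags_lock. Returns 0, or EBUSY if it is already held. */
static int demo_set_flag(struct demo_inode *ip, unsigned int flag)
{
	int err = pthread_mutex_trylock(&ip->flags_lock);

	if (err)
		return err;
	ip->flags |= flag;
	pthread_mutex_unlock(&ip->flags_lock);
	return 0;
}

/*
 * Update item state under state_lock, then set an inode flag while
 * state_lock is still held (the nested flag update).
 */
static int demo_update_and_flag(struct demo_inode *ip,
				pthread_mutex_t *state_lock,
				unsigned int flag)
{
	int err;

	pthread_mutex_lock(state_lock);
	ip->item_state++;
	err = demo_set_flag(ip, flag);
	pthread_mutex_unlock(state_lock);
	return err;
}
```

Passing &ip->item_lock as state_lock lets the nested flag update succeed, while passing &ip->flags_lock makes it fail with EBUSY, which is the userspace equivalent of the nesting conflict a shared lock would create.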
This does not touch the use of ili_fields in the item formatting
code - that is entirely protected by the ILOCK_EXCL at this point in
time, so it remains untouched.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Diffstat (limited to 'fs/xfs/xfs_inode_item.h')
-rw-r--r-- | fs/xfs/xfs_inode_item.h | 18 |
1 files changed, 15 insertions, 3 deletions
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 4de5070e0765..4a10a1b92ee9 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -16,12 +16,24 @@ struct xfs_mount;
 struct xfs_inode_log_item {
 	struct xfs_log_item	ili_item;	   /* common portion */
 	struct xfs_inode	*ili_inode;	   /* inode ptr */
-	xfs_lsn_t		ili_flush_lsn;	   /* lsn at last flush */
-	xfs_lsn_t		ili_last_lsn;	   /* lsn at last transaction */
-	unsigned short		ili_lock_flags;	   /* lock flags */
+	unsigned short		ili_lock_flags;	   /* inode lock flags */
+	/*
+	 * The ili_lock protects the interactions between the dirty state and
+	 * the flush state of the inode log item. This allows us to do atomic
+	 * modifications of multiple state fields without having to hold a
+	 * specific inode lock to serialise them.
+	 *
+	 * We need atomic changes between inode dirtying, inode flushing and
+	 * inode completion, but these all hold different combinations of
+	 * ILOCK and iflock and hence we need some other method of serialising
+	 * updates to the flush state.
+	 */
+	spinlock_t		ili_lock;	   /* flush state lock */
 	unsigned int		ili_last_fields;   /* fields when flushed */
 	unsigned int		ili_fields;	   /* fields to be logged */
 	unsigned int		ili_fsync_fields;  /* logged since last fsync */
+	xfs_lsn_t		ili_flush_lsn;	   /* lsn at last flush */
+	xfs_lsn_t		ili_last_lsn;	   /* lsn at last transaction */
 };
 
 static inline int xfs_inode_clean(xfs_inode_t *ip)