diff options
author | Josef Bacik <josef@toxicpanda.com> | 2023-08-24 16:59:22 -0400 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2023-09-08 14:10:49 +0200 |
commit | 77d20c685b6baeb942606a93ed861c191381b73e (patch) | |
tree | f134317ea546b82ec58c2ef58e3a525831093ef1 /fs/btrfs/disk-io.c | |
parent | ee34a82e890a7babb5585daf1a6dd7d4d1cf142a (diff) | |
download | linux-77d20c685b6baeb942606a93ed861c191381b73e.tar.gz linux-77d20c685b6baeb942606a93ed861c191381b73e.tar.bz2 linux-77d20c685b6baeb942606a93ed861c191381b73e.zip |
btrfs: do not block starts waiting on previous transaction commit
Internally I got a report of very long stalls on normal operations like
creating a new file when auto relocation was running. The reporter used
the 'bpf offcputime' tracer to show that we would get stuck in
start_transaction for 5 to 30 seconds, and were always being woken up by
the transaction commit.
Using my timing-everything script, which times how long a function takes
and what percentage of that total time is taken up by its children, I
saw several traces like this
1083 took 32812902424 ns
29929002926 ns 91.2110% wait_for_commit_duration
25568 ns 7.7920e-05% commit_fs_roots_duration
1007751 ns 0.00307% commit_cowonly_roots_duration
446855602 ns 1.36182% btrfs_run_delayed_refs_duration
271980 ns 0.00082% btrfs_run_delayed_items_duration
2008 ns 6.1195e-06% btrfs_apply_pending_changes_duration
9656 ns 2.9427e-05% switch_commit_roots_duration
1598 ns 4.8700e-06% btrfs_commit_device_sizes_duration
4314 ns 1.3147e-05% btrfs_free_log_root_tree_duration
Here I was only tracing functions that happen where we are between
START_COMMIT and UNBLOCKED in order to see what would be keeping us
blocked for so long. The wait_for_commit() we do is where we wait for a
previous transaction that hasn't completed it's commit. This can
include all of the unpin work and other cleanups, which tends to be the
longest part of our transaction commit.
There is no reason we should be blocking new things from entering the
transaction at this point, it just adds to random latency spikes for no
reason.
Fix this by adding a PREP stage. This allows us to properly deal with
multiple committers coming in at the same time, we retain the behavior
that the winner waits on the previous transaction and the losers all
wait for this transaction commit to occur. Nothing else is blocked
during the PREP stage, and then once the wait is complete we switch to
COMMIT_START and all of the same behavior as before is maintained.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/disk-io.c')
-rw-r--r-- | fs/btrfs/disk-io.c | 8 |
1 files changed, 4 insertions, 4 deletions
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 0a96ea8c1d3a..639f1e308e4c 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1547,7 +1547,7 @@ static int transaction_kthread(void *arg) delta = ktime_get_seconds() - cur->start_time; if (!test_and_clear_bit(BTRFS_FS_COMMIT_TRANS, &fs_info->flags) && - cur->state < TRANS_STATE_COMMIT_START && + cur->state < TRANS_STATE_COMMIT_PREP && delta < fs_info->commit_interval) { spin_unlock(&fs_info->trans_lock); delay -= msecs_to_jiffies((delta - 1) * 1000); @@ -2682,8 +2682,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) btrfs_lockdep_init_map(fs_info, btrfs_trans_num_extwriters); btrfs_lockdep_init_map(fs_info, btrfs_trans_pending_ordered); btrfs_lockdep_init_map(fs_info, btrfs_ordered_extent); - btrfs_state_lockdep_init_map(fs_info, btrfs_trans_commit_start, - BTRFS_LOCKDEP_TRANS_COMMIT_START); + btrfs_state_lockdep_init_map(fs_info, btrfs_trans_commit_prep, + BTRFS_LOCKDEP_TRANS_COMMIT_PREP); btrfs_state_lockdep_init_map(fs_info, btrfs_trans_unblocked, BTRFS_LOCKDEP_TRANS_UNBLOCKED); btrfs_state_lockdep_init_map(fs_info, btrfs_trans_super_committed, @@ -4870,7 +4870,7 @@ static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info) while (!list_empty(&fs_info->trans_list)) { t = list_first_entry(&fs_info->trans_list, struct btrfs_transaction, list); - if (t->state >= TRANS_STATE_COMMIT_START) { + if (t->state >= TRANS_STATE_COMMIT_PREP) { refcount_inc(&t->use_count); spin_unlock(&fs_info->trans_lock); btrfs_wait_for_commit(fs_info, t->transid); |