summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/alloc_background.c
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: Check for bad needs_discard before doing discardKent Overstreet2024-04-021-21/+26
| | | | | | | | | | | In the discard worker, we were failing to validate the bucket state - meaning a corrupt needs_discard btree could cause us to discard a bucket that we shouldn't. If check_alloc_info hasn't run yet we just want to bail out, otherwise it's a filesystem inconsistent error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_fatal_error()Kent Overstreet2024-03-181-1/+1
| | | | | | error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()Kent Overstreet2024-03-171-6/+7
| | | | | | | | | | | | Nested transaction restart handling is typically best avoided; when the inner context handles a transaction restart it invalidates the outer transaction context, so we need to make sure to return a transaction_restart_nested error. This code wasn't doing that, and hit the assertion in for_each_btree_key() that checks for that via trans->restart_count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: reconstruct_alloc cleanupKent Overstreet2024-03-131-33/+29
| | | | | | | | | Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out discard fastpathKent Overstreet2024-03-131-6/+140
| | | | | | | | | | | | | | | | | | | | | | Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache eans those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trigger_alloc() handles state changes betterKent Overstreet2024-03-131-8/+13
| | | | | | | | | | | | | | bch2_trigger_alloc() kicks off certain tasks on bucket state changes; e.g. triggering the bucket discard worker and the invalidate worker. We've observed the discard worker running too often - most runs it doesn't do any work, according to the tracepoint - so clearly, we're kicking it off too often. This adds an explicit statechange() macro to make these checks more precise. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: discard path uses unlock_long()Kent Overstreet2024-01-241-1/+1
| | | | | | | Some (bad) devices can have really terrible discard latency; we don't want them blocking memory reclaim and causing warnings. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Avoid flushing the journal in the discard pathKent Overstreet2024-01-211-19/+41
| | | | | | | | | | | When issuing discards, we may need to flush the journal if there's too many buckets that can't be discarded until a journal flush. But the heuristic was bad; we should be comparing the number of buckets that need to flushes against the number of free buckets, not the number of buckets we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: helpers for printing data typesKent Overstreet2024-01-211-6/+3
| | | | | | We need bounds checking since new versions may introduce new data types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_TRIGGER_ATOMICKent Overstreet2024-01-211-1/+1
| | | | | | Add a new flag to be explicit about when we're running atomic triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: drop to_text code for obsolete bps in alloc keysKent Overstreet2024-01-211-18/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fsck_err()s don't need to manually check c->sb.version anymoreKent Overstreet2024-01-051-3/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: unify alloc triggerKent Overstreet2024-01-051-153/+127
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: move bch2_mark_alloc() to alloc_background.cKent Overstreet2024-01-051-0/+108
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans_mark now takes bkey_sKent Overstreet2024-01-051-11/+11
| | | | | | | | Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_iter -> btree_path_idx_tKent Overstreet2024-01-011-2/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_member_device_rcu() now declares loop iterKent Overstreet2024-01-011-3/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_member_device() now declares loop iterKent Overstreet2024-01-011-13/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_btree_key() now declares loop iterKent Overstreet2024-01-011-27/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()Kent Overstreet2024-01-011-1/+1
| | | | | | And for_each_btree_key2_upto -> for_each_btree_key_upto Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_err_(fn|msg) check if should printKent Overstreet2024-01-011-2/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()Kent Overstreet2024-01-011-9/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill for_each_btree_key()Kent Overstreet2024-01-011-17/+14
| | | | | | | | | | | | | for_each_btree_key() handles transaction restarts, like for_each_btree_key2(), but only calls bch2_trans_begin() after a transaction restart - for_each_btree_key2() wraps every loop iteration in a transaction. The for_each_btree_key() behaviour is problematic when it leads to holding the SRCU lock that prevents key cache reclaim for an unbounded amount of time - there's no real need to keep it around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Explicity go RW for fsckKent Overstreet2024-01-011-11/+6
| | | | | | | This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: count_event()Kent Overstreet2024-01-011-1/+1
| | | | | | Small helper for event counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()Kent Overstreet2024-01-011-1/+1
| | | | | | More accurate naming. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Refactor bch2_check_alloc_to_lru_ref()Kent Overstreet2024-01-011-29/+25
| | | | | | | | | | | This code was somewhat convoluted - because originally bch2_lru_set() could modify the LRU index if there was a collision. That's no longer the case, so the "create LRU entry" path has no reason to update the alloc key, so we can separate the handling of the two fsck errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New bucket sector count helpersKent Overstreet2024-01-011-13/+9
| | | | | | | This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(), prep work for separately accounting stripe sectors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()Kent Overstreet2024-01-011-0/+13
| | | | | | | bch2_update_cached_sectors_list() is closer to how the new disk space accounting works, called from trans_mark(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-15/+15
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix locking when checking freespace btreeKent Overstreet2024-01-011-25/+35
| | | | | | On transaction restart, we weren't re-validating the hole we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: use swab40 for bch_backpointer.bucket_offset bitfieldBrian Foster2023-11-041-9/+0
| | | | | | | | | | | | | The bucket_offset field of bch_backpointer is a 40-bit bitfield, but the bch2_backpointer_swab() helper uses swab32. This leads to inconsistency when an on-disk fs is accessed from an opposite endian machine. As it turns out, we already have an internal swab40() helper that is used from the bch_alloc_v4 swab callback. Lift it into the backpointers header file and use it consistently in both places. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: byte order swap bch_alloc_v4.fragmentation_lru fieldBrian Foster2023-11-041-0/+1
| | | | | | | | | | | | | | | | | | A simple test to populate a filesystem on one CPU architecture and fsck on an arch of the opposite byte order produces errors related to the fragmentation LRU. This occurs because the 64-bit fragmentation_lru field is not byte-order swapped when reads detect that the on-disk/bset key values were written in opposite byte-order of the current CPU. Update the bch2_alloc_v4 swab callback to handle fragmentation_lru as is done for other multi-byte fields. This doesn't affect existing filesystems when accessed by CPUs of the same endianness because the ->swab() callback is only called when the bset flags indicate an endianness mismatch between the CPU and on-disk data. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure copygc does not spinKent Overstreet2023-11-041-0/+11
| | | | | | | If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errorsKent Overstreet2023-11-011-74/+84
| | | | | | | | | | | | | This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_str()Kent Overstreet2023-10-311-3/+3
| | | | | | | Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Correctly initialize new buckets on device resizeKent Overstreet2023-10-221-9/+12
| | | | | | | | | | bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New superblock section members_v2Hunter Shaffer2023-10-221-1/+1
| | | | | | | | | | | members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate btree_transKent Overstreet2023-10-221-73/+60
| | | | | | | | | | We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix W=12 build errorsKent Overstreet2023-10-221-9/+8
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix -Wformat in bch2_bucket_gens_invalid()Nathan Chancellor2023-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_bucket_gens_invalid() due to use of an incorrect format specifier: fs/bcachefs/alloc_background.c:530:10: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat] 529 | prt_printf(err, "bad val size (%lu != %zu)", | ~~~ | %zu 530 | bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens)); | ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf' 223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %lu but on 32-bit architectures, size_t is 'unsigned int'. Use '%zu', the format specifier for 'size_t', to eliminate the warning. Fixes: 4be0d766a7e9 ("bcachefs: bucket_gens btree") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix -Wformat in bch2_alloc_v4_invalid()Nathan Chancellor2023-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_alloc_v4_invalid() due to use of an incorrect format specifier: fs/bcachefs/alloc_background.c:246:30: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat] 245 | prt_printf(err, "bad val size (%u > %lu)", | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | %u 246 | alloc_v4_u64s(a.v), bkey_val_u64s(k.k)); | ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey.h:58:27: note: expanded from macro 'bkey_val_u64s' 58 | #define bkey_val_u64s(_k) ((_k)->u64s - BKEY_U64s) | ^ fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf' 223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ This expression is of type 'size_t'. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %lu but on 32-bit architectures, size_t is 'unsigned int'. Use '%zu', the format specifier for 'size_t' to eliminate the warning. Fixes: 11be8e8db283 ("bcachefs: New on disk format: Backpointers") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()Kent Overstreet2023-10-221-3/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert more code to bch_err_msg()Kent Overstreet2023-10-221-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill stripe check in bch2_alloc_v4_invalid()Kent Overstreet2023-10-221-5/+0
| | | | | | | | | Since we set bucket data type to BCH_DATA_stripe based on the data pointer, not just the stripe pointer, it doesn't make sense to check for no stripe in the .key_invalid method - this is a situation that shouldn't happen, but our other fsck/repair code handles it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Always check alloc data typeKent Overstreet2023-10-221-59/+42
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted fixes for clangKent Overstreet2023-10-221-31/+1
| | | | | | | clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate recovery passesKent Overstreet2023-10-221-6/+6
| | | | | | | | | | | | | | | | | | | | | Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill bch2_bucket_gens_read()Kent Overstreet2023-10-221-56/+44
| | | | | | | | | | This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the version check there. This is prep work for enumarating all recovery passes: we need some cleanup first to make calling all the recovery passes consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()Kent Overstreet2023-10-221-2/+1
| | | | | | | | | | Version upgrades are not atomic operations: when we do a version upgrade we need to update the superblock before we start using new features, and then when the upgrade completes we need to update the superblock again. This adds a new superblock field so we can detect and handle incomplete version upgrades. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>