path: root/fs/bcachefs/btree_update_interior.c
Commit message | Author | Age | Files | Lines
* bcachefs: Include btree_trans in more tracepointsKent Overstreet2024-01-011-17/+18
| | | | | | | This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: convert bch_fs_flags to x-macroKent Overstreet2024-01-011-2/+2
| | | | | | | Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
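A minimal sketch of the x-macro technique this commit refers to, assuming a hypothetical three-flag list (`EXAMPLE_FS_FLAGS` and every other `example_`/`EXAMPLE_` name are illustrative, not the real bcachefs definitions). The flag list is written once and expanded twice, into an enum and into a matching name table, which is what makes printing flags by name cheap to maintain:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag list for illustration; not the real bch_fs_flags. */
#define EXAMPLE_FS_FLAGS()	\
	x(started)		\
	x(rw)			\
	x(was_rw)

/* First expansion: one enum constant per flag. */
enum example_fs_flags {
#define x(n)	EXAMPLE_FS_##n,
	EXAMPLE_FS_FLAGS()
#undef x
	EXAMPLE_FS_NR
};

/* Second expansion: a name table that cannot drift out of sync with the enum. */
static const char * const example_fs_flag_strs[] = {
#define x(n)	#n,
	EXAMPLE_FS_FLAGS()
#undef x
};

/* Write the names of all set flag bits into buf, separated by spaces. */
static void example_flags_to_text(char *buf, size_t len, unsigned long flags)
{
	size_t pos = 0;

	assert(buf && len);
	buf[0] = '\0';

	for (int i = 0; i < EXAMPLE_FS_NR; i++) {
		const char *name = example_fs_flag_strs[i];
		size_t n = strlen(name);

		if (!(flags & (1UL << i)))
			continue;
		/* Separator (if any) + name + terminating NUL must fit. */
		if (pos + (pos ? 1 : 0) + n + 1 > len)
			break;
		if (pos)
			buf[pos++] = ' ';
		memcpy(buf + pos, name, n);
		pos += n;
		buf[pos] = '\0';
	}
}
```

Adding a flag then means adding one `x(...)` line; the enum and the string table both pick it up.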
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-8/+8
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill dead BTREE_INSERT flagsKent Overstreet2024-01-011-10/+4
| | | | | | | BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used, and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Journal pins must always have a flush_fnKent Overstreet2024-01-011-3/+18
| | | | | | | flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs; guard against overflow in btree node splitKent Overstreet2023-12-191-0/+12
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_node_u64s_with_format() takes nr keysKent Overstreet2023-12-191-13/+14
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a journal deadlock in replayKent Overstreet2023-12-041-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently, journal pre-reservations were removed. They were for reserving space ahead of time in the journal for operations that are required for journal reclaim, e.g. btree key cache flushing and interior node btree updates. Instead we have watermarks - only operations for journal reclaim are allowed when the journal is low on space, and in general we're quite good about doing operations in the order that will free up space in the journal quickest when we're low on space. If we're doing a journal reclaim operation out of order, we usually do it in nonblocking mode if it's not freeing up space at the end of the journal. There's an exception though - interior btree node update operations have to be BCH_WATERMARK_reclaim - once they've been started, and they can't be nonblocking. Generally this is fine because they'll only be a very small fraction of transaction commits - but there's an exception, which is during journal replay. Journal replay does many btree operations, but doesn't need to commit them to the journal since they're already in the journal. So killing off of pre-reservation, plus another change to make journal replay more efficient by initially doing the replay in sorted btree order, made it possible for the interior update operations replay generates to fill and deadlock the journal. Fix this by introducing a new check on journal space at the _start_ of an interior update operation. This causes us to block if necessary in exactly the same way as we used to when interior updates took a journal pre-reservation, but without all the expensive accounting pre-reservations required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix race between btree writes and metadata dropKent Overstreet2023-11-281-0/+4
| | | | | | | | | | | | | | | btree writes update the btree node key after every write, in order to update sectors_written, and they also might need to drop pointers if one of the writes failed in a replicated btree node. But the btree node might also have had a pointer dropped while the write was in flight, by bch2_dev_metadata_drop(), and thus there was a bug where the btree node write would overwrite the btree node's key with what it had at the start of the write. Fix this by dropping pointers not currently in the btree node key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix split_race livelockKent Overstreet2023-11-281-1/+5
| | | | | | | | | | | bch2_btree_update_start() calculates which nodes are going to have to be split/rewritten, so that we know how many nodes to reserve and how deep in the tree we have to take locks. But btree node merges require inserting two keys into the parent node, not just splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet2023-11-241-2/+2
| | | | | | | | | | | | | | | | | | | | Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patch introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill journal pre-reservationsKent Overstreet2023-11-141-30/+0
| | | | | | | | | | This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
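The two thresholds described in this message can be sketched as below. This is an illustration only: the struct, the enum, and both function names are hypothetical, and the real journal code tracks space quite differently. Only the "more than half" and "more than 3/4" cutoffs come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-level watermark; the real enum has more levels. */
enum example_watermark {
	EXAMPLE_WATERMARK_normal,
	EXAMPLE_WATERMARK_reclaim,
};

/* Hypothetical view of journal space, in arbitrary units. */
struct example_journal {
	uint64_t used;
	uint64_t size;
};

/* More than half full: run journal reclaim more aggressively. */
static bool example_journal_reclaim_aggressive(const struct example_journal *j)
{
	assert(j->used <= j->size);
	return j->used * 2 > j->size;
}

/*
 * More than 3/4 full: only reclaim may take new reservations.
 * A completely full journal admits nobody.
 */
static bool example_journal_may_reserve(const struct example_journal *j,
					enum example_watermark watermark)
{
	assert(j->used <= j->size);

	if (j->used == j->size)
		return false;
	if (j->used * 4 > j->size * 3)
		return watermark == EXAMPLE_WATERMARK_reclaim;
	return true;
}
```

Compared with pre-reservations, a check of this shape needs no per-operation accounting: it only looks at how full the journal currently is.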
* bcachefs: Don't iterate over journal entries just for btree rootsKent Overstreet2023-11-051-9/+3
| | | | | | Small performance optimization, and a bit of a code cleanup too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix build errors with gcc 10Kent Overstreet2023-11-041-1/+1
| | | | | | | | | | | | gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't downgrade locks on transaction restartKent Overstreet2023-11-011-1/+1
| | | | | | | | We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errorsKent Overstreet2023-11-011-2/+2
| | | | | | | | | | | This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repaired, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate btree_transKent Overstreet2023-10-221-18/+17
| | | | | | | | We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix W=12 build errorsKent Overstreet2023-10-221-32/+26
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: remove redundant initialization of pointer dColin Ian King2023-10-221-1/+1
| | | | | | | | | | | | | The pointer d is being initialized with a value that is never read, it is being re-assigned later on when it is used in a for-loop. The initialization is redundant and can be removed. Cleans up clang-scan build warning: fs/bcachefs/buckets.c:1303:25: warning: Value stored to 'd' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert more code to bch_err_msg()Kent Overstreet2023-10-221-3/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix silent enum conversion errorKent Overstreet2023-10-221-7/+7
| | | | | | | | | | | | This changes mark_btree_node_locked() to take an enum btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED is -1 which may cause problems converting back and forth to six_lock_type if short enums are in use. With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type enum. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
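A small sketch of the hazard and the fix, under stated assumptions: all `example_` names are hypothetical stand-ins, and the one-byte unsigned storage is merely one plausible representation a compiler could pick for a three-value enum when short enums are enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the lock-type enum: only non-negative values. */
enum example_six_lock_type {
	EXAMPLE_SIX_LOCK_read,
	EXAMPLE_SIX_LOCK_intent,
	EXAMPLE_SIX_LOCK_write,
};

/* Separate enum that can also represent "unlocked". */
enum example_node_locked_type {
	EXAMPLE_NODE_UNLOCKED		= -1,
	EXAMPLE_NODE_READ_LOCKED	= EXAMPLE_SIX_LOCK_read,
	EXAMPLE_NODE_INTENT_LOCKED	= EXAMPLE_SIX_LOCK_intent,
	EXAMPLE_NODE_WRITE_LOCKED	= EXAMPLE_SIX_LOCK_write,
};

/*
 * Simulates storing a value in an enum that the compiler chose to
 * represent as one unsigned byte (an assumption about short enums).
 */
static int example_roundtrip_through_short_enum(int v)
{
	uint8_t stored = (uint8_t) v;

	return stored;
}

struct example_path_level {
	enum example_node_locked_type locked;
};

/* Takes the wider enum, so -1 is never squeezed into the lock-type enum. */
static void example_mark_node_locked(struct example_path_level *l,
				     enum example_node_locked_type type)
{
	l->locked = type;
}

/* Convert only once we know the node is actually locked. */
static enum example_six_lock_type
example_to_six_lock_type(enum example_node_locked_type type)
{
	assert(type != EXAMPLE_NODE_UNLOCKED);
	return (enum example_six_lock_type) type;
}
```

Because the "unlocked" enum contains a negative enumerator, the compiler has to give it a signed representation, so `-1` survives a store and a load there.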
* bcachefs: Don't open code closure_nr_remaining()Kent Overstreet2023-10-221-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_journal_iter.cKent Overstreet2023-10-221-1/+1
| | | | | | | | | Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted fixes for clangKent Overstreet2023-10-221-2/+2
| | | | | | | clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a write buffer flush deadlockKent Overstreet2023-10-221-0/+11
| | | | | | | | | | | | | | | | We're not supposed to block if BTREE_INSERT_JOURNAL_RECLAIM && watermark != BCH_WATERMARK_reclaim. This should really be a separate BTREE_INSERT_NONBLOCK flag - add some comments to that effect, it's not important for this patch. btree write buffer flush depends on this behaviour though - the first loop tries to flush sequentially, which doesn't free up space in the journal optimally. If that can't proceed we bail out and flush in journal order - that won't work if we're blocked instead of returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted sparse fixesKent Overstreet2023-10-221-1/+1
| | | | | | | | | - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Allow for unknown btree IDsKent Overstreet2023-10-221-9/+9
| | | | | | | | | | | | | | | | | We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
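The "fixed array plus dynamic array" lookup described above might look roughly like this. It is a userspace sketch with hypothetical names and a hypothetical count of four known btree IDs; the kernel's actual helpers and container types are not shown on this page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical number of btree IDs this version knows about. */
#define EXAMPLE_BTREE_ID_NR	4

struct example_btree_root {
	unsigned	level;
	int		alive;
};

struct example_fs {
	/* Fixed-size array for known btree types. */
	struct example_btree_root	roots_known[EXAMPLE_BTREE_ID_NR];
	/* Dynamic array for IDs introduced by newer versions. */
	struct example_btree_root	*roots_extra;
	size_t				nr_extra;
};

/* Look up a root without allocating; NULL if an unknown ID has no slot. */
static struct example_btree_root *
example_btree_root_find(struct example_fs *c, unsigned id)
{
	size_t idx;

	if (id < EXAMPLE_BTREE_ID_NR)
		return &c->roots_known[id];

	idx = id - EXAMPLE_BTREE_ID_NR;
	return idx < c->nr_extra ? &c->roots_extra[idx] : NULL;
}

/*
 * Look up a root, growing the extra array on demand. Returns NULL on
 * allocation failure. Growing may move the array, so pointers to extra
 * roots obtained earlier must not be kept across this call.
 */
static struct example_btree_root *
example_btree_root_get(struct example_fs *c, unsigned id)
{
	struct example_btree_root *r = example_btree_root_find(c, id);
	struct example_btree_root *n;
	size_t idx, nr;

	if (r)
		return r;

	idx = id - EXAMPLE_BTREE_ID_NR;
	nr = idx + 1;
	assert(nr > c->nr_extra);

	n = realloc(c->roots_extra, nr * sizeof(*n));
	if (!n)
		return NULL;

	/* Zero the newly added slots so unused IDs read as "not alive". */
	memset(n + c->nr_extra, 0, (nr - c->nr_extra) * sizeof(*n));
	c->roots_extra = n;
	c->nr_extra = nr;
	return &n[idx];
}
```

Callers go through the two helpers instead of indexing an array directly, so known and unknown IDs take the same code path.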
* bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet2023-10-221-29/+27
| | | | | | | Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a null ptr deref in bch2_fs_alloc() error pathKent Overstreet2023-10-221-1/+4
| | | | | | | This fixes a null ptr deref in bch2_free_pending_node_rewrites() when the list head wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill JOURNAL_WATERMARKKent Overstreet2023-10-221-3/+3
| | | | | | | This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet2023-10-221-3/+3
| | | | | | This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_btree_update_start()Kent Overstreet2023-10-221-1/+1
| | | | | | | | The calculation for number of nodes to allocate in bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the small nodes test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Delete weird hacky transaction restart injectionKent Overstreet2023-10-221-3/+0
| | | | | | | | | Since we currently don't have a good fault injection library, bch2_btree_insert_node() was randomly injecting faults based on local_clock(). At the very least this should have been a debug mode only thing, but this is a brittle method so let's just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: More drop_locks_do() conversionsKent Overstreet2023-10-221-8/+4
| | | | | | | Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: drop_locks_do()Kent Overstreet2023-10-221-6/+2
| | | | | | | | | Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
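The unlock / do something / relock pattern can be sketched as a single helper. This version is a plain function taking a callback, with a mock transaction type; every name is hypothetical, and returning the work's error without relocking is an assumption made for the sketch rather than something this page states.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock transaction: tracks lock state and an optional injected relock error. */
struct example_trans {
	bool	locked;
	int	relock_err;
};

static void example_trans_unlock(struct example_trans *trans)
{
	trans->locked = false;
}

static int example_trans_relock(struct example_trans *trans)
{
	if (trans->relock_err)
		return trans->relock_err;
	trans->locked = true;
	return 0;
}

typedef int (*example_work_fn)(struct example_trans *trans, void *arg);

/*
 * Drop locks, run fn, then relock. Returns fn's error if it failed
 * (assumption: the caller then restarts or aborts), otherwise the
 * result of relocking, so a failed relock is never silently ignored.
 */
static int example_drop_locks_do(struct example_trans *trans,
				 example_work_fn fn, void *arg)
{
	int ret;

	example_trans_unlock(trans);
	ret = fn(trans, arg);
	return ret ? ret : example_trans_relock(trans);
}

/* Example step that must run unlocked; returns the status passed in arg. */
static int example_blocking_work(struct example_trans *trans, void *arg)
{
	assert(!trans->locked);
	return *(int *) arg;
}
```

Routing every such sequence through one helper means the relock result is always checked in the same place.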
* bcachefs: GFP_NOIO -> GFP_NOFSKent Overstreet2023-10-221-1/+1
| | | | | | | | GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* six locks: Kill six_lock_state unionKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
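The overflow concern can be illustrated with raw masks on a 64-bit word. The layout below (sequence number in the top 32 bits, everything else below it) is a hypothetical choice for the sketch, not the real six lock state layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: sequence number in the high 32 bits. */
#define EXAMPLE_STATE_SEQ_SHIFT	32
#define EXAMPLE_STATE_SEQ_ONE	(UINT64_C(1) << EXAMPLE_STATE_SEQ_SHIFT)
#define EXAMPLE_STATE_LOW_MASK	(EXAMPLE_STATE_SEQ_ONE - 1)

static_assert(EXAMPLE_STATE_SEQ_SHIFT < 64, "sequence must fit in the word");

/* Incrementing a field at the top of the word wraps off the end harmlessly. */
static uint64_t example_seq_inc(uint64_t state)
{
	return state + EXAMPLE_STATE_SEQ_ONE;
}

static uint32_t example_seq(uint64_t state)
{
	return (uint32_t) (state >> EXAMPLE_STATE_SEQ_SHIFT);
}

static uint32_t example_low_bits(uint64_t state)
{
	return (uint32_t) (state & EXAMPLE_STATE_LOW_MASK);
}

/*
 * Counter-example: if the sequence number sat in the LOW 32 bits (as a
 * compiler might arrange a bitfield), a wrapping increment would carry
 * into whatever field sits above it.
 */
static uint64_t example_seq_inc_low_layout(uint64_t state)
{
	return state + 1;
}
```

With explicit shifts and masks the position of each field is fixed by the source, not by the compiler's bitfield ordering.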
* bcachefs: Improve trans_restart_split_race tracepointKent Overstreet2023-10-221-2/+2
| | | | | | | Seeing occasional test failures where we get stuck in a livelock that involves this event - this will help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_extent_fallocate() in nocow modeKent Overstreet2023-10-221-0/+2
| | | | | | | | | | | When we allocate disk space, we need to be incrementing the WRITE io clock, which perhaps should be renamed to sectors allocated - copygc uses this io clock to know when to run. Also, we should be incrementing the same clock when allocating btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Private error codes: ENOMEMKent Overstreet2023-10-221-3/+6
| | | | | | | This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Drop some anonymous structs, unionsKent Overstreet2023-10-221-2/+2
| | | | | | | | Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenience. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BKEY_PADDED_ONSTACK()Kent Overstreet2023-10-221-1/+1
| | | | | | | | Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Plumb btree_trans through btree cache codeKent Overstreet2023-10-221-4/+11
| | | | | | | | Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add tracepoint & counter for btree split raceKent Overstreet2023-10-221-1/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_journal_entries_postprocess()Kent Overstreet2023-10-221-10/+5
| | | | | | | | This brings back journal_entries_compact(), but in a more efficient form - we need to do multiple postprocess steps, so iterate over the journal entries being written just once to make it more efficient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Handle btree node rewrites before going RWKent Overstreet2023-10-221-7/+58
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add some logging for btree node rewrites due to errorsKent Overstreet2023-10-221-3/+20
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Debug mode for c->writes referencesKent Overstreet2023-10-221-3/+3
| | | | | | | | | | This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix btree_node_write_blocked() not being clearedKent Overstreet2023-10-221-0/+3
| | | | | | | | | The btree_node_write_blocked bit was a later addition to this code, it only mirrors the state of the b->write_blocked list (empty or nonempty) - unfortunately, when it was added it wasn't correctly kept in sync - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve btree_reserve_get_fail tracepointKent Overstreet2023-10-221-1/+2
| | | | | | Now we include the return code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>