summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/btree_trans_commit.c
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet2024-05-091-5/+5
| | | | | | | We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: x-macroize journal flags enumsKent Overstreet2024-05-081-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_verify_not_unlocked()Kent Overstreet2024-05-081-0/+7
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_commit_flags_to_text()Kent Overstreet2024-05-081-0/+21
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet2024-05-081-12/+12
| | | | | | | | | | | Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: prt_printf() now respects \r\n\tKent Overstreet2024-05-081-4/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix deadlock in journal replayKent Overstreet2024-04-131-3/+4
| | | | | | | | btree_key_can_insert_cached() should be checking the watermark - BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, it was being used incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix UAFs of btree_insert_entry arrayKent Overstreet2024-04-131-13/+14
| | | | | | | | | | | The btree paths array is now dynamically resizable - and as well the btree_insert_entries array, as it needs to be the same size. The merge path (and interior update path) allocates new btree paths, thus can trigger a resize; thus we need to not retain direct pointers after invoking merge; similarly when running btree node triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BCH_WATERMARK_interior_updatesKent Overstreet2024-04-011-1/+2
| | | | | | | | | | | | This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add checks for invalid snapshot IDsKent Overstreet2024-03-311-1/+1
| | | | | | | | | | | Previously, we assumed that keys were consistent with the snapshots btree - but that's not correct as fsck may not have been run or may not be complete. This adds checks and error handling when using the in-memory snapshots table (that mirrors the snapshots btree). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add missing __GFP_NOWARNKent Overstreet2024-01-211-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-211-6/+3
| | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_account_disk_usage_change()Kent Overstreet2024-01-211-0/+5
| | | | | | | | | | | The disk space accounting rewrite is splitting out accounting for each replicas set - those are moving to btree keys, instead of percpu counters. This breaks bch2_trans_fs_usage_apply() up, splitting out the part we will still need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_TRIGGER_ATOMICKent Overstreet2024-01-211-12/+7
| | | | | | Add a new flag to be explicit about when we're running atomic triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Combine .trans_trigger, .atomic_triggerKent Overstreet2024-01-051-18/+13
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: mark now takes bkey_sKent Overstreet2024-01-051-3/+3
| | | | | | | | Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans_mark now takes bkey_sKent Overstreet2024-01-051-2/+2
| | | | | | | | Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check journal entries for invalid keys in trans commit pathKent Overstreet2024-01-051-0/+39
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix interior update path btree_path usesKent Overstreet2024-01-011-7/+5
| | | | | | | | | | | Since the btree_paths array is now about to become growable, we have to be careful not to refer to paths by pointer across contexts where they may be reallocated. This fixes the remaining btree_interior_update() paths - split and merge. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Clean up btree_transKent Overstreet2024-01-011-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_insert_entry -> btree_path_idx_tKent Overstreet2024-01-011-30/+37
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans_for_each_update() now declares loop iterKent Overstreet2024-01-011-16/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill btree_trans->wb_updatesKent Overstreet2024-01-011-18/+0
| | | | | | the btree write buffer path now creates a journal entry directly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree write buffer now slurps keys from journalKent Overstreet2024-01-011-51/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve trans->extra_journal_entriesKent Overstreet2024-01-011-11/+9
| | | | | | | | | | | | Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_write_buffer_flush_locked()Kent Overstreet2024-01-011-2/+4
| | | | | | | Minor refactoring - improved naming, and move the responsibility for flush_lock to the caller instead of having it be shared. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Clean up btree write buffer write ref handlingKent Overstreet2024-01-011-4/+2
| | | | | | | | | | | | | __bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: convert bch_fs_flags to x-macroKent Overstreet2024-01-011-4/+4
| | | | | | | Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-19/+20
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill dead BTREE_INSERT flagsKent Overstreet2024-01-011-4/+0
| | | | | | | BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used, and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill BTREE_UPDATE_PREJOURNALKent Overstreet2024-01-011-6/+1
| | | | | | | | | | With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can now switch the btree write buffer to use it for flushing. This has the advantage that transaction commits don't need to take a journal reservation at all. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"Kent Overstreet2024-01-011-4/+15
| | | | | | | | | | | This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it. Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Clear k->needs_whitout earlier in commit pathKent Overstreet2024-01-011-2/+2
| | | | | | | | The upcoming btree write buffer rework is going to use the journal itself as the first stage of the write buffer; this is a cleanup to make sure k->needs_whiteout is initialized before keys hit the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Journal pins must always have a flush_fnKent Overstreet2024-01-011-1/+8
| | | | | | | flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix warning when building in userspaceKent Overstreet2024-01-011-2/+1
| | | | | | bch_err() doesn't reference the fs arg in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill journal pre-reservationsKent Overstreet2023-11-141-34/+2
| | | | | | | | | | This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make sure to drop/retake btree locks before reclaimKent Overstreet2023-11-131-6/+42
| | | | | | | | | | | | We really don't want to be invoking memory reclaim with btree locks held: even aside from (solvable, but tricky) recursion issues, it can cause painful to diagnose performance edge cases. This fixes a recently reported issue in btree_key_can_insert_cached(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Mateusz Guzik <mjguzik@gmail.com> Fixes: https://lore.kernel.org/linux-bcachefs/CAGudoHEsb_hGRMeWeXh+UF6po0qQuuq_NKSEo+s1sEb6bDLjpA@mail.gmail.com/T/
* bcachefs: btree_trans->write_lockedKent Overstreet2023-11-131-36/+49
| | | | | | | | As prep work for the next patch to fix a key cache reclaim issue, we need to start tracking whether we're currently holding write locks - so that we can release and retake the before calling into memory reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rebalance_work btree is not a snapshots btreeKent Overstreet2023-11-041-0/+1
| | | | | | | | | | | rebalance_work entries may refer to entries in the extents btree, which is a snapshots btree, or they may also refer to entries in the reflink btree, which is not. Hence rebalance_work keys may use the snapshot field but it's not required to be nonzero - add a new btree flag to reflect this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix build errors with gcc 10Kent Overstreet2023-11-041-3/+3
| | | | | | | | | | | | gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't downgrade locks on transaction restartKent Overstreet2023-11-011-6/+3
| | | | | | | | We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: All triggers are BTREE_TRIGGER_WANTS_OLD_AND_NEWKent Overstreet2023-10-311-4/+2
| | | | | | | | Upcoming rebalance_work btree will require extent triggers to be BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion, let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix btree_node_type enumKent Overstreet2023-10-311-6/+5
| | | | | | | | More forwards compatibility fixups: having BKEY_TYPE_btree at the end of the enum conflicts with unnkown btree IDs, this shifts BKEY_TYPE_btree to slot 0 and fixes things up accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_str()Kent Overstreet2023-10-311-1/+1
| | | | | | | Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Always check for invalid bkeys in main commit pathKent Overstreet2023-10-221-4/+0
| | | | | | | | | | Previously, we would check for invalid bkeys at transaction commit time, but only if CONFIG_BCACHEFS_DEBUG=y. This check is important enough to always be on - it appears there's been corruption making it into the journal that would have been caught by it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate btree_transKent Overstreet2023-10-221-5/+3
| | | | | | | | | | We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix W=12 build errorsKent Overstreet2023-10-221-5/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Put bkey invalid check in commit path in a more useful placeKent Overstreet2023-10-221-19/+19
| | | | | | | | When doing updates early in recovery, before we can go RW, we still want to check that keys are valid at commit time - this moves key invalid checking to before the "btree updates to journal" path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a double free on invalid bkeyKent Overstreet2023-10-221-1/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix btree write buffer with snapshots btreesKent Overstreet2023-10-221-0/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>