summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: Run merges at BCH_WATERMARK_btreeKent Overstreet2024-04-131-0/+6
| | | | | | | | This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix missing write refs in fs fio pathsKent Overstreet2024-04-133-14/+23
| | | | | | bch2_journal_flush_seq requires us to have a write ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix deadlock in journal replayKent Overstreet2024-04-131-3/+4
| | | | | | | | btree_key_can_insert_cached() should be checking the watermark - BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, it was being used incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Go rw if running any explicit recovery passesKent Overstreet2024-04-131-1/+1
| | | | | | | This fixes a bug where we fail to start when upgrading/downgrading because we forgot we needed to go rw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet2024-04-1310-56/+69
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: don't queue btree nodes for rewrites during scanKent Overstreet2024-04-131-1/+3
| | | | | | | many nodes found during scan will be old nodes, overwritten by newer nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix race in bch2_btree_node_evict()Kent Overstreet2024-04-131-1/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix unsafety in bch2_stripe_to_text()Kent Overstreet2024-04-132-21/+27
| | | | | | .to_text() functions need to work on key values that didn't pass .valid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix unsafety in bch2_extent_ptr_to_text()Kent Overstreet2024-04-131-1/+3
| | | | | | Need to check if we have a valid bucket before checking if ptr is stale Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree node scan: handle encrypted nodesKent Overstreet2024-04-131-0/+10
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for packed bkeys that are too bigKent Overstreet2024-04-132-7/+14
| | | | | | add missing validation; fixes assertion pop in bkey unpack Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix UAFs of btree_insert_entry arrayKent Overstreet2024-04-131-13/+14
| | | | | | | | | | | The btree paths array is now dynamically resizable - and as well the btree_insert_entries array, as it needs to be the same size. The merge path (and interior update path) allocates new btree paths, thus can trigger a resize; thus we need to not retain direct pointers after invoking merge; similarly when running btree node triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree split pathKent Overstreet2024-04-111-15/+26
| | | | | | | | | | It turns out - btree splits happen with the rest of the transaction still locked, to avoid unnecessary restarts, which means using nofail doesn't work here - we can deadlock. Fortunately, we now have the ability to return errors here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()Kent Overstreet2024-04-101-5/+7
| | | | | | | We weren't respecting trans->journal_replay_not_finished - we shouldn't be searching the journal keys unless we have a ref on them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()Kent Overstreet2024-04-101-27/+1
| | | | | | | | | | | dropping read locks in bch2_btree_node_lock_write_nofail() dates from before we had the cycle detector; we can now tell the cycle detector directly when taking a lock may not fail because we can't handle transaction restarts. This is needed for adding should_be_locked asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a race in btree_update_nodes_written()Kent Overstreet2024-04-101-3/+7
| | | | | | | | | | One btree update might have terminated in a node update, and then while it is in flight another btree update might free that original node. This race has to be handled in btree_update_nodes_written() - we were missing a READ_ONCE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_node_scan: Respect member.data_allowedKent Overstreet2024-04-091-0/+3
| | | | | | If a device wasn't used for btree nodes, no need to scan for them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't scan for btree nodes when we can reconstructKent Overstreet2024-04-094-18/+29
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix check_topology() when using node scanKent Overstreet2024-04-091-1/+1
| | | | | | | shoot down journal keys _before_ populating journal keys with pointers to scanned nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix eytzinger0_find_gt()Kent Overstreet2024-04-081-6/+20
| | | | | | | | | - fix return types: promoting from unsigned to ssize_t does not do what we want here, and was pointless since the rest of the eytzinger code is u32 - nr, not size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix bch2_get_acl() transaction restart handlingKent Overstreet2024-04-071-16/+14
| | | | | | | bch2_acl_from_disk() uses allocate_dropping_locks, and can thus return a transaction restart - this wasn't handled. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu listHongbo Li2024-04-071-0/+2
| | | | | | | | | | | | | | | | When allocating bkey_cached from bc->freed_pcpu list, it missed decreasing the count of nr_freed_pcpu which would cause the mismatch between the value of nr_freed_pcpu and the list items. This problem also exists in moving new bkey_cached to bc->freed_pcpu list. If these happened, the bug info may appear in bch2_fs_btree_key_cache_exit by the follow code: BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu); BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu); Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()Kent Overstreet2024-04-071-10/+45
| | | | | | | | | | | | | Multiple bug fixes for journal iters: - When the journal keys gap buffer is resized, we have to adjust the iterators for moving the gap to the end - We don't want to rewind iterators to point to the key we just inserted if it's not for the correct btree/level Also, add some new assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename struct field swap to prevent macro naming collisionThorsten Blum2024-04-061-4/+4
| | | | | | | | The struct field swap can collide with the swap() macro defined in linux/minmax.h. Rename the struct field to prevent such collisions. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* MAINTAINERS: Add entry for bcachefs documentationBagas Sanjaya2024-04-061-0/+1
| | | | | | | | Now that bcachefs docs exist in Documentation/filesystems/bcachefs/, cover it in MAINTAINERS entry for the filesystem. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* Documentation: filesystems: Add bcachefs toctreeBagas Sanjaya2024-04-062-0/+12
| | | | | | | | | | | | | | Commit eb386617be4bdf ("bcachefs: Errcode tracepoint, documentation") adds initial bcachefs documentation (private error codes) but without any table of contents tree for the filesystem docs, hence Sphinx warns: Documentation/filesystems/bcachefs/errorcodes.rst: WARNING: document isn't included in any toctree Add bcachefs toctree to fix above warning. Fixes: eb386617be4b ("bcachefs: Errcode tracepoint, documentation") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: JOURNAL_SPACE_LOWKent Overstreet2024-04-065-12/+19
| | | | | | | | | | | | | | | | | | "bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINEKent Overstreet2024-04-061-0/+4
| | | | | | | | | | | | | | | | BCH_IOCTL_FSCK_OFFLINE allows the userspace fsck tool to use the kernel implementation of fsck - primarily when the kernel version is a better version match. It should look and act exactly like the normal userspace fsck that the user expected to be invoking, so errors should never result in a kernel panic. We may want to consider further restricting errors=panic - it's only intended for debugging in controlled test environments, it should have no purpose it normal usage. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystemsKent Overstreet2024-04-061-44/+50
| | | | | | | | | | To open an encrypted filesystem, we use request_key() to get the encryption key from the user's keyring - but request_key() needs to happen in the context of the process that invoked the ioctl. This easily fixed by using bch2_fs_open() in nostart mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix rand_delete unit testKent Overstreet2024-04-051-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix ! vs ~ typo in __clear_bit_le64()Dan Carpenter2024-04-051-1/+1
| | | | | | | | | The ! was obviously intended to be ~. As it is, this function does the equivalent to: "addr[bit / 64] = 0;". Fixes: 27fcec6c27ca ("bcachefs: Clear recovery_passes_required as they complete without errors") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix rebalance from durability=0 deviceKent Overstreet2024-04-051-4/+13
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Print shutdown journal sequence numberKent Overstreet2024-04-041-0/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Further improve btree_update_to_text()Kent Overstreet2024-04-042-55/+48
| | | | | | Print start and end level of the btree update; also a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move btree_updates to debugfsKent Overstreet2024-04-042-9/+42
| | | | | | | sysfs is limited to PAGE_SIZE, and when we're debugging strange deadlocks/priority inversions we need to see the full list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Bump limit in btree_trans_too_many_iters()Kent Overstreet2024-04-042-1/+15
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make snapshot_is_ancestor() safeKent Overstreet2024-04-041-6/+13
| | | | | | | Snapshot table accesses generally need to be checking for invalid snapshot ID now, fix one that was missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: create debugfs dir for each btreeThomas Bertschinger2024-04-031-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This creates a subdirectory for each individual btree under the btrees/ debugfs directory. Directory structure, before: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc ├── alloc-bfloat-failed ├── alloc-formats ├── backpointers ├── backpointers-bfloat-failed ├── backpointers-formats ... Directory structure, after: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc │   ├── bfloat-failed │   ├── formats │   └── keys ├── backpointers │   ├── bfloat-failed │   ├── formats │   └── keys ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: reconstruct_inode()Kent Overstreet2024-04-031-2/+50
| | | | | | | If an inode is missing, but corresponding extents and dirent still exist, it's well worth recreating it - this does so. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Subvolume reconstructionKent Overstreet2024-04-031-19/+148
| | | | | | We can now recreate missing subvolumes from dirents and/or inodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for extents that point to same spaceKent Overstreet2024-04-032-8/+168
| | | | | | | | | In backpointer repair, if we get a missing backpointer - but there's already a backpointer that points to an existing extent - we've got multiple extents that point to the same space and need to decide which to keep. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Reconstruct missing snapshot nodesKent Overstreet2024-04-036-6/+199
| | | | | | | | When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flag btrees with missing dataKent Overstreet2024-04-036-5/+44
| | | | | | | We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Topology repair now uses nodes found by scanning to fill holesKent Overstreet2024-04-032-107/+199
| | | | | | | | | | | | | With the new btree node scan code, we can now recover from corrupt btree roots - simply create a new fake root at depth 1, and then insert all the leaves we found. If the root wasn't corrupt but there's corruption elsewhere in the btree, we can fill in holes as needed with the newest version of a given node(s) from the scan; we also check if a given btree node is older than what we found from the scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Repair pass for scanning for btree nodesKent Overstreet2024-04-0312-51/+605
| | | | | | | | | | | | | | | If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't skip fake btree roots in fsckKent Overstreet2024-04-031-3/+0
| | | | | | | When a btree root is unreadable, we might still have keys fro the journal to walk and mark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()Kent Overstreet2024-04-033-7/+7
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Etyzinger cleanupsKent Overstreet2024-04-037-182/+285
| | | | | | | | Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t and cmp_r_func_t callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_shoot_down_journal_keys()Kent Overstreet2024-04-033-10/+35
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Clear recovery_passes_required as they complete without errorsKent Overstreet2024-04-033-12/+43
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>