Commit message | Author | Age | Files | Lines
* bcachefs: Fix deadlock in journal write path | Kent Overstreet | 2024-04-20 | 1 | -18/+42
  bch2_journal_write() was incorrectly waiting on earlier journal writes synchronously; this usually worked because most of the time we'd be running in the context of a thread that did a journal_buf_put(), but sometimes we'd be running out of the same workqueue that completes those prior journal writes.
  Additionally, this makes sure to punt to a workqueue before submitting preflushes - we really don't want to be calling submit_bio() in the main transaction commit path.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Tweak btree key cache shrinker so it actually frees | Kent Overstreet | 2024-04-20 | 1 | -15/+4
  Freeing key cache items is a multi-stage process; we need to wait for an SRCU grace period to elapse, and we handle this ourselves - partially to avoid callback overhead, but primarily so that when allocating we can first allocate from the freed items waiting for an SRCU grace period.
  Previously, the shrinker was counting the items on the 'waiting for SRCU grace period' lists as items being scanned, but this meant that too many items waiting for an SRCU grace period could prevent it from doing any work at all.
  After this, we're seeing that items skipped due to the accessed bit are the main cause of the shrinker not making any progress, and we actually want the key cache shrinker to run quite aggressively because reclaimed items will still generally be found (more compactly) in the btree node cache - so we also tweak the shrinker to not count those against nr_to_scan.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
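A minimal userspace model of the counting rule described above, assuming a plain array of entries, each with an accessed bit; `struct item` and `shrink_scan()` are hypothetical names, not bcachefs's shrinker code. Entries already freed and awaiting a grace period, and entries given a second chance because their accessed bit was set, are skipped without consuming `nr_to_scan`; only entries actually freed count against it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry; not the real struct bkey_cached. */
struct item {
	bool accessed;	/* set on lookup, cleared by the shrinker */
	bool freed;	/* already freed, waiting for a grace period */
};

/*
 * Free up to nr_to_scan entries, resuming at *cursor and wrapping around.
 * Freed entries and entries whose accessed bit was set (second chance)
 * are skipped without consuming the scan budget. Two full passes bound
 * the walk: the first pass clears every accessed bit it meets.
 * Returns the number of entries freed.
 */
static size_t shrink_scan(struct item *items, size_t nr_items,
			  size_t *cursor, size_t nr_to_scan)
{
	size_t freed = 0, visited = 0;

	if (!nr_items)
		return 0;

	while (freed < nr_to_scan && visited < 2 * nr_items) {
		struct item *i = &items[*cursor];

		*cursor = (*cursor + 1) % nr_items;
		visited++;

		if (i->freed)
			continue;		/* awaiting grace period: not counted */

		if (i->accessed) {
			i->accessed = false;	/* second chance: not counted */
			continue;
		}

		i->freed = true;
		freed++;
	}
	return freed;
}
```

With every entry marked accessed, the first pass only clears bits and the second pass frees. If the accessed skips were counted against the budget instead, a call with a small `nr_to_scan` could return having freed nothing.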
* bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong | Kent Overstreet | 2024-04-20 | 1 | -1/+1
  This stores the SRCU sequence number, which we use to check whether an SRCU barrier has elapsed; this is a partial fix for the key cache shrinker not actually freeing.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
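The kernel's SRCU polling API (`start_poll_synchronize_srcu()` and `poll_state_synchronize_srcu()`) passes its cookie as an `unsigned long`. The sketch below, with hypothetical struct names and a comparison written in the style of the kernel's `ULONG_CMP_GE()`, shows what storing such a cookie in a 32-bit field does on an LP64 machine: with a large counter value (chosen arbitrarily here), the truncated cookie never compares as elapsed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical containers for an SRCU-style grace-period cookie. */
struct cached_narrow { unsigned int  barrier_seq; };	/* too narrow */
struct cached_wide   { unsigned long barrier_seq; };	/* full width */

/*
 * Wrap-safe "counter cur has reached cookie", in the style of the
 * kernel's ULONG_CMP_GE(): only meaningful when both values are
 * full-width unsigned longs.
 */
static bool seq_reached(unsigned long cur, unsigned long cookie)
{
	return ULONG_MAX / 2 >= cur - cookie;
}
```

Storing the cookie as `unsigned int` drops its high 32 bits, so `cur - cookie` becomes a huge difference that the wrap-safe comparison reads as "not yet reached".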
* bcachefs: Fix missing call to bch2_fs_allocator_background_exit() | Kent Overstreet | 2024-04-20 | 1 | -0/+1
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for journal entries overruning end of sb clean section | Kent Overstreet | 2024-04-20 | 2 | -1/+10
  Fix a missing bounds check in superblock validation.
  Note that we don't yet have repair code for this case - repair code for individual items is generally low priority, since the whole superblock is checksummed, validated prior to write, and we have backups.
  Reported-by: lei lu <llfamsec@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
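A sketch of the kind of bounds check involved, assuming a section made of back-to-back variable-length entries whose 8-byte header carries a `u64s` payload length. The layout is loosely modeled on bcachefs journal entries, but `struct entry_hdr` and `section_entries_valid()` are hypothetical and endianness is ignored. Each entry's full extent is checked against the end of the section before the walk advances.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte entry header; payload length is in 64-bit words. */
struct entry_hdr {
	uint16_t u64s;
	uint8_t  type;
	uint8_t  pad[5];
};
_Static_assert(sizeof(struct entry_hdr) == sizeof(uint64_t),
	       "header must be exactly one u64");

/* Returns true iff every entry lies entirely within [start, end). */
static bool section_entries_valid(const uint64_t *start, const uint64_t *end)
{
	const uint64_t *p = start;

	while (p < end) {
		struct entry_hdr h;

		/* p < end, so at least one word (the header) is readable. */
		memcpy(&h, p, sizeof(h));

		/* Header word plus payload must not run past the section end. */
		if ((uint64_t) (end - p) - 1 < h.u64s)
			return false;

		p += 1 + h.u64s;
	}
	return true;
}
```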
* bcachefs: Fix bio alloc in check_extent_checksum() | Kent Overstreet | 2024-04-17 | 1 | -1/+1
  If the buffer is virtually mapped, it won't be a single bvec.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
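A sketch of the arithmetic, assuming 4 KiB pages: the number of bio_vecs a buffer needs is the number of pages it touches, which depends on its offset within the first page as well as its length. For a vmalloc'd buffer those pages are generally not physically contiguous, so each one needs its own bvec, and a count like this is what would be passed as `nr_vecs` to `bio_alloc()`. `buf_nr_pages()` and `SKETCH_PAGE_SIZE` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t buf_nr_pages(uintptr_t addr, size_t len)
{
	size_t offset = addr & (SKETCH_PAGE_SIZE - 1);

	if (!len)
		return 0;
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A page-aligned 4 KiB buffer fits one bvec, but the same length starting mid-page spans two pages.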
* bcachefs: fix leak in bch2_gc_write_reflink_key | Kent Overstreet | 2024-04-17 | 1 | -1/+2
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: KEY_TYPE_error is allowed for reflink | Kent Overstreet | 2024-04-17 | 1 | -1/+2
  KEY_TYPE_error is left behind when we have to delete all pointers in an extent in fsck; it allows errors to be correctly returned by reads later.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift | Kent Overstreet | 2024-04-17 | 2 | -5/+5
  Fixes: 27c15ed297cb ("bcachefs: bch_member.btree_allocated_bitmap")
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: make sure to release last journal pin in replay | Kent Overstreet | 2024-04-16 | 1 | -1/+4
  This fixes a deadlock when journal replay has many keys to insert that were from fsck, not the journal.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: node scan: ignore multiple nodes with same seq if interior | Kent Overstreet | 2024-04-16 | 1 | -0/+2
  Interior nodes are not really needed when we have to scan, but if this pops up for leaf nodes we'll need a real heuristic.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix format specifier in validate_bset_keys() | Nathan Chancellor | 2024-04-16 | 1 | -1/+1
  When building for 32-bit platforms, for which size_t is 'unsigned int', there is a warning from a format string in validate_bset_keys():
    fs/bcachefs/btree_io.c: In function 'validate_bset_keys':
    fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=]
      891 |                                  "bad k->u64s %u (min %u max %lu)", k->u64s,
          |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err'
      603 |                                msg, ##__VA_ARGS__);                    \
          |                                ^~~
    fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on'
      887 |                 if (btree_err_on(!bkeyp_u64s_valid(&b->format, k),
          |                     ^~~~~~~~~~~~
    fs/bcachefs/btree_io.c:891:64: note: format string is defined here
      891 |                                  "bad k->u64s %u (min %u max %lu)", k->u64s,
          |                                                             ~~^
          |                                                               |
          |                                                               long unsigned int
          |                                                             %u
    cc1: all warnings being treated as errors
  BKEY_U64s is size_t so the entire expression is promoted to size_t. Use the '%zu' specifier so that there is no warning regardless of the width of size_t.
  Fixes: 031ad9e7dbd1 ("bcachefs: Check for packed bkeys that are too big")
  Reported-by: kernel test robot <lkp@intel.com>
  Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/
  Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/
  Signed-off-by: Nathan Chancellor <nathan@kernel.org>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
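A small sketch of the portable pattern. `format_u64s_range()` is a hypothetical helper, not the `btree_err()` macro from the commit, but it prints the same message and uses the `%zu` length modifier for the `size_t` argument, which is correct whatever width `size_t` has.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: max is a size_t, so it is printed with %zu.
 * %lu would be wrong on 32-bit targets (size_t is unsigned int) and
 * %u would be wrong on LP64 targets (size_t is unsigned long).
 */
static int format_u64s_range(char *buf, size_t buflen,
			     unsigned u64s, unsigned min, size_t max)
{
	return snprintf(buf, buflen, "bad k->u64s %u (min %u max %zu)",
			u64s, min, max);
}
```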
* bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE | Kent Overstreet | 2024-04-16 | 3 | -3/+19
  We need to initialize the stdio redirects before they're used.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: set_btree_iter_dontneed also clears should_be_locked | Kent Overstreet | 2024-04-15 | 1 | -2/+7
  This is part of a larger series cleaning up the semantics of should_be_locked and adding assertions around it; if we don't need an iterator/path anymore, it clearly doesn't need to be locked.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix error path of __bch2_read_super() | Chao Yu | 2024-04-15 | 1 | -2/+5
  In __bch2_read_super(), if kstrdup() fails, we need to release the memory in sb->holder; fix this by calling bch2_free_super() in the error path.
  Signed-off-by: Chao Yu <chao@kernel.org>
  Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for backpointer bucket_offset >= bucket size | Kent Overstreet | 2024-04-14 | 3 | -10/+9
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_member.btree_allocated_bitmap | Kent Overstreet | 2024-04-14 | 9 | -6/+131
  This adds a small (64-bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed.
  - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap
  - Interior btree update path updates the bitmaps when required
  - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked
  - New on-disk format version, mi_btree_bitmap, which indicates the new bitmap is present
  - Upgrade table lists the required recovery pass and expected fsck error
  - Btree node scan uses the bitmap to skip ranges if we're on the new version
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
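A sketch of how a single 64-bit word can summarize where btree nodes live on a device, assuming each bit covers a fixed power-of-two number of sectors, chosen so that 64 bits span the whole device. `struct dev_bitmap` and the `dev_bitmap_*()` helpers are hypothetical; they are not `bch2_dev_btree_bitmap_marked()` or the on-disk `bch_member` field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bit i covers sectors [i << shift, (i + 1) << shift). */
struct dev_bitmap {
	uint64_t bits;
	unsigned shift;
};

static void dev_bitmap_init(struct dev_bitmap *b, uint64_t dev_sectors)
{
	b->bits = 0;
	b->shift = 0;
	/* Coarsen until 64 bits span the device (capped to avoid overflow). */
	while (b->shift < 57 && (64ULL << b->shift) < dev_sectors)
		b->shift++;
}

/* Mark every bit overlapping [start, start + sectors); sectors must be > 0. */
static void dev_bitmap_mark(struct dev_bitmap *b, uint64_t start, uint64_t sectors)
{
	uint64_t first = start >> b->shift;
	uint64_t last = (start + sectors - 1) >> b->shift;

	for (uint64_t bit = first; bit <= last && bit < 64; bit++)
		b->bits |= 1ULL << bit;
}

/* True iff every bit overlapping the range is set; sectors must be > 0. */
static bool dev_bitmap_marked(const struct dev_bitmap *b, uint64_t start,
			      uint64_t sectors)
{
	uint64_t first = start >> b->shift;
	uint64_t last = (start + sectors - 1) >> b->shift;

	for (uint64_t bit = first; bit <= last; bit++)
		if (bit >= 64 || !(b->bits & (1ULL << bit)))
			return false;
	return true;
}
```

Because the granularity is coarse, a set bit only says a range may contain btree nodes; a scan can skip every range whose bit is clear.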
* bcachefs: sysfs internal/trigger_journal_flush | Kent Overstreet | 2024-04-14 | 1 | -1/+10
  Add a sysfs knob for immediately flushing the entire journal.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix bch2_btree_node_fill() for !path | Kent Overstreet | 2024-04-14 | 1 | -26/+18
  We shouldn't be doing the unlock/relock dance when we're not using a path - this fixes an assertion pop when called from btree node scan.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add safety checks in bch2_btree_node_fill() | Kent Overstreet | 2024-04-14 | 1 | -1/+24
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Interior known are required to have known key types | Kent Overstreet | 2024-04-14 | 1 | -1/+2
  For forwards compatibility, we allow bkeys of unknown type in leaf nodes; we can simply ignore metadata we don't understand. Pointers to btree nodes must always be of known types, however.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
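A minimal sketch of the rule, using a hypothetical enum and function: an unknown key type (one at or beyond the number of types this version knows) is tolerated at level 0, where it can simply be ignored, but rejected in interior nodes, whose keys must be followed to reach child nodes.

```c
#include <stdbool.h>

/* Hypothetical key types known to this (older) version. */
enum key_type { KEY_deleted, KEY_extent, KEY_btree_ptr, KEY_TYPE_NR };

/*
 * Validate a key's type given the level of the node holding it
 * (level 0 = leaf). Types >= KEY_TYPE_NR come from a newer version.
 */
static bool key_type_valid(unsigned type, unsigned level)
{
	if (type < KEY_TYPE_NR)
		return true;
	/* Unknown type: a leaf can ignore it, an interior node cannot. */
	return level == 0;
}
```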
* bcachefs: add missing bounds check in __bch2_bkey_val_invalid() | Kent Overstreet | 2024-04-14 | 1 | -1/+4
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix btree node merging on write buffer btrees | Kent Overstreet | 2024-04-13 | 1 | -2/+12
  The btree write buffer flush fastpath that avoids the main transaction commit path had the unfortunate side effect of not doing btree node merging.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Disable merges from interior update path | Kent Overstreet | 2024-04-13 | 1 | -0/+10
  There's been a bug in the btree write buffer where it wasn't triggering btree node merges - and leaving behind a bunch of nearly empty btree nodes.
  Then during journal replay, when updates to the backpointers btree aren't using the btree write buffer (because we require synchronization with journal replay), we end up doing those merges all at once.
  Then if it's the interior update path running them, we deadlock because those run with the highest watermark.
  There's no real need for the interior update path to be doing btree node merges; other code paths can handle that at lower watermarks.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Run merges at BCH_WATERMARK_btree | Kent Overstreet | 2024-04-13 | 1 | -0/+6
  This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix missing write refs in fs fio paths | Kent Overstreet | 2024-04-13 | 3 | -14/+23
  bch2_journal_flush_seq() requires us to have a write ref.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix deadlock in journal replay | Kent Overstreet | 2024-04-13 | 1 | -3/+4
  btree_key_can_insert_cached() should be checking the watermark: BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, and it was being used incorrectly.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Go rw if running any explicit recovery passes | Kent Overstreet | 2024-04-13 | 1 | -1/+1
  This fixes a bug where we fail to start when upgrading/downgrading because we forgot we needed to go rw.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Standardize helpers for printing enum strs with bounds checks | Kent Overstreet | 2024-04-13 | 10 | -56/+69
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
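A sketch of a bounds-checked enum printer of the kind this commit standardizes. `prt_enum()`, the `data_type` table, and the "(unknown N)" fallback format are hypothetical, not bcachefs's actual helpers. The bounds check matters because printing code can be handed values that never passed validation, as the to_text() fixes below note.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical enum and string table, for illustration only. */
enum data_type { DATA_free, DATA_sb, DATA_journal, DATA_btree, DATA_user };

static const char * const data_type_strs[] = {
	"free", "sb", "journal", "btree", "user",
};

/*
 * Print an enum value by name, falling back to the raw number when it is
 * outside the table (e.g. a value read from an unvalidated on-disk key).
 */
static int prt_enum(char *buf, size_t len, const char * const *strs,
		    size_t nr, unsigned v)
{
	if (v < nr && strs[v])
		return snprintf(buf, len, "%s", strs[v]);
	return snprintf(buf, len, "(unknown %u)", v);
}
```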
* bcachefs: don't queue btree nodes for rewrites during scan | Kent Overstreet | 2024-04-13 | 1 | -1/+3
  Many nodes found during scan will be old nodes that have been overwritten by newer nodes.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix race in bch2_btree_node_evict() | Kent Overstreet | 2024-04-13 | 1 | -1/+3
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix unsafety in bch2_stripe_to_text() | Kent Overstreet | 2024-04-13 | 2 | -21/+27
  .to_text() functions need to work on key values that didn't pass .valid.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix unsafety in bch2_extent_ptr_to_text() | Kent Overstreet | 2024-04-13 | 1 | -1/+3
  We need to check that we have a valid bucket before checking whether ptr is stale.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree node scan: handle encrypted nodes | Kent Overstreet | 2024-04-13 | 1 | -0/+10
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for packed bkeys that are too big | Kent Overstreet | 2024-04-13 | 2 | -7/+14
  Add missing validation; this fixes an assertion pop in bkey unpack.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix UAFs of btree_insert_entry array | Kent Overstreet | 2024-04-13 | 1 | -13/+14
  The btree paths array is now dynamically resizable, and so is the btree_insert_entries array, since it needs to be the same size.
  The merge path (and interior update path) allocates new btree paths and can thus trigger a resize, so we must not retain direct pointers after invoking merge; the same applies when running btree node triggers.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
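A sketch of the hazard, using a hypothetical growable `int` array in place of the btree paths array: any call that can grow the array may move it with `realloc()`, so callers should hold indices rather than element pointers across such calls.

```c
#include <stdlib.h>

/* Hypothetical growable array standing in for the btree paths array. */
struct vec {
	int	*data;
	size_t	nr, size;
};

/* Append x, growing with realloc(); returns the new element's index or -1. */
static long vec_push(struct vec *v, int x)
{
	if (v->nr == v->size) {
		size_t new_size = v->size ? v->size * 2 : 4;
		int *d = realloc(v->data, new_size * sizeof(*d));

		if (!d)
			return -1;
		v->data = d;	/* any old pointer into v->data may now dangle */
		v->size = new_size;
	}
	v->data[v->nr] = x;
	return (long) v->nr++;
}
```

Holding `&v.data[idx]` across later `vec_push()` calls would be a use-after-free once the array grows; re-deriving the element from its index each time stays valid.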
* bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree split path | Kent Overstreet | 2024-04-11 | 1 | -15/+26
  It turns out that btree splits happen with the rest of the transaction still locked, to avoid unnecessary restarts, which means using nofail doesn't work here: we can deadlock.
  Fortunately, we now have the ability to return errors here.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter() | Kent Overstreet | 2024-04-10 | 1 | -5/+7
  We weren't respecting trans->journal_replay_not_finished - we shouldn't be searching the journal keys unless we have a ref on them.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail() | Kent Overstreet | 2024-04-10 | 1 | -27/+1
  Dropping read locks in bch2_btree_node_lock_write_nofail() dates from before we had the cycle detector; we can now tell the cycle detector directly when taking a lock may not fail because we can't handle transaction restarts.
  This is needed for adding should_be_locked asserts.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix a race in btree_update_nodes_written() | Kent Overstreet | 2024-04-10 | 1 | -3/+7
  One btree update might have terminated in a node update, and then while it is in flight another btree update might free that original node.
  This race has to be handled in btree_update_nodes_written() - we were missing a READ_ONCE().
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_node_scan: Respect member.data_allowed | Kent Overstreet | 2024-04-09 | 1 | -0/+3
  If a device wasn't used for btree nodes, no need to scan for them.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't scan for btree nodes when we can reconstruct | Kent Overstreet | 2024-04-09 | 4 | -18/+29
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix check_topology() when using node scan | Kent Overstreet | 2024-04-09 | 1 | -1/+1
  Shoot down journal keys _before_ populating journal keys with pointers to scanned nodes.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix eytzinger0_find_gt() | Kent Overstreet | 2024-04-08 | 1 | -6/+20
  - fix return types: promoting from unsigned to ssize_t does not do what we want here, and was pointless since the rest of the eytzinger code is u32
  - nr, not size
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
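A sketch of the pitfall described in the first bullet, with a linear search standing in for the Eytzinger search and hypothetical function names. On LP64 targets, converting an unsigned "not found" value of -1 (that is, `UINT_MAX`) to `ssize_t` zero-extends it, so a caller testing for a negative result never sees one.

```c
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Buggy: an unsigned "not found" of -1 is UINT_MAX, then widened. */
static ssize_t find_gt_bad(const unsigned *a, unsigned nr, unsigned x)
{
	unsigned ret = -1;	/* UINT_MAX */

	for (unsigned i = 0; i < nr; i++)
		if (a[i] > x) {
			ret = i;
			break;
		}
	return ret;	/* zero-extended to ssize_t: never negative on LP64 */
}

/* One consistent signed type end to end: -1 survives the return. */
static int find_gt_good(const unsigned *a, unsigned nr, unsigned x)
{
	for (unsigned i = 0; i < nr; i++)
		if (a[i] > x)
			return (int) i;
	return -1;
}
```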
* bcachefs: fix bch2_get_acl() transaction restart handling | Kent Overstreet | 2024-04-07 | 1 | -16/+14
  bch2_acl_from_disk() uses allocate_dropping_locks, and can thus return a transaction restart - this wasn't handled.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
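A sketch of the retry discipline, with hypothetical names and an errno value standing in for a restart code. When a step may drop locks (as allocate_dropping_locks does) and report a restart, the caller must start the whole transaction over rather than treat the result as a hard error. In bcachefs proper, this is done with `bch2_trans_begin()` and a `bch2_err_matches(ret, BCH_ERR_transaction_restart)` check; the code below only models that loop.

```c
#include <errno.h>

/* Hypothetical restart code; bcachefs uses its own private errcodes. */
#define ERR_TRANS_RESTART	(-EAGAIN)

/* Hypothetical transaction state; restarts_left simulates lock contention. */
struct trans {
	int restarts_left;
	int begins;
};

/* A step that may drop locks to allocate, and so may ask for a restart. */
static int acl_lookup(struct trans *t, int *acl)
{
	if (t->restarts_left > 0) {
		t->restarts_left--;
		return ERR_TRANS_RESTART;
	}
	*acl = 1;	/* placeholder result */
	return 0;
}

/* The caller handles a restart by beginning the transaction again. */
static int get_acl(struct trans *t, int *acl)
{
	int ret;

	do {
		t->begins++;	/* stand-in for re-beginning the transaction */
		ret = acl_lookup(t, acl);
	} while (ret == ERR_TRANS_RESTART);

	return ret;
}
```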
* bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list | Hongbo Li | 2024-04-07 | 1 | -0/+2
  When allocating a bkey_cached from the bc->freed_pcpu list, we missed decrementing nr_freed_pcpu, which causes a mismatch between the value of nr_freed_pcpu and the number of items on the list. The same problem exists when moving a new bkey_cached onto the bc->freed_pcpu list.
  If this happens, it may trigger the following BUG_ON()s in bch2_fs_btree_key_cache_exit():
      BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu);
      BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu);
  Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively")
  Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
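A sketch of the invariant those BUG_ON()s enforce, using a hypothetical singly linked list with a cached length. If every insertion and removal goes through one pair of helpers that also adjusts the counter, the counter cannot drift from the list.

```c
#include <stddef.h>

/* Minimal intrusive singly linked list with a cached length. */
struct node { struct node *next; };

struct counted_list {
	struct node	*head;
	size_t		nr;	/* must always equal the number of nodes */
};

/* Route every insertion and removal through these helpers. */
static void counted_push(struct counted_list *l, struct node *n)
{
	n->next = l->head;
	l->head = n;
	l->nr++;
}

static struct node *counted_pop(struct counted_list *l)
{
	struct node *n = l->head;

	if (n) {
		l->head = n->next;
		l->nr--;
	}
	return n;
}

/* Walk the list; plays the role of list_count_nodes() in the check. */
static size_t count_nodes(const struct counted_list *l)
{
	size_t nr = 0;

	for (const struct node *n = l->head; n; n = n->next)
		nr++;
	return nr;
}
```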
* bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take() | Kent Overstreet | 2024-04-07 | 1 | -10/+45
  Multiple bug fixes for journal iters:
  - When the journal keys gap buffer is resized, we have to adjust the iterators for moving the gap to the end
  - We don't want to rewind iterators to point to the key we just inserted if it's not for the correct btree/level
  Also, add some new assertions.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename struct field swap to prevent macro naming collision | Thorsten Blum | 2024-04-06 | 1 | -4/+4
  The struct field swap can collide with the swap() macro defined in linux/minmax.h. Rename the struct field to prevent such collisions.
  Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* MAINTAINERS: Add entry for bcachefs documentation | Bagas Sanjaya | 2024-04-06 | 1 | -0/+1
  Now that bcachefs docs exist in Documentation/filesystems/bcachefs/, cover it in MAINTAINERS entry for the filesystem.
  Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* Documentation: filesystems: Add bcachefs toctree | Bagas Sanjaya | 2024-04-06 | 2 | -0/+12
  Commit eb386617be4bdf ("bcachefs: Errcode tracepoint, documentation") adds initial bcachefs documentation (private error codes) but without any table of contents tree for the filesystem docs, hence Sphinx warns:
    Documentation/filesystems/bcachefs/errorcodes.rst: WARNING: document isn't included in any toctree
  Add bcachefs toctree to fix above warning.
  Fixes: eb386617be4b ("bcachefs: Errcode tracepoint, documentation")
  Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>