path: root/fs/bcachefs/btree_io.c
Commit message | Author | Age | Files | Lines
* bcachefs: Kill bch2_assert_btree_nodes_not_locked()Kent Overstreet2024-07-141-6/+0
| | | | | | | We no longer track individual btree node locks with lockdep, so this will never be enabled. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree node scan: fall back to comparing by journal seqKent Overstreet2024-07-141-0/+4
| | | | | | | | | | Highly damaged filesystems, or filesystems that have been damaged, repaired, and damaged again, may have sequence numbers we can't fully trust - which in itself is something we need to debug. Add a journal_seq fallback so that repair doesn't get stuck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
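The fallback described above can be pictured as a two-level comparison. This is a minimal userspace sketch, not the bcachefs code: `struct toy_found_node`, its fields, and `toy_found_node_cmp()` are hypothetical names, and it assumes a larger value means a newer node.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified record for a btree node found by scan. */
struct toy_found_node {
	uint32_t seq;		/* node sequence number; may not be trustworthy */
	uint64_t journal_seq;	/* newest journal sequence number seen in the node */
};

/* Three-way compare: negative, zero, or positive. */
static int toy_cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/*
 * Order two candidate nodes, oldest first. The primary key is seq;
 * only when that ties do we fall back to journal_seq, so the caller
 * still gets a definite order instead of getting stuck.
 */
int toy_found_node_cmp(const struct toy_found_node *l,
		       const struct toy_found_node *r)
{
	assert(l && r);

	int c = toy_cmp_u64(l->seq, r->seq);
	return c ? c : toy_cmp_u64(l->journal_seq, r->journal_seq);
}
```

With this ordering, two nodes that carry the same `seq` are still distinguished by whichever has the newer journal sequence number.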
* bcachefs: fsck_err() may now take a btree_transKent Overstreet2024-07-141-1/+1
| | | | | | | | | fsck_err() now optionally takes a btree_trans; if the current thread has one, it is required that it be passed. The next patch will use this to unlock when waiting for user input. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_ptr_sectors_written() now takes bkey_s_cKent Overstreet2024-07-141-4/+4
| | | | | | this is for the userspace metadata dump tool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for bsets past bch_btree_ptr_v2.sectors_writtenKent Overstreet2024-07-141-2/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg()Uros Bizjak2024-07-141-9/+11
| | | | | | | | | | | | | | | | | Use try_cmpxchg() family of functions instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
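The difference between the two loop shapes can be sketched in userspace with C11 atomics, whose `atomic_compare_exchange_strong()` also writes the observed value back into the expected-value argument on failure. This is an illustration only: `toy_cmpxchg()` and the two `toy_set_bits_*()` helpers are hypothetical and are not the kernel's `cmpxchg()`/`try_cmpxchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* cmpxchg()-style helper: returns the value that was observed in *ptr. */
static uint32_t toy_cmpxchg(_Atomic uint32_t *ptr, uint32_t old, uint32_t new_val)
{
	/* On failure 'old' is overwritten with the current value of *ptr. */
	atomic_compare_exchange_strong(ptr, &old, new_val);
	return old;
}

/* Old shape: compare the returned value against 'old' after every attempt. */
uint32_t toy_set_bits_cmpxchg(_Atomic uint32_t *ptr, uint32_t bits)
{
	assert(ptr);

	uint32_t old, v = atomic_load(ptr);
	do {
		old = v;
		v = toy_cmpxchg(ptr, old, old | bits);
	} while (v != old);
	return old;
}

/* New shape: the boolean result drives the loop, 'old' refreshes itself. */
uint32_t toy_set_bits_try_cmpxchg(_Atomic uint32_t *ptr, uint32_t bits)
{
	assert(ptr);

	uint32_t old = atomic_load(ptr);
	while (!atomic_compare_exchange_strong(ptr, &old, old | bits))
		;	/* failed exchange already stored the new value in 'old' */
	return old;
}
```

Both helpers return the value seen before the update; the second one needs no explicit comparison and no re-read inside the loop.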
* bcachefs: Split out btree_write_submit_wqKent Overstreet2024-06-101-4/+4
| | | | | | | | | | Split the workqueues for btree read completions and btree write submissions; we don't want concurrency control on btree read completions, but we do want concurrency control on write submissions, else blocking in submit_bio() will cause a ton of kworkers to be allocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Plumb bkey into __btree_err()Kent Overstreet2024-05-281-40/+45
| | | | | | | It can be useful to know the exact byte offset within a btree node where an error occurred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_get_ioref() checks for device not presentKent Overstreet2024-05-091-3/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_dev_get_ioref2(); btree_io.cKent Overstreet2024-05-091-15/+18
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_bset() declares loop iterKent Overstreet2024-05-091-7/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_bkey_drop_ptrs() declares loop iterKent Overstreet2024-05-081-1/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: make btree read errors silent during scanKent Overstreet2024-05-081-5/+11
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: member helper cleanupsKent Overstreet2024-05-081-8/+8
| | | | | | | | | | | | | | Some renaming for better consistency:
bch2_member_exists -> bch2_member_alive
bch2_dev_exists -> bch2_member_exists
bch2_dev_exists2 -> bch2_dev_exists
bch_dev_locked -> bch2_dev_locked
bch_dev_bkey_exists -> bch2_dev_bkey_exists
New helper: bch2_dev_safe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_node_header_to_text()Kent Overstreet2024-05-081-7/+20
| | | | | | better btree node read path error messages Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: prt_printf() now respects \r\n\tKent Overstreet2024-05-081-10/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix needs_whiteout BUG_ON() in bkey_sort()Kent Overstreet2024-05-081-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btree nodes are log structured; thus, we need to emit whiteouts when we're deleting a key that's been written out to disk. k->needs_whiteout tracks whether a key will need a whiteout when it's deleted, and this requires some careful handling; e.g. the key we're deleting may not have been written out to disk, but it may have overwritten a key that was - thus we need to carry this flag around on overwrites.

Invariants: There may be multiple keys for the same position in a given node (because of overwrites), but only one of them will be a live (non deleted) key, and only one key for a given position will have the needs_whiteout flag set.

Additionally, we don't want to carry around whiteouts that need to be written in the main searchable part of a btree node - btree_iter_peek() will have to skip past them, and this can lead to O(n^2) issues when doing sequential deletions (e.g. inode rm/truncate). So there's a separate region in the btree node buffer for unwritten whiteouts; these are merge sorted with the rest of the keys we're writing in the btree node write path.

The unwritten whiteouts area was a later optimization that bch2_sort_keys() didn't take into account; the unwritten whiteouts area means that we never have deleted keys with needs_whiteout set in the main searchable part of a btree node. That means we can simplify and optimize some sort paths, and eliminate an assertion that syzbot found:
- Unless we're in the btree node write path, it's always ok to drop whiteouts when sorting
- When sorting for a btree node write, we drop the whiteout if it's not from the unwritten whiteouts area, or if it's overwritten by a real key at the same position.

This completely eliminates some tricky logic for propagating the needs_whiteout flag: syzbot was able to hit the assertion that checked that there shouldn't be more than one key at the same pos with needs_whiteout set, likely due to a combination of flipping on needs_whiteout on all written keys (they need whiteouts if overwritten), combined with not always dropping unneeded whiteouts, and the tricky logic in the sort path for preserving needs_whiteout that wasn't really needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
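The two rules for dropping whiteouts can be modelled on a flat, position-sorted array. This is a toy sketch under stated assumptions, not the bcachefs sort path: `struct toy_key`, its fields, and `toy_drop_whiteouts()` are hypothetical, positions are plain integers, and the input is assumed to be already sorted by position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified key: integer position plus two flags. */
struct toy_key {
	uint64_t pos;
	bool deleted;			/* this key is a whiteout */
	bool from_unwritten_whiteouts;	/* came from the unwritten whiteouts area */
};

/*
 * Compact a position-sorted array in place and return the new length.
 * - Not a node write: every whiteout is dropped.
 * - Node write: a whiteout survives only if it came from the unwritten
 *   whiteouts area and no live key exists at the same position.
 */
size_t toy_drop_whiteouts(struct toy_key *keys, size_t nr, bool for_write)
{
	assert(keys || !nr);

	size_t out = 0;

	for (size_t i = 0; i < nr; ) {
		size_t j = i;
		bool has_live = false;

		/* Find the run of keys sharing this position. */
		while (j < nr && keys[j].pos == keys[i].pos) {
			has_live |= !keys[j].deleted;
			j++;
		}

		for (size_t k = i; k < j; k++) {
			bool keep = !keys[k].deleted ||
				(for_write &&
				 keys[k].from_unwritten_whiteouts &&
				 !has_live);

			if (keep)
				keys[out++] = keys[k];	/* out <= k, so no unread key is clobbered */
		}
		i = j;
	}
	return out;
}
```

Because the decision depends only on the flags of keys at the same position, no needs_whiteout state has to be carried from one key to another in this model.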
* bcachefs: Fix format specifier in validate_bset_keys()Nathan Chancellor2024-04-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building for 32-bit platforms, for which size_t is 'unsigned int', there is a warning from a format string in validate_bset_keys():

  fs/bcachefs/btree_io.c: In function 'validate_bset_keys':
  fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=]
    891 |                                  "bad k->u64s %u (min %u max %lu)", k->u64s,
        |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err'
    603 |                                msg, ##__VA_ARGS__);                      \
        |                                ^~~
  fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on'
    887 |                 if (btree_err_on(!bkeyp_u64s_valid(&b->format, k),
        |                     ^~~~~~~~~~~~
  fs/bcachefs/btree_io.c:891:64: note: format string is defined here
    891 |                                  "bad k->u64s %u (min %u max %lu)", k->u64s,
        |                                                              ~~^
        |                                                                |
        |                                                                long unsigned int
        |                                                              %u
  cc1: all warnings being treated as errors

BKEY_U64s is size_t so the entire expression is promoted to size_t. Use the '%zu' specifier so that there is no warning regardless of the width of size_t.

Fixes: 031ad9e7dbd1 ("bcachefs: Check for packed bkeys that are too big")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
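The portability point can be reproduced in plain userspace C. The sketch below reuses only the format string quoted in the warning; `toy_format_u64s_err()` and its parameters are hypothetical and stand in for the kernel's error-reporting macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the message with '%zu' for the size_t argument. '%lu' would
 * only be correct where size_t happens to be 'unsigned long'; '%zu'
 * matches size_t on both 32-bit and 64-bit targets.
 */
int toy_format_u64s_err(char *buf, size_t len,
			unsigned u64s, unsigned min, size_t max)
{
	assert(buf && len);

	return snprintf(buf, len, "bad k->u64s %u (min %u max %zu)",
			u64s, min, max);
}
```

Building this with `-Wformat` on either a 32-bit or a 64-bit target should produce no diagnostic, since the specifier follows the type instead of its width.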
* bcachefs: don't queue btree nodes for rewrites during scanKent Overstreet2024-04-131-1/+3
| | | | | | | many nodes found during scan will be old nodes, overwritten by newer nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Check for packed bkeys that are too bigKent Overstreet2024-04-131-7/+8
| | | | | | add missing validation; fixes assertion pop in bkey unpack Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flag btrees with missing dataKent Overstreet2024-04-031-4/+9
| | | | | | | We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BCH_WATERMARK_interior_updatesKent Overstreet2024-04-011-1/+1
| | | | | | | | | | | | This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix btree node keys accounting in topology repair pathKent Overstreet2024-03-311-0/+1
| | | | | | | When dropping keys that are now outside a node because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improved topology repair checksKent Overstreet2024-03-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remain BUG_ON()s, because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be existing on-disk corruption; they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
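The "span the full range with no overlaps" condition can be illustrated with integer key ranges. This is a simplified sketch, not bch2_btree_node_check_topology(): `struct toy_range` and `toy_check_topology()` are hypothetical, ranges are inclusive, and the successor of a position is assumed to be `pos + 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive key range [min, max] covered by a node. */
struct toy_range {
	uint64_t min;
	uint64_t max;
};

/*
 * Return true if the children, in order, tile the parent's range:
 * the first child starts at parent.min, each child starts right after
 * the previous one ends (no gap, no overlap), and the last child ends
 * at parent.max.
 */
bool toy_check_topology(struct toy_range parent,
			const struct toy_range *child, size_t nr)
{
	assert(child || !nr);

	if (!nr)
		return false;	/* an interior node with no children is an error */

	uint64_t expected_min = parent.min;

	for (size_t i = 0; i < nr; i++) {
		if (child[i].min != expected_min)
			return false;	/* gap or overlap */
		if (child[i].max < child[i].min || child[i].max > parent.max)
			return false;	/* inverted range or past the parent */
		if (i + 1 < nr) {
			if (child[i].max == parent.max)
				return false;	/* no room left for more children */
			expected_min = child[i].max + 1;
		}
	}
	return child[nr - 1].max == parent.max;
}
```

A caller in this model would treat a `false` result as a topology error to report, rather than continuing and dereferencing a child that is not there.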
* bcachefs: Improve bch2_fatal_error()Kent Overstreet2024-03-181-5/+5
| | | | | | error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't corrupt journal keys gap buffer when dropping alloc infoKent Overstreet2024-03-171-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill more -EIO error codesKent Overstreet2024-03-131-4/+3
| | | | | | | | This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill kvpmalloc()Kent Overstreet2024-03-131-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Correctly validate k->u64s in btree node read pathKent Overstreet2024-03-101-1/+10
| | | | | | | | | | | validate_bset_keys() never properly validated k->u64s; it checked if it was 0, but not if it was smaller than keys for the given packed format; this fixes that small oversight. This patch was backported, so it's adding quite a few error enums so that they don't get renumbered and we don't have confusing gaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-211-19/+19
| | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analogous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve checksum error messagesKent Overstreet2024-01-051-6/+12
| | | | | | | | | | new helpers:
- bch2_csum_to_text()
- bch2_csum_err_msg()
standardize our checksum error messages a bit, and print out the checksums a bit more nicely. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve validate_bset_keys()Kent Overstreet2024-01-051-20/+55
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add missing bch2_latency_acct() callKent Overstreet2024-01-051-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add time_stats for btree_node_read_done()Kent Overstreet2024-01-051-0/+2
| | | | | | | Seeing weird latency issues in the btree node read path - add one for bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bkey_for_each_ptr() now declares loop iterKent Overstreet2024-01-011-2/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: better error message in btree_node_write_work()Kent Overstreet2024-01-011-1/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve error message when finding wrong btree nodeKent Overstreet2024-01-011-2/+10
| | | | | | | | | single_device.merge_torture_flakey is, very rarely, finding a btree node that doesn't match the key that points to it: this patch improves the error message to print out more fields from the btree node header, so that we can see what else does or does not match the key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Include btree_trans in more tracepointsKent Overstreet2024-01-011-5/+6
| | | | | | | This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-3/+3
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't drop journal pins in exit pathKent Overstreet2023-12-031-2/+2
| | | | | | | | | | There's no need to drop journal pins in our exit paths - the code was trying to have everything cleaned up on any shutdown, but better to just tweak the assertions a bit. This fixes a bug where calling into journal reclaim in the exit path would cause a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet2023-11-241-4/+3
| | | | | | | | | | | | | | | | | | | | Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patch introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
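The idea can be shown with a userspace toy: the callback has the work-item signature that the indirect call expects, and a macro recovers the enclosing objects from the embedded member. Everything prefixed `toy_`/`TOY_` below is hypothetical and only mirrors the shape of the macros named above; it is not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for work_struct and closure. */
struct toy_work {
	void (*fn)(struct toy_work *);
};

struct toy_closure {
	struct toy_work work;
	int remaining;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Declare a callback with the signature the work queue actually calls. */
#define TOY_CLOSURE_CALLBACK(name) void name(struct toy_work *ws)

/* Recover the closure, then the object that embeds it. */
#define toy_closure_type(name, type, member)				\
	struct toy_closure *cl =					\
		toy_container_of(ws, struct toy_closure, work);		\
	type *name = toy_container_of(cl, type, member)

/* Hypothetical operation that embeds a closure. */
struct toy_write_op {
	int done;
	struct toy_closure cl;
};

TOY_CLOSURE_CALLBACK(toy_write_done)
{
	toy_closure_type(op, struct toy_write_op, cl);

	assert(cl->work.fn == toy_write_done);
	op->done = 1;
}
```

Because the function pointer type and the function definition now agree, no cast is needed at the call site, while the body still reads in terms of the operation rather than the raw work item.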
* bcachefs: bkey_copy() is no longer a macroKent Overstreet2023-11-051-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Enumerate fsck errorsKent Overstreet2023-11-011-49/+125
| | | | | | | | | | | | | This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repaired, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add IO error counts to bch_memberKent Overstreet2023-11-011-7/+16
| | | | | | | | | We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_id_str()Kent Overstreet2023-10-311-14/+4
| | | | | | | Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
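A lookup helper of this kind can be sketched as a bounds-checked table access. The table contents and the names `toy_btree_ids` and `toy_btree_id_str()` are hypothetical; the sketch only shows why a helper is safer than indexing the array directly when the ID may be unknown.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table of the btree IDs this build knows about. */
static const char * const toy_btree_ids[] = {
	"extents",
	"inodes",
	"dirents",
};

/*
 * Map an ID to a printable name. IDs beyond the table (for example
 * from a filesystem written by a newer version) get a placeholder
 * instead of an out-of-bounds read.
 */
const char *toy_btree_id_str(unsigned id)
{
	if (id >= TOY_ARRAY_SIZE(toy_btree_ids))
		return "(unknown)";

	assert(toy_btree_ids[id] != NULL);	/* table must have no holes */
	return toy_btree_ids[id];
}
```

Callers that previously wrote `names[id]` would call the helper instead, so every print site gets the same handling of unknown IDs.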
* bcachefs: Heap allocate btree_transKent Overstreet2023-10-221-8/+5
| | | | | | | | | | We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix W=12 build errorsKent Overstreet2023-10-221-27/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Break up io.cKent Overstreet2023-10-221-1/+1
| | | | | | | | | More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Array bounds fixesKent Overstreet2023-10-221-11/+10
| | | | | | | | | | | It's no longer legal to use a zero size array as a flexible array member - this causes UBSAN to complain. This patch switches our zero size arrays to normal flexible array members when possible, and inserts casts in other places (e.g. where we use the zero size array as a marker partway through an array). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
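The switch from a zero size array to a flexible array member looks like this in standalone C. `struct toy_bset` and its helpers are hypothetical and much simpler than the on-disk structures; the point is only the `data[]` declaration and the matching allocation size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variable-length structure. */
struct toy_bset {
	uint16_t u64s;		/* number of valid entries in data[] */
	uint64_t data[];	/* C99 flexible array member, not data[0] */
};

/* Allocate the header plus room for 'u64s' trailing entries. */
struct toy_bset *toy_bset_alloc(uint16_t u64s)
{
	struct toy_bset *b =
		calloc(1, sizeof(*b) + (size_t)u64s * sizeof(b->data[0]));

	if (b)
		b->u64s = u64s;
	return b;
}

/* Sum the entries; indexing stays within the allocated length. */
uint64_t toy_bset_sum(const struct toy_bset *b)
{
	assert(b);

	uint64_t sum = 0;

	for (uint16_t i = 0; i < b->u64s; i++)
		sum += b->data[i];
	return sum;
}
```

With `data[]` the compiler and sanitizers treat the member as having an unknown, runtime-determined length, whereas `data[0]` declares an array of zero elements, so any index into it is reported as out of bounds.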
* bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | | | Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>