summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/recovery.c
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: don't free error pointersKent Overstreet2024-05-061-1/+2
| | | | | Reported-by: syzbot+3333603f569fc2ef258c@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: make sure to release last journal pin in replayKent Overstreet2024-04-161-1/+4
| | | | | | | This fixes a deadlock when journal replay has many keys to insert that were from fsck, not the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't scan for btree nodes when we can reconstructKent Overstreet2024-04-091-14/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Reconstruct missing snapshot nodesKent Overstreet2024-04-031-0/+1
| | | | | | | | When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flag btrees with missing dataKent Overstreet2024-04-031-0/+23
| | | | | | | We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Repair pass for scanning for btree nodesKent Overstreet2024-04-031-25/+29
| | | | | | | | | | | | | | | If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()Kent Overstreet2024-04-031-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_shoot_down_journal_keys()Kent Overstreet2024-04-031-10/+12
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Clear recovery_passes_required as they complete without errorsKent Overstreet2024-04-031-3/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve -o norecovery; opts.recovery_pass_limitKent Overstreet2024-03-311-6/+4
| | | | | | | | | | | | | | | | | | | | This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure bch_sb_field_ext always existsKent Overstreet2024-03-311-17/+8
| | | | | | | | | This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flush journal immediately after replay if we did early repairKent Overstreet2024-03-311-0/+20
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out recovery_passes.cKent Overstreet2024-03-311-244/+5
| | | | | | | | | | | We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't corrupt journal keys gap buffer when dropping alloc infoKent Overstreet2024-03-171-1/+5
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: reconstruct_alloc cleanupKent Overstreet2024-03-131-13/+38
| | | | | | | | | Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: split out ignore_blacklisted, ignore_not_dirtyKent Overstreet2024-03-131-3/+4
| | | | | | prep work for replaying the journal backwards Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: improve move_gap()Kent Overstreet2024-03-131-2/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: journal_keys now uses darray helpersKent Overstreet2024-03-131-6/+2
| | | | | | nice bit of code cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename journal_keys.d -> journal_keys.dataKent Overstreet2024-03-131-5/+5
| | | | | | This will let us use some darray helpers in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill more -EIO error codesKent Overstreet2024-03-131-1/+1
| | | | | | | | This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix journal replay with unreadable btree rootsKent Overstreet2024-03-101-0/+11
| | | | | | | | | | | | When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: journal_seq_blacklist_add() now handles entries being added out of ↵Kent Overstreet2024-03-101-1/+1
| | | | | | | | | | | | | | | order bch2_journal_seq_blacklist_add() was bugged when the new entry overlapped with multiple existing entries, and it also assumed new entries are being added in increasing order. This is true on any sane filesystem, but when trying to recover from very badly mangled filesystems we might end up with the journal sequence number rewinding vs. what the blacklist list knows about - easiest to just handle that here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix check_version_upgrade()Kent Overstreet2024-02-131-5/+6
| | | | | | | | When also downgrading, check_version_upgrade() could pick a new version greater than the latest supported version. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_fs_usage_baseKent Overstreet2024-01-211-1/+1
| | | | | | | Split out base filesystem usage into its own type; prep work for breaking up bch2_trans_fs_usage_apply(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Restart recovery passes more reliablyKent Overstreet2024-01-051-1/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Upgrades now specify errors to fix, like downgradesKent Overstreet2024-01-051-9/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Online fsck can now fix errorsKent Overstreet2024-01-051-2/+5
| | | | | | | | BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Upgrading uses bch_sb.recovery_passes_requiredKent Overstreet2024-01-051-8/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix nochanges/read_only interactionKent Overstreet2024-01-051-1/+3
| | | | | | | | | | | | | nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: vstruct_for_each() now declares loop iterKent Overstreet2024-01-011-6/+3
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: for_each_member_device() now declares loop iterKent Overstreet2024-01-011-6/+4
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: darray_for_each() now declares loop iterKent Overstreet2024-01-011-1/+0
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_err_(fn|msg) check if should printKent Overstreet2024-01-011-15/+9
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix snapshot.c assertion for online fsckKent Overstreet2024-01-011-0/+3
| | | | | | | c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill for_each_btree_key()Kent Overstreet2024-01-011-1/+2
| | | | | | | | | | | | | for_each_btree_key() handles transaction restarts, like for_each_btree_key2(), but only calls bch2_trans_begin() after a transaction restart - for_each_btree_key2() wraps every loop iteration in a transaction. The for_each_btree_key() behaviour is problematic when it leads to holding the SRCU lock that prevents key cache reclaim for an unbounded amount of time - there's no real need to keep it around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_run_online_recovery_passes()Kent Overstreet2024-01-011-19/+38
| | | | | | | | Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add ability to redirect log outputKent Overstreet2024-01-011-3/+3
| | | | | | | | | | | | | | | | | | | | Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Explicity go RW for fsckKent Overstreet2024-01-011-1/+1
| | | | | | | This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: convert bch_fs_flags to x-macroKent Overstreet2024-01-011-17/+17
| | | | | | | Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Kill dev_usage->buckets_ecKent Overstreet2024-01-011-2/+0
| | | | | | | This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't flush journal after replayKent Overstreet2024-01-011-3/+1
| | | | | | | | | The flush_all_pins() after journal replay was unecessary, and trying to completely flush the journal while RW is not a great idea - it's not guaranteed to terminate if other threads keep adding things to the jorunal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rename BTREE_INSERT flagsKent Overstreet2024-01-011-6/+6
| | | | | | | BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make journal replay more efficientKent Overstreet2024-01-011-28/+62
| | | | | | | | | | | Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Go rw before journal replayKent Overstreet2024-01-011-4/+12
| | | | | | | | | | | This gets us slightly nicer log messages. Also, this slightly clarifies synchronization of c->journal_keys; after we go RW it's in use by multiple threads (so that the btree iterator code can overlay keys from the journal); so it has to be prepped before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"Kent Overstreet2024-01-011-0/+2
| | | | | | | | | | | This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it. Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Print old version when scanning for old metadataKent Overstreet2024-01-011-2/+6
| | | | | | Also: we should be using bch2_fs_read_write_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Flush fsck errors before running twiceKent Overstreet2024-01-011-0/+2
| | | | | | | | It's confusing if we run fsck a second time (in debug mode, to verify the second run is clean), but errors are still ratelimited from the first run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_sb_field_downgradeKent Overstreet2024-01-011-1/+23
| | | | | | | | | | | | | | | | | | Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch_sb.recovery_passes_requiredKent Overstreet2024-01-011-15/+60
| | | | | | | | | | | | | | | | Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add persistent identifiers for recovery passesKent Overstreet2024-01-011-2/+32
| | | | | | | | | The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>