| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reported-by: syzbot+3333603f569fc2ef258c@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
This fixes a deadlock when journal replay has many keys to insert that
were from fsck, not the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
When the snapshots btree is going, we'll have to delete huge amounts of
data - unless we can reconstruct it by looking at the keys that refer to
it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
We need this to know when we should attempt to reconstruct the snapshots
btree
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.
Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesytem UUID, so we
can do so safely.
This implements the scanning - next patch will rework topology repair to
make use of the found nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.
Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.
When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.
recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
This makes bch_sb_field_ext more consistent with the rest of -o
nochanges - we don't want to be varying other codepaths based on -o
nochanges, since it's used for testing in dry run mode; also fixes some
potential null ptr derefs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.
So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.
Also - users no longer have to explicitly select fsck and fix_errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
prep work for replaying the journal backwards
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
nice bit of code cleanup
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
This will let us use some darray helpers in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a btree root is unreadable, we still might be able to get some data
back by replaying what's in the journal. Previously though, we got
confused when journal replay would attempt to replay a key for a level
that didn't exist.
This adds bch2_btree_increase_depth(), so that journal replay can handle
this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
order
bch2_journal_seq_blacklist_add() was bugged when the new entry
overlapped with multiple existing entries, and it also assumed new
entries are being added in increasing order.
This is true on any sane filesystem, but when trying to recover from
very badly mangled filesystems we might end up with the journal sequence
number rewinding vs. what the blacklist list knows about - easiest to
just handle that here.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
When also downgrading, check_version_upgrade() could pick a new version
greater than the latest supported version.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing
fsck errors. Also; set fix_errors to ask by default when fsck is
running.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nochanges means "we cannot issue writes at all"; it's possible to go
into a pseudo read-write mode where we pin dirty metadata in memory,
which is used for fsck in dry run mode and doing journal replay on a
read only mount, but we do not want to allow an actual read-write mount
in nochanges mode.
But we do always want to allow early read-write, during recovery - this
patch clarifies that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
c->curr_recovery_pass can go backwards; this adds a non rewinding
version, c->recovery_pass_done.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.
The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
Add a new helper for running online recovery passes - i.e. online fsck.
This is a subset of our normal recovery passes, and does not - for now -
use or follow c->curr_recovery_pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upcoming patches are going to add two new ioctls for running fsck in the
kernel, but pretending that we're running our normal userspace fsck.
This patch adds some plumbing for redirecting our normal log messages
away from the dmesg log to a thread_with_file file descriptor - via a
struct log_output, which will be consumed by the fsck f_op's read method.
The new ioctls will allow for running fsck in the kernel against an
offline filesystem (without mounting it), and an online filesystem. For
an offline filesystem we need a way to pass in a pointer to the
log_output, which is done via a new hidden opts.h option.
For online fsck, we can set c->output directly, but only want to
redirect log messages from the thread running fsck - hence the new
c->output_filter method.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
The flush_all_pins() after journal replay was unecessary, and trying to
completely flush the journal while RW is not a great idea - it's not
guaranteed to terminate if other threads keep adding things to the
jorunal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
| |
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
Journal replay now first attempts to replay keys in sorted order,
similar to how the btree write buffer flush path works.
Any keys that can not be replayed due to journal deadlock are then left
for later and replayed in journal order, unpinning journal entries as we
go.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
This gets us slightly nicer log messages.
Also, this slightly clarifies synchronization of c->journal_keys; after
we go RW it's in use by multiple threads (so that the btree iterator
code can overlay keys from the journal); so it has to be prepped before
that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
| |
This slightly changes how trans->journal_res works, in preparation for
changing the btree write buffer flush path to use it.
Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal
reservation; trans->journal_res.seq already refers to the journal
sequence number to pin".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
| |
Also: we should be using bch2_fs_read_write_early()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
| |
It's confusing if we run fsck a second time (in debug mode, to verify
the second run is clean), but errors are still ratelimited from the
first run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new superblock section that contains a list of
{ minor version, recovery passes, errors_to_fix }
that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.
The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.
- recovery_passes_requried: recovery passes that must be run on the
next mount
- errors_silent: errors that will be silently fixed
These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
|
|
|
|
|
| |
The next patch will start to refer to recovery passes from the
superblock; naturally, we now need identifiers that don't change, since
the existing enum is in the order in which they are run and is not
fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|