linux-stable.git - Linux kernel stable tree

	Commit message	Author	Age	Files	Lines
*	bcachefs: Fix missing write refs in fs fio paths	Kent Overstreet	2024-04-13	1	-0/+2
\| \| \| \| \| \|	bch2_journal_flush_seq requires us to have a write ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Reconstruct missing snapshot nodes	Kent Overstreet	2024-04-03	1	-0/+1
\| \| \| \| \| \| \| \|	When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Flag btrees with missing data	Kent Overstreet	2024-04-03	1	-0/+1
\| \| \| \| \| \| \|	We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Repair pass for scanning for btree nodes	Kent Overstreet	2024-04-03	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesystem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Split out recovery_passes.c	Kent Overstreet	2024-03-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Move snapshot table size to struct snapshot_table	Kent Overstreet	2024-03-31	1	-1/+0
\| \| \| \| \| \| \| \|	We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Split out btree_node_rewrite_worker	Kent Overstreet	2024-03-17	1	-0/+2
\| \| \| \| \| \| \| \|	This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: time_stats: split stats-with-quantiles into a separate structure	Darrick J. Wong	2024-03-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: pull out time_stats.[ch]	Kent Overstreet	2024-03-13	1	-2/+1
\| \| \| \| \| \|	prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Rename journal_keys.d -> journal_keys.data	Kent Overstreet	2024-03-13	1	-3/+3
\| \| \| \| \| \|	This will let us use some darray helpers in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Split out discard fastpath	Kent Overstreet	2024-03-13	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache means those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: bch2_print_opts()	Kent Overstreet	2024-03-13	1	-0/+3
\| \| \| \| \| \| \|	Make sure early error messages get redirected, for kernel-fsck-from-userland. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: BTREE_ID_subvolume_children	Kent Overstreet	2024-03-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add a btree to record parent -> child subvolume relationships, according to the filesystem hierarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Clamp replicas_required to replicas	Kent Overstreet	2024-02-13	1	-0/+12
\| \| \| \| \| \| \|	This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Prep work for variable size btree node buffers	Kent Overstreet	2024-01-21	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analogous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: add time_stats for btree_node_read_done()	Kent Overstreet	2024-01-05	1	-0/+1
\| \| \| \| \| \| \|	Seeing weird latency issues in the btree node read path - add one for bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Online fsck can now fix errors	Kent Overstreet	2024-01-05	1	-4/+1
\| \| \| \| \| \| \| \|	BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: factor out thread_with_file, thread_with_stdio	Kent Overstreet	2024-01-05	1	-8/+12
\| \| \| \| \| \| \|	thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: track transaction durations	Kent Overstreet	2024-01-05	1	-0/+1
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Make sure allocation failure errors are logged	Kent Overstreet	2024-01-01	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly. Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non IO errors were missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: bch_err_(fn\|msg) check if should print	Kent Overstreet	2024-01-01	1	-2/+7
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: fix userspace build errors	Kent Overstreet	2024-01-01	1	-0/+1
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: kill btree_trans->wb_updates	Kent Overstreet	2024-01-01	1	-1/+0
\| \| \| \| \| \|	the btree write buffer path now creates a journal entry directly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: btree write buffer now slurps keys from journal	Kent Overstreet	2024-01-01	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for; if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Improve trans->extra_journal_entries	Kent Overstreet	2024-01-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix snapshot.c assertion for online fsck	Kent Overstreet	2024-01-01	1	-0/+3
\| \| \| \| \| \| \|	c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: BCH_IOCTL_FSCK_ONLINE	Kent Overstreet	2024-01-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new ioctl for running fsck on a mounted, in use filesystem. This reuses the fsck_thread code from the previous patch for running fsck on an offline, unmounted filesystem, so that log messages for the fsck thread are redirected to userspace. Only one running fsck instance is allowed at a time; a new semaphore (since the lock will be taken by one thread and released by another) is added for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: bch2_run_online_recovery_passes()	Kent Overstreet	2024-01-01	1	-0/+7
\| \| \| \| \| \| \| \|	Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Add ability to redirect log output	Kent Overstreet	2024-01-01	1	-14/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: c->ro_ref	Kent Overstreet	2024-01-01	1	-0/+21
\| \| \| \| \| \| \| \| \|	Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: count_event()	Kent Overstreet	2024-01-01	1	-1/+3
\| \| \| \| \| \|	Small helper for event counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Clean up btree write buffer write ref handling	Kent Overstreet	2024-01-01	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	__bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: convert bch_fs_flags to x-macro	Kent Overstreet	2024-01-01	1	-28/+34
\| \| \| \| \| \| \|	Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: track_event_change()	Kent Overstreet	2024-01-01	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: bch_sb.recovery_passes_required	Kent Overstreet	2024-01-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add two new superblock fields. Since the main section of the superblock is now full, we have to add a new variable length section for them - bch_sb_field_ext. The fields are recovery_passes_required (recovery passes that must be run on the next mount) and errors_silent (errors that will be silently fixed). These are to improve upgrading and downgrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix zstd compress workspace size	Kent Overstreet	2023-11-28	1	-1/+1
\| \| \| \| \| \| \|	zstd apparently lies about the size of the compression workspace it requires; if we double it compression succeeds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Proper refcounting for journal_keys	Kent Overstreet	2023-11-24	1	-0/+2
\| \| \| \| \| \| \| \| \|	The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix multiple -Warray-bounds warnings	Gustavo A. R. Silva	2023-11-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Transform zero-length array `entries` into a proper flexible-array member in `struct journal_seq_blacklist_table`; and fix the following -Warray-bounds warnings: fs/bcachefs/journal_seq_blacklist.c:148:26: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:150:30: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:154:27: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:176:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:177:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:297:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:298:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:300:31: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] This results in no differences in binary output. This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: bch_sb_field_errors	Kent Overstreet	2023-11-01	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be adding counters for every distinct fsck error. The new superblock section has entries of the form [ id, count, time_of_last_error ]; this is intended to let us see what errors are occurring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Add IO error counts to bch_member	Kent Overstreet	2023-11-01	1	-0/+2
\| \| \| \| \| \| \| \| \|	We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: rebalance_work	Kent Overstreet	2023-11-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still required after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: move: move_stats refactoring	Kent Overstreet	2023-10-31	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \|	data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Split out disk_groups_types.h	Kent Overstreet	2023-10-31	1	-0/+1
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarily	Kent Overstreet	2023-10-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: snapshot_create_lock	Kent Overstreet	2023-10-22	1	-0/+1
\| \| \| \| \| \| \|	Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errors	Kent Overstreet	2023-10-22	1	-2/+10
\| \| \| \| \| \| \|	These errors aren't actual errors, and should never be printed - do this in the common helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Heap allocate btree_trans	Kent Overstreet	2023-10-22	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix W=12 build errors	Kent Overstreet	2023-10-22	1	-1/+1
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: BTREE_ID_logged_ops	Kent Overstreet	2023-10-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Add logging to bch2_inode_peek() & related	Kent Overstreet	2023-10-22	1	-2/+2
\| \| \| \| \| \| \|	Add error messages when we fail to lookup an inode, and also add a few missing bch2_err_class() calls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
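
The "convert bch_fs_flags to x-macro" entry in the log above says that the conversion makes it possible to print filesystem flags in sysfs. The general x-macro pattern behind that kind of change can be sketched as follows. This is only an illustration of the technique, not the kernel's code: the list macro EXAMPLE_FS_FLAGS(), the enum, the string table, the helper example_flags_to_text() and the particular flags chosen are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical flag list (an assumption for this sketch, not the real
 * bch_fs_flags list). Each x(...) line names one flag exactly once.
 */
#define EXAMPLE_FS_FLAGS()	\
	x(started)		\
	x(rw)			\
	x(fsck_running)

/* First expansion: one enum constant per flag, giving its bit number. */
enum example_fs_flags {
#define x(n)	EXAMPLE_FS_##n,
	EXAMPLE_FS_FLAGS()
#undef x
	EXAMPLE_FS_FLAG_NR
};

/* Second expansion: a string table kept in the same order as the enum. */
static const char * const example_fs_flag_strs[] = {
#define x(n)	#n,
	EXAMPLE_FS_FLAGS()
#undef x
	NULL
};

/*
 * Write the names of all set bits in 'flags' into 'buf', separated by
 * single spaces. Output is truncated if 'buf' is too small. Returns 'buf'.
 */
static char *example_flags_to_text(char *buf, size_t size, unsigned long flags)
{
	size_t pos = 0;
	unsigned int i;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	for (i = 0; i < EXAMPLE_FS_FLAG_NR; i++) {
		int n;

		if (!(flags & (1UL << i)))
			continue;

		n = snprintf(buf + pos, size - pos, "%s%s",
			     pos ? " " : "", example_fs_flag_strs[i]);
		if (n < 0 || (size_t) n >= size - pos)
			break;	/* error or truncated: stop here */
		pos += (size_t) n;
	}
	return buf;
}
```

Because the enum and the string table are both generated from the same list, adding a flag means adding one x(...) line, and the names used for printing cannot drift out of step with the bit numbers. A caller would pass a buffer and a bitmask, for example example_flags_to_text(buf, sizeof(buf), 1UL << EXAMPLE_FS_rw).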