summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/rebalance.c
Commit message (Collapse)AuthorAgeFilesLines
* bcachefs: Data move path now uses bch2_trans_unlock_long()Kent Overstreet2023-11-041-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure copygc does not spinKent Overstreet2023-11-041-6/+4
| | | | | | | If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rebalance_workKent Overstreet2023-11-011-226/+327
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: move.c exports, refactoringKent Overstreet2023-10-311-2/+1
| | | | | | Prep work for the new rebalance code - we need a few helpers exported. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Break up io.cKent Overstreet2023-10-221-2/+0
| | | | | | | | | More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert more code to bch_err_msg()Kent Overstreet2023-10-221-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix divide by zero in rebalance_work()Kent Overstreet2023-10-221-0/+4
| | | | | | This fixes https://github.com/koverstreet/bcachefs-tools/issues/159 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Compression levelsKent Overstreet2023-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | This allows including a compression level when specifying a compression type, e.g. compression=zstd:15 Values from 1 through 15 indicate compression levels, 0 or unspecified indicates the default. For LZ4, values 3-15 specify that the HC algorithm should be used. Note that for compatibility, extents themselves only include the compression type, not the compression level. This means that specifying the same compression algorithm but different compression levels for the compression and background_compression options will have no effect. XXX: perhaps we could add a warning for this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: don't spin in rebalance when background target is not usableBrian Foster2023-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | If a bcachefs filesystem is configured with a background device (disk group), rebalance will relocate data to this device in the background by checking extent keys for whether they currently reside in the specified target. For keys that do not, rebalance performs a read/write cycle to allow the write path to properly relocate data. If the background target is not usable (read-only, for example), however, the write path doesn't actually move data to another device. Instead, rebalance spins indefinitely reading and rewriting the same data over and over to the same device. If the background target is made available again, the rebalance picks this up, relocates the data, and eventually terminates. To avoid this spinning behavior, update the rebalance background target logic to not only check whether the extent is not in the target, but whether the target is actually usable as well. If not, then don't mark the key for rewrite. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fixes for building in userspaceKent Overstreet2023-10-221-1/+1
| | | | | | | | | | | | | | - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use bch2_err_str() in error messagesKent Overstreet2023-10-221-3/+6
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: move.c refactoringKent Overstreet2023-10-221-2/+3
| | | | | | | | | - add bch2_moving_ctxt_(init|exit) - split out __bch2_evacutae_bucket() which takes an existing moving_ctxt, this will be used for improving copygc performance by pipelining across multiple buckets Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: data jobs, including rebalance wait for copygc.Daniel Hill2023-10-221-1/+1
| | | | | | | | | | | | | | move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance, and run away bucket fragmentation. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Redo data_update interfaceKent Overstreet2023-10-221-38/+46
| | | | | | | | | | | | | | | | | | | | | | This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewrited, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug where with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer and each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Printbuf reworkKent Overstreet2023-10-221-23/+24
| | | | | | | This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Heap allocate printbufsKent Overstreet2023-10-221-15/+27
| | | | | | | | | | | This patch changes printbufs dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: add progress stats to sysfsBrett Holman2023-10-221-6/+5
| | | | | | | | This adds progress stats to sysfs for copygc, rebalance, recovery, and the cmd_job ioctls. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use bch2_bpos_to_text() more consistentlyKent Overstreet2023-10-221-4/+4
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Scan for old btree nodes if necessary on mountKent Overstreet2023-10-221-0/+3
| | | | | | | | | | | We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned out there were people who still had filesystems with btree nodes in that format in the wild. This adds a new compat feature that indicates we've scanned for and rewritten nodes in the old format, and does that scan at mount time if the option isn't set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add code to scan for/rewite old btree nodesKent Overstreet2023-10-221-1/+2
| | | | | | | | This adds a new data job type to scan for btree nodes in the old extent format, and rewrite them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add error message for some allocation failuresKent Overstreet2023-10-221-1/+3
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Persist 64 bit io clocksKent Overstreet2023-10-221-5/+5
| | | | | | | | | | | | | | | | Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Move journal reclaim to a kthreadKent Overstreet2023-10-221-1/+1
| | | | | | | This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't drop replicas when copygcing ec dataKent Overstreet2023-10-221-0/+1
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert various code to printbufKent Overstreet2023-10-221-11/+8
| | | | | | | printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Account for ioclock slop when throttling rebalance threadKent Overstreet2023-10-221-6/+10
| | | | | | | This should fix an issue where the rebalance thread was spinning Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Track incompressible dataKent Overstreet2023-10-221-1/+2
| | | | | | | | | This fixes the background_compression option: wihout some way of marking data as incompressible, rebalance will keep rewriting incompressible data over and over. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Refactor rebalance_pred functionKent Overstreet2023-10-221-49/+44
| | | | | | | | | | | | Before, the logic for if we should move an extent was duplicated somewhat, in both rebalance_add_key() and rebalance_pred(); this centralizes that in __rebalance_pred() This is prep work for a patch that enables marking data as incompressible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add a cond_resched() to rebalance loopKent Overstreet2023-10-221-0/+2
| | | | | Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Rebalance now adds replicas if neededKent Overstreet2023-10-221-26/+19
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Refactor various code to not be extent specificKent Overstreet2023-10-221-4/+2
| | | | | | | | With reflink, various code now has to handle both KEY_TYPE_extent or KEY_TYPE_reflink_v - so, convert it to be generic across all keys with pointers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Only get btree iters from btree transactionsKent Overstreet2023-10-221-2/+2
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Lots of option handling improvementsKent Overstreet2023-10-221-5/+5
| | | | | | | Add helptext to option definitions - so we can unify the option handling with the format command Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Make bkey types globally uniqueKent Overstreet2023-10-221-15/+21
| | | | | | | | | | this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: revamp to_text methodsKent Overstreet2023-10-221-20/+14
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: extent_for_each_ptr_decode()Kent Overstreet2023-10-221-16/+15
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Split out alloc_background.cKent Overstreet2023-10-221-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: fix a divideKent Overstreet2023-10-221-1/+1
| | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Initial commitKent Overstreet2023-10-221-0/+342
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>