path: root/fs/bcachefs
Commit message | Author | Age | Files | Lines
...
| * bcachefs: Fix srcu warning in check_topology | Kent Overstreet | 2024-09-27 | 1 | -0/+2
| | | | | | | | | | | | | | check_topology doesn't need the srcu lock and doesn't use normal btree transactions - we can just drop the srcu lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Fix error path in check_dirent_inode_dirent() | Kent Overstreet | 2024-09-27 | 1 | -3/+2
| | | | | | | | | | | | | | fsck_err() jumps to the fsck_err label when bailing out; need to make sure bp_iter was initialized... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping | Piotr Zalewski | 2024-09-27 | 1 | -0/+4
| | Zero-initialize the part of the allocated bounce buffer which wasn't touched by the subsequent bch2_key_sort_fix_overlapping, to mitigate a later uninit-value use KMSAN bug[1].
| |
| | After applying the patch the reproducer still triggers a stack overflow[2], but it seems unrelated to the uninit-value use warning. After further investigation it was found that the stack overflow occurs because KMSAN adds too many function calls[3]. A backtrace of where the stack magic number gets smashed was added as a reply to the syzkaller thread[3].
| |
| | It was confirmed that the task's stack magic number gets smashed after the code path where KMSAN detects the uninit-value use is executed, so it can be assumed that it doesn't contribute in any way to uninit-value use detection.
| |
| | [1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
| | [2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com
| | [3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me
| |
| | Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com
| | Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
| | Fixes: ec4edd7b9d20 ("bcachefs: Prep work for variable size btree node buffers")
| | Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
| | Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Improve bch2_is_inode_open() warning message | Kent Overstreet | 2024-09-27 | 1 | -3/+3
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Add extra padding in bkey_make_mut_noupdate() | Kent Overstreet | 2024-09-27 | 1 | -1/+2
| | | | | | | | | | | | | | | | This fixes a kasan splat in propagate_key_to_snapshot_leaves() - varint_decode_fast() does reads (that it never uses) up to 7 bytes past the end of the integer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
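The over-read described above can be sketched in userspace C. This is a minimal illustration under stated assumptions: the encoding (one length byte followed by 1 to 7 little-endian payload bytes), `DECODE_PAD`, `decode_fast()` and `encode_padded()` are all hypothetical and are not the bcachefs varint format or API. It only shows why a decoder that always loads a full 8 bytes needs tail padding in the buffer it reads from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical encoding, for illustration only (not the bcachefs format):
 * byte 0 holds the payload length n (1..7), followed by n little-endian bytes. */
#define DECODE_PAD 8	/* covers the decoder's worst-case over-read */

/* Fast path: always reads 8 bytes after the length byte, then masks off the
 * bytes that are not part of the value. Up to 7 of those bytes lie past the
 * end of the encoded integer; they are read but never used. */
static uint64_t decode_fast(const uint8_t *in)
{
	unsigned n = in[0];
	uint64_t v = 0;

	assert(n >= 1 && n <= 7);
	for (unsigned i = 0; i < 8; i++)
		v |= (uint64_t)in[1 + i] << (8 * i);
	return v & ((UINT64_C(1) << (8 * n)) - 1);
}

/* Encode into a buffer with zeroed tail padding, so the over-read above
 * stays inside the allocation. Returns NULL on allocation failure. */
static uint8_t *encode_padded(uint64_t value, unsigned n)
{
	uint8_t *buf = calloc(1 + n + DECODE_PAD, 1);

	if (!buf)
		return NULL;
	buf[0] = (uint8_t)n;
	for (unsigned i = 0; i < n; i++)
		buf[1 + i] = (uint8_t)(value >> (8 * i));
	return buf;
}
```

Without the `DECODE_PAD` bytes, `decode_fast()` on a 2-byte value would read 6 bytes beyond the allocation, which is the kind of access a sanitizer flags even though the result is masked away.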
| * bcachefs: Mark inode errors as autofix | Kent Overstreet | 2024-09-27 | 1 | -16/+16
| | | | | | | | | | | | | | Most or all errors will be autofix in the future, we're currently just doing the ones that we know are well tested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves() | Kent Overstreet | 2024-09-23 | 1 | -0/+1
| | | | | | | | | | | | | | | | As we iterate we need to mark that we no longer need iterators - otherwise we'll infinite loop via the "too many iters" check when there's many snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Ensure BCH_FS_accounting_replay_done is always set | Kent Overstreet | 2024-09-23 | 1 | -0/+3
| | | | | | | | | | | | | | | | if it doesn't get set we'll never be able to flush the btree write buffer; this only happens in fake rw mode, but prevents us from shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol() | Ahmed Ehab | 2024-09-21 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzbot reports a problem that a warning is triggered due to suspicious use of rcu_dereference_check(). That is triggered by a call of bch2_snapshot_tree_oldest_subvol(). The cause of the warning is that inside bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls rcu_dereference() that requires a read lock to be held. Also, the call of bch2_snapshot_tree_next() eventually calls snapshot_t(). To fix this, call rcu_read_lock() before calling snapshot_t(). Then, release the lock after the termination of the while loop. Reported-by: <syzbot+f7c41a878676b72c16a6@syzkaller.appspotmail.com> Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* | [tree-wide] finally take no_llseek out | Al Viro | 2024-09-27 | 2 | -3/+0
| | no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek")
| |
| | To quote that commit,
| |
| |   At -rc1 we'll need do a mechanical removal of no_llseek -
| |
| |   git grep -l -w no_llseek | grep -v porting.rst | while read i; do
| |           sed -i '/\<no_llseek\>/d' $i
| |   done
| |
| |   would do it.
| |
| | Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form
| |         .llseek = no_llseek,
| | so it's obviously safe.
| |
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs | Linus Torvalds | 2024-09-23 | 84 | -1604/+3037
|\|
| | Pull bcachefs updates from Kent Overstreet:
| |
| |  - rcu_pending, btree key cache rework: this solves lock contention in the key cache, eliminating the biggest source of the srcu lock hold time warnings, and drastically improving performance on some metadata heavy workloads - on multithreaded creates we're now 3-4x faster than xfs.
| |
| |  - We're now using an rhashtable instead of the system inode hash table; this is another significant performance improvement on multithreaded metadata workloads, eliminating more lock contention.
| |
| |  - for_each_btree_key_in_subvolume_upto(): new helper for iterating over keys within a specific subvolume, eliminating a lot of open coded "subvolume_get_snapshot()" and also fixing another source of srcu lock time warnings, by running each loop iteration in its own transaction (as the existing for_each_btree_key() does).
| |
| |  - More work on btree_trans locking asserts; we now assert that we don't hold btree node locks when trans->locked is false, which is important because we don't use lockdep for tracking individual btree node locks.
| |
| |  - Some cleanups and improvements in the bset.c btree node lookup code, from Alan.
| |
| |  - Rework of btree node pinning, which we use in backpointers fsck. The old hacky implementation, where the shrinker just skipped over nodes in the pinned range, was causing OOMs; instead we now use another shrinker with a much higher seeks number for pinned nodes.
| |
| |  - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue where rebalance would sometimes fall back to allocating from the full filesystem, which is not what we want when it's trying to move data to a specific target.
| |
| |  - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache allocations.
| |
| |  - Idmap mounts are now supported (Hongbo Li)
| |
| |  - Rename whiteouts are now supported (Hongbo Li)
| |
| |  - Erasure coding can now handle devices being marked as failed, or forcibly removed. We still need the evacuate path for erasure coding, but it's getting very close to ready for people to start using.
| |
| | * tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
| |   bcachefs: return err ptr instead of null in read sb clean
| |   bcachefs: Remove duplicated include in backpointers.c
| |   bcachefs: Don't drop devices with stripe pointers
| |   bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
| |   bcachefs: bch_fs.rw_devs_change_count
| |   bcachefs: bch2_dev_remove_stripes()
| |   bcachefs: bch2_trigger_ptr() calculates sectors even when no device
| |   bcachefs: improve error messages in bch2_ec_read_extent()
| |   bcachefs: improve error message on too few devices for ec
| |   bcachefs: improve bch2_new_stripe_to_text()
| |   bcachefs: ec_stripe_head.nr_created
| |   bcachefs: bch_stripe.disk_label
| |   bcachefs: stripe_to_mem()
| |   bcachefs: EIO errcode cleanup
| |   bcachefs: Rework btree node pinning
| |   bcachefs: split up btree cache counters for live, freeable
| |   bcachefs: btree cache counters should be size_t
| |   bcachefs: Don't count "skipped access bit" as touched in btree cache scan
| |   bcachefs: Failed devices no longer require mounting in degraded mode
| |   bcachefs: bch2_dev_rcu_noerror()
| |   ...
| * bcachefs: return err ptr instead of null in read sb clean | Diogo Jahchan Koike | 2024-09-21 | 1 | -1/+1
| | syzbot reported a null-ptr-deref in bch2_fs_start. [0]
| |
| | When a sb is marked clean but doesn't have a clean section, bch2_read_superblock_clean returns NULL, which PTR_ERR_OR_ZERO lets through, eventually leading to a null ptr dereference down the line. Adjust read sb clean to return an ERR_PTR indicating the invalid clean section.
| |
| | [0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
| |
| | Reported-by: syzbot+1cecc37d87c4286e5543@syzkaller.appspotmail.com
| | Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
| | Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
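Why a NULL return slips past PTR_ERR_OR_ZERO can be shown with a small userspace sketch. The helpers below are simplified stand-ins for the kernel's error-pointer helpers; `struct clean_section`, `read_clean_old()` and `read_clean_new()` are hypothetical names, and `-EINVAL` is only a placeholder for whatever error code the actual fix returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical stand-in for a superblock "clean" section. */
struct clean_section { int journal_seq; };
static struct clean_section the_section = { 42 };

/* Old behaviour: a missing section yields NULL, which is not an error pointer. */
static struct clean_section *read_clean_old(int present)
{
	return present ? &the_section : NULL;
}

/* New behaviour: a missing section yields an error pointer
 * (-EINVAL is a placeholder error code). */
static struct clean_section *read_clean_new(int present)
{
	return present ? &the_section : ERR_PTR(-EINVAL);
}
```

With these definitions, `PTR_ERR_OR_ZERO(read_clean_old(0))` evaluates to 0, so a caller that only checks that value would go on to dereference NULL, while `PTR_ERR_OR_ZERO(read_clean_new(0))` evaluates to `-EINVAL` and the caller can bail out.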
| * bcachefs: Remove duplicated include in backpointers.c | Yang Li | 2024-09-21 | 1 | -1/+0
| | The header file bbpos.h is included twice in backpointers.c, so one of the inclusions can be removed.
| |
| | Reported-by: Abaci Robot <abaci@linux.alibaba.com>
| | Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783
| | Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Don't drop devices with stripe pointers | Kent Overstreet | 2024-09-21 | 4 | -9/+32
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices | Kent Overstreet | 2024-09-21 | 2 | -27/+60
| | This factors out ec_stripe_head_devs_update(), which initializes the bitmap of devices we're allocating from, and runs it every time c->rw_devs_change_count changes.
| |
| | We also cancel pending, not allocated stripes, since they may refer to devices that are no longer available.
| |
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch_fs.rw_devs_change_count | Kent Overstreet | 2024-09-21 | 2 | -4/+9
| | | | | | | | | | | | | | | | Add a counter that's incremented whenever rw devices change; this will be used for erasure coding so that it can keep ec_stripe_head in sync and not deadlock on a new stripe when a device it wants goes away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
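The change-counter pattern this message describes can be sketched as follows. This is a single-threaded illustration only: the struct layouts and the helpers `fs_set_rw_devs()`, `stripe_head_init()` and `stripe_head_refresh()` are hypothetical, not the bcachefs definitions, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are illustrative, not the bcachefs definitions. */
struct fs {
	unsigned rw_devs_change_count;	/* bumped on every change to the rw set */
	unsigned long rw_devs;		/* bitmap of currently writable devices */
};

struct stripe_head {
	unsigned seen_change_count;	/* counter value when devs was last rebuilt */
	unsigned long devs;		/* cached bitmap we allocate from */
};

/* Every change to the rw device set also bumps the counter. */
static void fs_set_rw_devs(struct fs *c, unsigned long devs)
{
	c->rw_devs = devs;
	c->rw_devs_change_count++;
}

static void stripe_head_init(struct stripe_head *h, const struct fs *c)
{
	h->devs = c->rw_devs;
	h->seen_change_count = c->rw_devs_change_count;
}

/* Returns true if the cached bitmap was stale and had to be rebuilt. */
static bool stripe_head_refresh(struct stripe_head *h, const struct fs *c)
{
	if (h->seen_change_count == c->rw_devs_change_count)
		return false;
	h->devs = c->rw_devs;
	h->seen_change_count = c->rw_devs_change_count;
	return true;
}
```

Comparing one integer is cheaper than comparing device bitmaps on every allocation, and a stale cache is detected even if the set changed and then changed back.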
| * bcachefs: bch2_dev_remove_stripes() | Kent Overstreet | 2024-09-21 | 4 | -3/+74
| | We can now correctly force-remove a device that has stripes on it; this uses the new BCH_SB_MEMBER_INVALID sentinel value.
| |
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_trigger_ptr() calculates sectors even when no device | Kent Overstreet | 2024-09-21 | 2 | -10/+21
| | | | | | | | | | | | | | This is necessary for erasure coded pointers to devices that have been removed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: improve error messages in bch2_ec_read_extent() | Kent Overstreet | 2024-09-21 | 3 | -19/+23
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: improve error message on too few devices for ec | Kent Overstreet | 2024-09-21 | 1 | -3/+16
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: improve bch2_new_stripe_to_text() | Kent Overstreet | 2024-09-21 | 1 | -0/+2
| | | | | | | | | | | | also print out the new stripe key Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: ec_stripe_head.nr_created | Kent Overstreet | 2024-09-21 | 2 | -2/+6
| | | | | | | | | | | | additional debug stat Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch_stripe.disk_label | Kent Overstreet | 2024-09-21 | 4 | -16/+43
| | | | | | | | | | | | | | | | | | | | | | When reshaping existing stripes, we should keep them on the same target that they were allocated on; to do this, we need to add a field to the btree stripe type. This is a tad awkward, because we only have 8 bits left, and targets are 16 bits - but we only need to store a label, not a full target. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: stripe_to_mem() | Kent Overstreet | 2024-09-21 | 1 | -18/+15
| | | | | | | | | | | | factor out a common helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: EIO errcode cleanup | Kent Overstreet | 2024-09-21 | 5 | -27/+33
| | | | | | | | | | | | | | We want to be using private errcodes whenever possible, for better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Rework btree node pinning | Kent Overstreet | 2024-09-21 | 7 | -75/+150
| | In backpointers fsck, we do a sequential scan of one btree, and check references to another:
| |
| |   extents <-> backpointers
| |
| | Checking references generates random lookups, so we want to pin that btree in memory (or only a range, if it doesn't fit in ram).
| |
| | Previously, this was done with a simple check in the shrinker - "if btree node is in range being pinned, don't free it" - but this generated OOMs, as our shrinker wasn't well behaved if there was less memory available than expected.
| |
| | Instead, we now have two different shrinkers and lru lists; the second shrinker being for pinned nodes, with seeks set much higher than normal - so they can still be freed if necessary, but we'll prefer not to.
| |
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: split up btree cache counters for live, freeable | Kent Overstreet | 2024-09-21 | 6 | -32/+47
| | | | | | | | | | | | | | this is prep for introducing a second live list and shrinker for pinned nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: btree cache counters should be size_t | Kent Overstreet | 2024-09-21 | 6 | -36/+37
| | | | | | | | | | | | | | 32 bits won't overflow any time soon, but size_t is the correct type for counting objects in memory. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Don't count "skipped access bit" as touched in btree cache scan | Kent Overstreet | 2024-09-21 | 1 | -0/+1
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Failed devices no longer require mounting in degraded mode | Kent Overstreet | 2024-09-21 | 1 | -1/+1
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_dev_rcu_noerror() | Kent Overstreet | 2024-09-21 | 6 | -13/+22
| | | | | | | | | | | | bch2_dev_rcu() now properly errors if the device is invalid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Progress indicator for extents_to_backpointers | Kent Overstreet | 2024-09-21 | 1 | -6/+82
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_opts_to_text() | Kent Overstreet | 2024-09-21 | 3 | -21/+35
| | | | | | | | | | | | | | Factor out bch2_show_options() into a generic helper, for debugging option passing issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: improve "no device to read from" message | Kent Overstreet | 2024-09-21 | 1 | -1/+7
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Fix compilation error for bch2_sb_member_alloc | Hongbo Li | 2024-09-21 | 1 | -6/+10
| | Fix the following compilation error:
| | ```
| | fs/bcachefs/sb-members.c: In function ‘bch2_sb_member_alloc’:
| | fs/bcachefs/sb-members.c:508:2: error: a label can only be part of a statement and a declaration is not a statement
| |   508 |  unsigned nr_devices = max_t(unsigned, dev_idx + 1, c->sb.nr_devices);
| | ```
| |
| | Fixes: a7d364a133c7 ("bcachefs: bch2_sb_member_alloc()")
| | Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
| | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
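The error quoted above comes from a C rule that held before C23: a label must be followed by a statement, and a declaration is not a statement. One common way to satisfy the rule is to hoist the declaration above the label and leave only an assignment after it, as in the sketch below. The function `next_nr_devices()`, its parameters and the `max_u()` helper are hypothetical and only mimic the shape of the failing line; this is not necessarily how the actual patch resolved it.

```c
#include <assert.h>

static unsigned max_u(unsigned a, unsigned b)
{
	return a > b ? a : b;
}

/* Hypothetical function; names are illustrative only. */
static unsigned next_nr_devices(int slot_found, unsigned dev_idx, unsigned nr_devices)
{
	unsigned new_nr;	/* declared up front, not directly after the label */

	if (slot_found)
		goto have_slot;

	/* No free slot: append after the last existing device. */
	dev_idx = nr_devices;
have_slot:
	/* An assignment is a statement, so it may legally follow the label.
	 * Writing "unsigned new_nr = ..." here is what compilers reject
	 * in pre-C23 modes. */
	new_nr = max_u(dev_idx + 1, nr_devices);
	return new_nr;
}
```

Other equivalent fixes are an empty statement after the label (`have_slot: ;`) or wrapping the code after the label in its own `{ ... }` block.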
| * bcachefs: bch2_sb_member_alloc() | Kent Overstreet | 2024-09-21 | 3 | -46/+53
| | | | | | | | | | | | refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch2_dev_remove_alloc() -> alloc_background.c | Kent Overstreet | 2024-09-21 | 3 | -27/+30
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Move tabstop setup to bch2_dev_usage_to_text() | Kent Overstreet | 2024-09-21 | 2 | -7/+9
| | | | | | | | | | | | No reason for it not to be where it's needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Options for recovery_passes, recovery_passes_exclude | Kent Overstreet | 2024-09-21 | 8 | -20/+33
| | | | | | | | | | | | | | | | This adds mount options for specifying recovery passes to run, or exclude; the immediate need for this is that backpointers fsck is having trouble completing, so we need a way to skip it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Use mm_account_reclaimed_pages() when freeing btree nodes | Kent Overstreet | 2024-09-21 | 1 | -0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When freeing in a shrinker callback, we need to notify memory reclaim, so it knows forward progress has been made. Normally this is done in e.g. slab code, but we're not freeing through slab - or rather we are, but these allocations are big, and use the kmalloc_large() path. This is really a bug in the slub code, but we're working around it here for now. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Use __GFP_ACCOUNT for reclaimable memory | Kent Overstreet | 2024-09-21 | 2 | -0/+4
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Hook up RENAME_WHITEOUT in rename. | Sasha Finkelstein | 2024-09-21 | 4 | -14/+52
| | | | | | | | | | | | | | This is needed for overlayfs, which is used by container managers. Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: rebalance writes use BCH_WRITE_ONLY_SPECIFIED_DEVS | Kent Overstreet | 2024-09-21 | 2 | -2/+3
| | | | | | | | | | | | | | this was an oversight: rebalance is moving data to a specific device, so we don't want it falling back to the full filesystem Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: BCH_WRITE_ALLOC_NOWAIT no longer applies to open bucket allocation | Kent Overstreet | 2024-09-21 | 3 | -12/+16
| | | | | | | | | | | | | | | | rebalance writes must be BCH_WRITE_ALLOC_NOWAIT because they don't allocate from the full filesystem - but we don't want spurious allocation failures due to open buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: fix prototype to bch2_alloc_sectors_start_trans() | Kent Overstreet | 2024-09-21 | 4 | -17/+18
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: kill redundant is_vmalloc_addr() | Kent Overstreet | 2024-09-21 | 1 | -8/+4
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: convert __bch2_encrypt_bio() to darray | Kent Overstreet | 2024-09-21 | 1 | -19/+22
| | | | | | | | | | | | | | | | like the previous patch, kill use of bare arrays; the encryption code likes to work in big batches, so this is a small performance improvement. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: do_encrypt() now handles allocation failures | Kent Overstreet | 2024-09-21 | 1 | -18/+30
| | | | | | | | | | | | convert to darray, and add a fallback when allocation fails Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Add pinned to btree cache not freed counters | Kent Overstreet | 2024-09-21 | 2 | -21/+36
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Annotate bch_replicas_entry_{v0,v1} with __counted_by() | Thorsten Blum | 2024-09-09 | 4 | -7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the __counted_by compiler attribute to the flexible array members devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Increment nr_devs before adding a new device to the devs array and adjust the array indexes accordingly. Add a helper macro for adding a new device. In bch2_journal_read(), explicitly set nr_devs to 0. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
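The ordering constraint mentioned above (bump the count first, then write the new slot) can be sketched in plain C. Everything here is illustrative: `COUNTED_BY` is a local wrapper that expands to the `counted_by` attribute only where the compiler supports it, and `struct replicas_entry`, `entry_alloc()` and `ENTRY_ADD_DEV()` are hypothetical stand-ins for the bcachefs types and helper macro.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the attribute where the compiler knows it; otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

/* Hypothetical stand-in for a replicas entry with a flexible array member. */
struct replicas_entry {
	uint8_t nr_devs;
	uint8_t devs[] COUNTED_BY(nr_devs);
};

/* calloc zeroes nr_devs, so a new entry starts out empty. */
static struct replicas_entry *entry_alloc(size_t max_devs)
{
	return calloc(1, sizeof(struct replicas_entry) + max_devs);
}

/* Bump the count first, then store: with bounds checking enabled, devs[i]
 * is only a valid access while i < nr_devs. The caller must ensure the
 * allocation has room for one more device. */
#define ENTRY_ADD_DEV(e, d) do {			\
	(e)->nr_devs++;					\
	(e)->devs[(e)->nr_devs - 1] = (d);		\
} while (0)
```

Writing `devs[nr_devs++] = d` instead would store to index `nr_devs` while the count still excludes it, which is exactly the access that a bounds sanitizer honouring the attribute reports.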