linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	bcachefs: Fix promotes	Kent Overstreet	2023-12-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	The recent work to fix data moves w.r.t. durability broke promotes, because the caused us to bail out when the extent minus pointers being dropped still has enough pointers to satisfy the current number of replicas. Disable this check when we're adding cached replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix leakage of internal error code	Kent Overstreet	2023-12-21	1	-2/+4
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix insufficient disk reservation with compression + snapshots	Kent Overstreet	2023-12-21	1	-7/+8
\| \| \| \| \| \| \|	When overwriting and splitting existing extents, we weren't correctly accounting for a 3 way split of a compressed extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: fix BCH_FSCK_ERR enum	Kent Overstreet	2023-12-19	2	-2/+2
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix bch2_alloc_sectors_start_trans() error handling	Kent Overstreet	2023-12-19	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we fail to allocate because of insufficient open buckets, we don't want to retry from the full set of devices - we just want to retry in blocking mode. But if the retry in blocking mode fails with a different error code, we end up squashing the -BCH_ERR_open_buckets_empty error with an error that makes us thing we won't be able to allocate (insufficient_devices) - which is incorrect when we didn't try to allocate from the full set of devices, and causes the write to fail. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs; guard against overflow in btree node split	Kent Overstreet	2023-12-19	1	-0/+12
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: btree_node_u64s_with_format() takes nr keys	Kent Overstreet	2023-12-19	2	-17/+14
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: print explicit recovery pass message only once	Kent Overstreet	2023-12-17	1	-0/+3
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: improve modprobe support by providing softdeps	Daniel Hill	2023-12-14	1	-0/+6
\| \| \| \| \| \| \| \| \|	We need to help modprobe load architecture specific modules so we don't fall back to generic software implementations, this should help performance when building as a module. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: fix invalid memory access in bch2_fs_alloc() error path	Thomas Bertschinger	2023-12-14	3	-2/+8
\| \| \| \| \| \| \| \| \| \|	When bch2_fs_alloc() gets an error before calling bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid memory access because btree_trans_list is uninitialized. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Fixes: 6bd68ec266ad ("bcachefs: Heap allocate btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix determining required file handle length	Jan Kara	2023-12-13	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \|	The ->encode_fh method is responsible for setting amount of space required for storing the file handle if not enough space was provided. bch2_encode_fh() was not setting required length in that case which breaks e.g. fanotify. Fix it. Reported-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix nocow locks deadlock	Kent Overstreet	2023-12-11	1	-1/+2
\| \| \| \| \| \| \|	On trylock failure we were waiting for outstanding reads to complete - but nocow locks need to be held until the whole move is finished. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Close journal entry if necessary when flushing all pins	Kent Overstreet	2023-12-10	4	-4/+9
\| \| \| \| \| \| \| \|	Since outstanding journal buffers hold a journal pin, when flushing all pins we need to close the current journal entry if necessary so its pin can be released. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix uninitialized var in bch2_journal_replay()	Kent Overstreet	2023-12-10	1	-1/+1
\| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix deleted inode check for dirs	Kent Overstreet	2023-12-08	3	-13/+22
\| \| \| \| \| \| \| \| \| \| \|	We could delete directories transactionally on rmdir()/unlink(), but we don't; instead, like with regular files we wait for the VFS to call evict(). That means that our check for directories in the deleted inodes btree is wrong - the check should be for non-empty directories. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: rebalance shouldn't attempt to compress unwritten extents	Daniel Hill	2023-12-06	1	-1/+2
\| \| \| \| \| \| \| \|	This fixes a bug where rebalance would loop repeatedly on the same extents. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: don't attempt rw on unfreeze when shutdown	Brian Foster	2023-12-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The internal freeze mechanism in bcachefs mostly reuses the generic rw<->ro transition code. If the fs happens to shutdown during or after freeze, a transition back to rw can fail. This is expected, but returning an error from the unfreeze callout prevents the filesystem from being unfrozen. Skip the read write transition if the fs is shutdown. This allows the fs to unfreeze at the vfs level so writes will no longer block, but will still fail due to the emergency read-only state of the fs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix creating snapshot with implict source	Kent Overstreet	2023-12-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	When creating a snapshot without specifying the source subvolume, we use the subvolume containing the new snapshot. Previously, this worked if the directory containing the new snapshot was the subvolume root - but we were using the incorrect helper, and got a subvolume ID of 0 when the parent directory wasn't the root of the subvolume, causing an emergency read-only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Don't run indirect extent trigger unless inserting/deleting	Kent Overstreet	2023-12-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a transaction path overflow reported in the snapshot deletion path, when moving extents to the correct snapshot. The root of the issue is that creating/deleting a reflink pointer can generate an unbounded number of updates, if it is allowed to reference an unbounded number of indirect extents; to prevent this, merging of reflink pointers has been disabled. But there's a hole, which is that copygc/rebalance may fragment existing extents in the course of moving them around, and if an indirect extent becomes too fragmented we'll then become unable to delete the reflink pointer. The eventual solution is going to be to tweak trigger handling so that we can process large reflink pointers incrementally when necessary, and notice that trigger updates don't need to be run for the part of the reflink pointer not changing. That is going to be a bigger project though, for another patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Convert compression_stats to for_each_btree_key2	Kent Overstreet	2023-12-04	1	-4/+4
\| \| \| \| \| \| \|	for_each_btree_key2() runs each loop iteration in a btree transaction, and thus does not cause SRCU lock hold time problems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix bch2_extent_drop_ptrs() call	Kent Overstreet	2023-12-04	1	-2/+2
\| \| \| \| \| \| \|	Also, make bch2_extent_drop_ptrs() safer, so it works with extents and non-extents iterators. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Fix a journal deadlock in replay	Kent Overstreet	2023-12-04	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently, journal pre-reservations were removed. They were for reserving space ahead of time in the journal for operations that are required for journal reclaim, e.g. btree key cache flushing and interior node btree updates. Instead we have watermarks - only operations for journal reclaim are allowed when the journal is low on space, and in general we're quite good about doing operations in the order that will free up space in the journal quickest when we're low on space. If we're doing a journal reclaim operation out of order, we usually do it in nonblocking mode if it's not freeing up space at the end of the journal. There's an exceptino though - interior btree node update operations have to be BCH_WATERMARK_reclaim - once they've been started, and they can't be nonblocking. Generally this is fine because they'll only be a very small fraction of transaction commits - but there's an exception, which is during journal replay. Journal replay does many btree operations, but doesn't need to commit them to the journal since they're already in the journal. So killing off of pre-reservation, plus another change to make journal replay more efficient by initially doing the replay in sorted btree order, made it possible for the interior update operations replay generates to fill and deadlock the journal. Fix this by introducing a new check on journal space at the _start_ of an interior update operation. This causes us to block if necessary in exactly the same way as we used to when interior updates took a journal pre-reservaiton, but without all the expensive accounting pre-reservations required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs; Don't use btree write buffer until journal replay is finished	Kent Overstreet	2023-12-04	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The keys being replayed by journal replay have to be synchronized with updates by other threads that overwrite them. We rely on btree node locks for synchronizing - but since btree write buffer updates take no btree locks, that won't work. Instead, simply disable using the btree write buffer until journal replay is finished. This fixes a rare backpointers error in the merge_torture_flakey test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	bcachefs: Don't drop journal pins in exit path	Kent Overstreet	2023-12-03	4	-12/+5
\| \| \| \| \| \| \| \| \| \| \|	There's no need to drop journal pins in our exit paths - the code was trying to have everything cleaned up on any shutdown, but better to just tweak the assertions a bit. This fixes a bug where calling into journal reclaim in the exit path would cass a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
*	Merge tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds	2023-12-03	7	-34/+54
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull smb client fixes from Steve French: - Two fallocate fixes - Fix warnings from new gcc - Two symlink fixes * tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client, common: fix fortify warnings cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved smb: client: report correct st_size for SMB and NFS symlinks smb: client: fix missing mode bits for SMB symlinks
\| *	smb: client, common: fix fortify warnings	Dmitry Antipov	2023-11-30	5	-31/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When compiling with gcc version 14.0.0 20231126 (experimental) and CONFIG_FORTIFY_SOURCE=y, I've noticed the following: In file included from ./include/linux/string.h:295, from ./include/linux/bitmap.h:12, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:60, from ./arch/x86/include/asm/preempt.h:9, from ./include/linux/preempt.h:79, from ./include/linux/spinlock.h:56, from ./include/linux/wait.h:9, from ./include/linux/wait_bit.h:8, from ./include/linux/fs.h:6, from fs/smb/client/smb2pdu.c:18: In function 'fortify_memcpy_chk', inlined from '__SMB2_close' at fs/smb/client/smb2pdu.c:3480:4: ./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 588 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and: In file included from ./include/linux/string.h:295, from ./include/linux/bitmap.h:12, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:60, from ./arch/x86/include/asm/preempt.h:9, from ./include/linux/preempt.h:79, from ./include/linux/spinlock.h:56, from ./include/linux/wait.h:9, from ./include/linux/wait_bit.h:8, from ./include/linux/fs.h:6, from fs/smb/client/cifssmb.c:17: In function 'fortify_memcpy_chk', inlined from 'CIFS_open' at fs/smb/client/cifssmb.c:1248:3: ./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 588 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In both cases, the fortification logic inteprets calls to 'memcpy()' as an attempts to copy an amount of data which exceeds the size of the specified field (i.e. more than 8 bytes from __le64 value) and thus issues an overread warning. Both of these warnings may be silenced by using the convenient 'struct_group()' quirk. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
\| *	cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved	David Howells	2023-11-29	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the cifs filesystem implementations of FALLOC_FL_INSERT_RANGE, in smb3_insert_range(), to set i_size after extending the file on the server and before we do the copy to open the gap (as we don't clean up the EOF marker if the copy fails). Fixes: 7fe6fe95b936 ("cifs: add FALLOC_FL_INSERT_RANGE support") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
\| *	cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved	David Howells	2023-11-29	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in smb3_zero_range(), to set i_size after extending the file on the server. Fixes: 72c419d9b073 ("cifs: fix smb3_zero_range so it can expand the file-size when required") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
\| *	smb: client: report correct st_size for SMB and NFS symlinks	Paulo Alcantara	2023-11-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can't rely on FILE_STANDARD_INFORMATION::EndOfFile for reparse points as they will be always zero. Set it to symlink target's length as specified by POSIX. This will make stat() family of syscalls return the correct st_size for such files. Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
\| *	smb: client: fix missing mode bits for SMB symlinks	Paulo Alcantara	2023-11-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When instantiating inodes for SMB symlinks, add the mode bits from @cifs_sb->ctx->file_mode as we already do for the other special files. Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* \|	Merge tag 'fs_for_v6.7-rc4' of ↵	Linus Torvalds	2023-12-02	1	-1/+0
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 fix from Jan Kara: "Fix an ext2 bug introduced by changes in ext2 & iomap stepping on each other toes (apparently ext2 driver does not get much testing in linux-next)" * tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix ki_pos update for DIO buffered-io fallback case
\| * \|	ext2: Fix ki_pos update for DIO buffered-io fallback case	Ritesh Harjani (IBM)	2023-11-22	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit "filemap: update ki_pos in generic_perform_write", made updating of ki_pos into common code in generic_perform_write() function. This also causes generic/091 to fail. This happened due to an in-flight collision with: fb5de4358e1a ("ext2: Move direct-io to use iomap"). I have chosen fixes tag based on which commit got landed later to upstream kernel. Fixes: 182c25e9c157 ("filemap: update ki_pos in generic_perform_write") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <d595bee9f2475ed0e8a2e7fb94f7afc2c6ffc36a.1700643443.git.ritesh.list@gmail.com>
* \| \|	Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs	Linus Torvalds	2023-12-02	36	-228/+394
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull more bcachefs bugfixes from Kent Overstreet: - bcache & bcachefs were broken with CFI enabled; patch for closures to fix type punning - mark erasure coding as extra-experimental; there are incompatible disk space accounting changes coming for erasure coding, and I'm still seeing checksum errors in some tests - several fixes for durability-related issues (durability is a device specific setting where we can tell bcachefs that data on a given device should be counted as replicated x times) - a fix for a rare livelock when a btree node merge then updates a parent node that is almost full - fix a race in the device removal path, where dropping a pointer in a btree node to a device would be clobbered by an in flight btree write updating the btree node key on completion - fix one SRCU lock hold time warning in the btree gc code - ther's still a bunch more of these to fix - fix a rare race where we'd start copygc before initializing the "are we rw" percpu refcount; copygc would think we were already ro and die immediately * tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits) bcachefs: Extra kthread_should_stop() calls for copygc bcachefs: Convert gc_alloc_start() to for_each_btree_key2() bcachefs: Fix race between btree writes and metadata drop bcachefs: move journal seq assertion bcachefs: -EROFS doesn't count as move_extent_start_fail bcachefs: trace_move_extent_start_fail() now includes errcode bcachefs: Fix split_race livelock bcachefs: Fix bucket data type for stripe buckets bcachefs: Add missing validation for jset_entry_data_usage bcachefs: Fix zstd compress workspace size bcachefs: bpos is misaligned on big endian bcachefs: Fix ec + durability calculation bcachefs: Data update path won't accidentaly grow replicas bcachefs: deallocate_extra_replicas() bcachefs: Proper refcounting for journal_keys bcachefs: preserve device path as device name bcachefs: Fix an endianness conversion bcachefs: Start gc, copygc, rebalance threads after initing writes ref bcachefs: Don't stop copygc thread on device resize bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops ...
\| * \| \|	bcachefs: Extra kthread_should_stop() calls for copygc	Kent Overstreet	2023-11-28	2	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a bug where going read-only was taking longer than it should have due to copygc forgetting to check kthread_should_stop() Additionally: fix a missing is_kthread check in bch2_move_ratelimit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Convert gc_alloc_start() to for_each_btree_key2()	Kent Overstreet	2023-11-28	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This eliminates some SRCU warnings: for_each_btree_key2() runs every loop iteration in a distinct transaction context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Fix race between btree writes and metadata drop	Kent Overstreet	2023-11-28	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btree writes update the btree node key after every write, in order to update sectors_written, and they also might need to drop pointers if one of the writes failed in a replicated btree node. But the btree node might also have had a pointer dropped while the write was in flight, by bch2_dev_metadata_drop(), and thus there was a bug where the btree node write would ovewrite the btree node's key with what it had at the start of the write. Fix this by dropping pointers not currently in the btree node key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: move journal seq assertion	Kent Overstreet	2023-11-28	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	journal_cur_seq() can legitimately be used outside of the journal lock, where this assert can race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: -EROFS doesn't count as move_extent_start_fail	Kent Overstreet	2023-11-28	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The automated tests check if we've hit too many slowpath/error path events and fail the test - if we're just shutting down, that naturally shouldn't count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: trace_move_extent_start_fail() now includes errcode	Kent Overstreet	2023-11-28	3	-17/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Renamed from trace_move_extent_alloc_mem_fail, because there are other reasons we colud fail (disk space allocation failure). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Fix split_race livelock	Kent Overstreet	2023-11-28	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bch2_btree_update_start() calculates which nodes are going to have to be split/rewritten, so that we know how many nodes to reserve and how deep in the tree we have to take locks. But btree node merges require inserting two keys into the parent node, not just splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Fix bucket data type for stripe buckets	Kent Overstreet	2023-11-28	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Add missing validation for jset_entry_data_usage	Kent Overstreet	2023-11-28	4	-31/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Validation was completely missing for replicas entries in the journal (not the superblock replicas section) - we can't have replicas entries pointing to invalid devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Fix zstd compress workspace size	Kent Overstreet	2023-11-28	2	-7/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	zstd apparently lies about the size of the compression workspace it requires; if we double it compression succeeds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: bpos is misaligned on big endian	Kent Overstreet	2023-11-25	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bkey embeds a bpos that is misaligned on big endian; this is so that bch2_bkey_swab() works correctly without having to differentiate between packed and non-packed keys (a debatable design decision). This means it can't have the __aligned() tag on big endian. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Fix ec + durability calculation	Kent Overstreet	2023-11-25	1	-18/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Durability of an erasure coded pointer doesn't add the device durability; durability is the same for any extent in that stripe so the calculation only comes from the stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Data update path won't accidentaly grow replicas	Kent Overstreet	2023-11-25	5	-67/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, there was a bug where if an extent had greater durability than required (because we needed to move a durability=1 pointer and ended up putting it on a durability 2 device), we would submit a write for replicas=2 - the durability of the pointer being rewritten - instead of the number of replicas required to bring it back up to the data_replicas option. This, plus the allocation path sometimes allocating on a greater durability device than requested, meant that extents could continue having more and more replicas added as they were being rewritten. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: deallocate_extra_replicas()	Kent Overstreet	2023-11-24	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When allocating from devices with different durability, we might end up with more replicas than required; this changes bch2_alloc_sectors_start() to check for this, and drop replicas that aren't needed to hit the number of replicas requested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Proper refcounting for journal_keys	Kent Overstreet	2023-11-24	6	-11/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: preserve device path as device name	Brian Foster	2023-11-24	3	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Various userspace scripts/tools may expect mount entries in /proc/mounts to reflect the device path names used to mount the associated filesystem. bcachefs seems to normalize the device path to the underlying device name based on the block device. This confuses tools like fstests when the test devices might be lvm or device-mapper based. The default behavior for show_vfsmnt() appers to be to use the string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path at device superblock read time and to display it via ->show_devname(). Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| * \| \|	bcachefs: Fix an endianness conversion	Kent Overstreet	2023-11-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cpu_to_le32(), not le32_to_cpu() - fixes a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>