path: root/fs/bcachefs/btree_locking.c
Commit message | Author | Age | Files | Lines
* bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail() | Kent Overstreet | 2024-04-10 | 1 | -27/+1
  Dropping read locks in bch2_btree_node_lock_write_nofail() dates from before we had the cycle detector; we can now tell the cycle detector directly when taking a lock may not fail because we can't handle transaction restarts. This is needed for adding should_be_locked asserts.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Drop redundant btree_path_downgrade()s | Kent Overstreet | 2024-03-13 | 1 | -1/+2
  If a path doesn't have any active references, we shouldn't downgrade it; it'll either be reused, possibly with intent refs again, or dropped at bch2_trans_begin() time.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Add gfp flags param to bch2_prt_task_backtrace() | Kent Overstreet | 2024-01-22 | 1 | -2/+2
  Fixes: e6a2566f7a00 ("bcachefs: Better journal tracepoints")
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  Reported-by: smatch
* bcachefs: Improve trace_trans_restart_relock | Kent Overstreet | 2024-01-21 | 1 | -7/+33
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve would_deadlock trace event | Kent Overstreet | 2024-01-05 | 1 | -7/+12
  We now include backtraces for every thread involved in the cycle.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS | Kent Overstreet | 2024-01-01 | 1 | -1/+2
  - Some tweaks to greatly reduce locking overhead for the list of btree transactions, so that it can always be enabled: leave btree_trans objects on the list when they're on the percpu single item freelist, and only check for duplicates in the same process when CONFIG_BCACHEFS_DEBUG is enabled
  - Don't zero out the full btree_trans unless we allocated it from the mempool
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: rcu protect trans->paths | Kent Overstreet | 2024-01-01 | 1 | -7/+24
  Upcoming patches are going to be changing trans->paths to a reallocatable buffer. We need to guard against use after free when it's used by other threads; this introduces RCU protection to those paths and changes them to check for trans->paths == NULL.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill btree_path.idx | Kent Overstreet | 2024-01-01 | 1 | -1/+1
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans_for_each_path() no longer uses path->idx | Kent Overstreet | 2024-01-01 | 1 | -14/+24
  path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: kill btree_path->(alloc_seq|downgrade_seq) | Kent Overstreet | 2024-01-01 | 1 | -1/+0
  These were for extra info in tracepoints for debugging a specialized issue - we do not want to bloat btree_path for this, at least in release builds.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve trace_trans_restart_would_deadlock | Kent Overstreet | 2024-01-01 | 1 | -3/+22
  In the CI, we're seeing tests failing due to excessive would_deadlock transaction restarts - the tracepoint now includes the lock cycle that occurred.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve btree_path_downgrade tracepoint | Kent Overstreet | 2024-01-01 | 1 | -2/+2
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure srcu lock is not held too long | Kent Overstreet | 2023-11-04 | 1 | -0/+6
  The SRCU read lock that btree_trans takes exists to make it safe for bch2_trans_relock() to deref pointers to btree nodes/key cache items we don't have locked, but as a side effect it blocks reclaim from freeing those items. Thus, it's important to not hold it for too long: we need to differentiate between bch2_trans_unlock() calls that will be only for a short duration, and ones that will be for an unbounded duration. This introduces bch2_trans_unlock_long(), to be used mainly by the data move paths.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
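  As an illustration of the distinction: a minimal sketch, assuming the long variant simply drops the SRCU read lock along with the btree locks (srcu_idx and btree_trans_barrier are assumed names here; this is not the committed implementation):

      /* Sketch only: unlock for an unbounded duration, e.g. data move. */
      void bch2_trans_unlock_long(struct btree_trans *trans)
      {
              bch2_trans_unlock(trans);  /* drop btree node locks */
              /* Release the SRCU read lock so reclaim can free nodes: */
              srcu_read_unlock(&trans->c->btree_trans_barrier, trans->srcu_idx);
      }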
* bcachefs: Don't downgrade locks on transaction restart | Kent Overstreet | 2023-11-01 | 1 | -9/+29
  We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted fixes for clang | Kent Overstreet | 2023-10-22 | 1 | -3/+3
  clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted sparse fixes | Kent Overstreet | 2023-10-22 | 1 | -7/+0
  - endianness fixes
  - mark some things static
  - fix a few __percpu annotations
  - fix silent enum conversions
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Don't call lock_graph_descend() with wait lock held | Kent Overstreet | 2023-10-22 | 1 | -6/+15
  This fixes a deadlock:

    01305 WARNING: possible circular locking dependency detected
    01305 6.3.0-ktest-gf4de9bee61af #5305 Tainted: G W
    01305 ------------------------------------------------------
    01305 cat/14658 is trying to acquire lock:
    01305 ffffffc00982f460 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x48/0x278
    01305
    01305 but task is already holding lock:
    01305 ffffff8011aaf040 (&lock->wait_lock){+.+.}-{2:2}, at: bch2_check_for_deadlock+0x4b8/0xa58
    01305
    01305 which lock already depends on the new lock.
    01305
    01305 the existing dependency chain (in reverse order) is:
    01305
    01305 -> #2 (&lock->wait_lock){+.+.}-{2:2}:
    01305        _raw_spin_lock+0x54/0x70
    01305        __six_lock_wakeup+0x40/0x1b0
    01305        six_unlock_ip+0xe8/0x248
    01305        bch2_btree_key_cache_scan+0x720/0x940
    01305        shrink_slab.constprop.0+0x284/0x770
    01305        shrink_node+0x390/0x828
    01305        balance_pgdat+0x390/0x6d0
    01305        kswapd+0x2e4/0x718
    01305        kthread+0x184/0x1a8
    01305        ret_from_fork+0x10/0x20
    01305
    01305 -> #1 (&c->lock#2){+.+.}-{3:3}:
    01305        __mutex_lock+0x104/0x14a0
    01305        mutex_lock_nested+0x30/0x40
    01305        bch2_btree_key_cache_scan+0x5c/0x940
    01305        shrink_slab.constprop.0+0x284/0x770
    01305        shrink_node+0x390/0x828
    01305        balance_pgdat+0x390/0x6d0
    01305        kswapd+0x2e4/0x718
    01305        kthread+0x184/0x1a8
    01305        ret_from_fork+0x10/0x20
    01305
    01305 -> #0 (fs_reclaim){+.+.}-{0:0}:
    01305        __lock_acquire+0x19d0/0x2930
    01305        lock_acquire+0x1dc/0x458
    01305        fs_reclaim_acquire+0x9c/0xe0
    01305        __kmem_cache_alloc_node+0x48/0x278
    01305        __kmalloc_node_track_caller+0x5c/0x278
    01305        krealloc+0x94/0x180
    01305        bch2_printbuf_make_room.part.0+0xac/0x118
    01305        bch2_prt_printf+0x150/0x1e8
    01305        bch2_btree_bkey_cached_common_to_text+0x170/0x298
    01305        bch2_btree_trans_to_text+0x244/0x348
    01305        print_cycle+0x7c/0xb0
    01305        break_cycle+0x254/0x528
    01305        bch2_check_for_deadlock+0x59c/0xa58
    01305        bch2_btree_deadlock_read+0x174/0x200
    01305        full_proxy_read+0x94/0xf0
    01305        vfs_read+0x15c/0x3a8
    01305        ksys_read+0xb8/0x148
    01305        __arm64_sys_read+0x48/0x60
    01305        invoke_syscall.constprop.0+0x64/0x138
    01305        do_el0_svc+0x84/0x138
    01305        el0_svc+0x34/0x80
    01305        el0t_64_sync_handler+0xb0/0xb8
    01305        el0t_64_sync+0x14c/0x150
    01305
    01305 other info that might help us debug this:
    01305
    01305 Chain exists of:
    01305   fs_reclaim --> &c->lock#2 --> &lock->wait_lock
    01305
    01305 Possible unsafe locking scenario:
    01305
    01305       CPU0                    CPU1
    01305       ----                    ----
    01305  lock(&lock->wait_lock);
    01305                          lock(&c->lock#2);
    01305                          lock(&lock->wait_lock);
    01305  lock(fs_reclaim);
    01305
    01305 *** DEADLOCK ***

  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_unlock_noassert() | Kent Overstreet | 2023-10-22 | 1 | -0/+8
  This fixes a spurious assert in the btree node read path.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: drop_locks_do() | Kent Overstreet | 2023-10-22 | 1 | -4/+1
  Add a new helper for the common pattern of:
  - trans_unlock()
  - do something
  - trans_relock()
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
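  A minimal sketch of what such a helper can look like as a statement-expression macro (this body is an assumption based on the pattern described above, not necessarily the committed definition):

      /* Unlock, evaluate _do, then relock; if _do succeeded (returned 0),
       * the result is the relock's return value, which is a transaction
       * restart error when the relock fails. */
      #define drop_locks_do(_trans, _do)                                \
      ({                                                                \
              bch2_trans_unlock(_trans);                                \
              (_do) ?: bch2_trans_relock(_trans);                       \
      })

  Callers can then write ret = drop_locks_do(trans, some_blocking_op()); instead of open-coding the three steps.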
* bcachefs: trans_for_each_path_safe() | Kent Overstreet | 2023-10-22 | 1 | -3/+4
  bch2_btree_trans_to_text() is used on btree_trans objects that are owned by different threads - when printing out deadlock cycles - so we need a safe version of trans_for_each_path(), else we race with seeing a btree_path that was just allocated and not fully initialized.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
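  The idea, as a hedged sketch (field and macro names here are illustrative, not the committed ones): snapshot the path count once, before iterating, so the loop never walks into a slot another thread is still initializing:

      /* Safe for use on another thread's btree_trans: the count is read
       * once, so newly allocated, half-initialized paths past the
       * snapshot are never visited. */
      #define trans_for_each_path_safe(_trans, _path, _idx)             \
              for (unsigned _nr = READ_ONCE((_trans)->nr_paths), _idx = 0;\
                   _idx < _nr &&                                        \
                   ((_path) = &(_trans)->paths[_idx], true);            \
                   _idx++)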
* six locks: Kill six_lock_pcpu_(alloc|free) | Kent Overstreet | 2023-10-22 | 1 | -2/+3
  six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate or free the percpu reader count on an existing lock that's in use; the only safe time to allocate percpu readers is when the lock is first being initialized. This patch adds a flags parameter to six_lock_init(), and instead of six_lock_pcpu_free() we now expose six_lock_exit(), which does the same thing but is less likely to be misused.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
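  A usage sketch of the before/after (SIX_LOCK_INIT_PCPU is an assumed flag name):

      /* Before: percpu reader counts bolted on after init - unsafe if
       * the lock is already in use: */
      six_lock_init(&b->c.lock);
      six_lock_pcpu_alloc(&b->c.lock);

      /* After: decided once, when the lock is initialized: */
      six_lock_init(&b->c.lock, SIX_LOCK_INIT_PCPU);
      /* ... */
      six_lock_exit(&b->c.lock);  /* tears down the percpu counts too */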
* six locks: six_lock_readers_add() | Kent Overstreet | 2023-10-22 | 1 | -10/+0
  This moves a helper out of the bcachefs code that shouldn't have been there, since it touches six lock internals.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Centralize btree node lock initialization | Kent Overstreet | 2023-10-22 | 1 | -1/+17
  This fixes some confusion in the lockdep code due to initializing btree node/key cache locks with the same lockdep key, but different names.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix erasure coding locking | Kent Overstreet | 2023-10-22 | 1 | -0/+13
  This adds a new helper, bch2_trans_mutex_lock(), for locking a mutex - dropping and retaking btree locks as needed.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
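  A sketch of how such a helper can be built from a trylock-then-unlock pattern (assumed shape, not the committed code; bch2_trans_relock() is taken to return a transaction restart error on failure):

      int bch2_trans_mutex_lock(struct btree_trans *trans, struct mutex *lock)
      {
              int ret = 0;

              if (!mutex_trylock(lock)) {
                      /* Can't block on a mutex while holding btree locks: */
                      bch2_trans_unlock(trans);
                      mutex_lock(lock);

                      ret = bch2_trans_relock(trans);
                      if (ret)
                              mutex_unlock(lock);  /* restart: caller retries */
              }
              return ret;
      }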
* bcachefs: New backtrace utility code | Kent Overstreet | 2023-10-22 | 1 | -1/+1
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: trans->notrace_relock_fail | Kent Overstreet | 2023-10-22 | 1 | -1/+1
  When we unlock in order to submit IO, the next relock event is likely to fail if submit_bio() blocked - we shouldn't count those events in our _fail stats, since those are expected events and shouldn't cause test failures.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Use six_lock_ip() | Kent Overstreet | 2023-10-22 | 1 | -1/+2
  This uses the new _ip() interface to six locks and hooks it up to btree_path->ip_allocated, when available.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_relock_notrace() | Kent Overstreet | 2023-10-22 | 1 | -0/+15
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fixes for building in userspace | Kent Overstreet | 2023-10-22 | 1 | -2/+1
  - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that
  - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name)
  - Userspace doesn't have real percpu variables (maybe we can get this fixed someday); put an #ifdef around the bch2_disk_reservation_add() fastpath
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Assorted checkpatch fixes | Kent Overstreet | 2023-10-22 | 1 | -1/+1
  checkpatch.pl gives lots of warnings that we don't want - suggested ignore list:
    ASSIGN_IN_IF
    UNSPECIFIED_INT - bcachefs coding style prefers single token type names
    NEW_TYPEDEFS - typedefs are occasionally good
    FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes
    MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* fixup bcachefs: Deadlock cycle detector | Kent Overstreet | 2023-10-22 | 1 | -0/+7
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* fixup bcachefs: Deadlock cycle detector | Kent Overstreet | 2023-10-22 | 1 | -2/+26
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix lock_graph_remove_non_waiters() | Kent Overstreet | 2023-10-22 | 1 | -96/+76
  We were removing 1 more entry than we were supposed to - oops. Also some other simplifications and cleanups, and bring back the abort preference code in a better fashion.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Simplify break_cycle() | Kent Overstreet | 2023-10-22 | 1 | -9/+10
  We'd like to prioritize aborting transactions that have done less work - however, it appears breaking cycles by telling other threads to abort may still be buggy, so disable that for now.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Print cycle on unrecoverable deadlock | Kent Overstreet | 2023-10-22 | 1 | -1/+23
  Some lock operations can't fail; a cycle of nofail locks is impossible to recover from. So we want to get rid of these nofail locking operations, but as this is tricky it'll be done incrementally. If such a cycle happens, this patch prints out which codepaths are involved so we know what to work on next.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_trans_locked() | Kent Overstreet | 2023-10-22 | 1 | -0/+10
  Useful debugging function.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve btree_deadlock debugfs output | Kent Overstreet | 2023-10-22 | 1 | -16/+38
  This changes bch2_check_for_deadlock() to print the longest chains it finds - when we have a deadlock because the cycle detector isn't finding something, this will let us see what it's missing.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: bch2_btree_node_relock_notrace() | Kent Overstreet | 2023-10-22 | 1 | -2/+4
  Most of the node_relock_fail trace events are generated from bch2_btree_path_verify_level(), when debugcheck_iterators is enabled - but we're not interested in these trace events; they don't indicate that we're in a slowpath.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Ensure bch2_btree_node_lock_write_nofail() never fails | Kent Overstreet | 2023-10-22 | 1 | -0/+34
  In order for bch2_btree_node_lock_write_nofail() to never produce a deadlock, we must ensure we're never holding read locks when using it. Fortunately, it's only used from code paths where any read locks may be safely dropped.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Delete old deadlock avoidance code | Kent Overstreet | 2023-10-22 | 1 | -92/+8
  This deletes our old lock ordering based deadlock avoidance code.
  Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
* bcachefs: Print deadlock cycle in debugfs | Kent Overstreet | 2023-10-22 | 1 | -19/+25
  In the event that we're not finished debugging the cycle detector, this adds a new file to debugfs that shows what the cycle detector finds, if anything. By comparing this with btree_transactions, which shows held locks for every btree_transaction, we'll be able to determine if it's the cycle detector that's buggy or something else.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Deadlock cycle detector | Kent Overstreet | 2023-10-22 | 1 | -3/+243
  We've outgrown our own deadlock avoidance strategy.

  The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks.

  Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order.

  That approach had some issues, though.

  - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occurred. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totalling 20% of actual transaction commits.

  - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention.

  - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery.

  So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks.

  When we block taking a btree lock, after adding ourselves to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks.

  If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can't allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail.

  Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code.

  Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
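  The core of the approach, reduced to a self-contained toy (plain C with toy types; the real version walks btree_trans wait-list entries and held locks, and copes with the graph changing underneath it):

      #include <stdbool.h>
      #include <stdio.h>

      #define MAX_TRANS 8

      /* edge[a][b]: transaction a is blocked on a lock that b holds.
       * Read locks mean a lock can have several holders, so a node can
       * have several outgoing edges - hence a DFS, not a chain walk. */
      static bool edge[MAX_TRANS][MAX_TRANS];

      static bool dfs(int node, int start, bool *seen)
      {
              if (node == start)
                      return true;            /* cycle back to ourselves */
              if (seen[node])
                      return false;           /* branch already explored */
              seen[node] = true;

              for (int next = 0; next < MAX_TRANS; next++)
                      if (edge[node][next] && dfs(next, start, seen))
                              return true;
              return false;
      }

      /* Run after adding ourselves to the waitlist, before sleeping: */
      static bool would_deadlock(int self)
      {
              bool seen[MAX_TRANS] = { false };

              for (int next = 0; next < MAX_TRANS; next++)
                      if (edge[self][next] && dfs(next, self, seen))
                              return true;    /* emit transaction restart */
              return false;
      }

      int main(void)
      {
              /* 0 waits on 1, 1 on 2, 2 on 0: a three-way deadlock */
              edge[0][1] = edge[1][2] = edge[2][0] = true;
              printf("%d\n", would_deadlock(0));      /* prints 1 */

              edge[2][0] = false;                     /* break the cycle */
              printf("%d\n", would_deadlock(0));      /* prints 0 */
              return 0;
      }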
* bcachefs: Fix bch2_btree_node_upgrade() | Kent Overstreet | 2023-10-22 | 1 | -4/+18
  Previously, if we were trying to upgrade from a read to an intent lock but we held an additional read lock via another btree_path, bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade(). This patch factors out the code that __bch2_btree_node_lock_write() uses to temporarily drop extra read locks, so that six_lock_tryupgrade() can succeed.
  Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
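  A sketch of that factored-out shuffle (helper name and exact shape are assumptions here): subtract the read-lock count this transaction holds on the node through *other* paths, so six_lock_tryupgrade() only sees foreign readers, then restore it:

      static bool trans_tryupgrade_node(struct btree_trans *trans,
                                        struct btree_path *path,
                                        struct btree *b)
      {
              unsigned readers = 0;
              struct btree_path *p;
              bool ok;

              /* Count our own extra read locks on this node: */
              trans_for_each_path(trans, p)
                      if (p != path &&
                          p->l[b->c.level].b == b &&
                          btree_node_read_locked(p, b->c.level))
                              readers++;

              six_lock_readers_add(&b->c.lock, -(int) readers);
              ok = six_lock_tryupgrade(&b->c.lock);
              six_lock_readers_add(&b->c.lock, (int) readers);

              return ok;
      }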
* bcachefs: Fix redundant transaction restart | Kent Overstreet | 2023-10-22 | 1 | -4/+3
  Little bit of tidying up; this makes the counters a little bit clearer as to what's happening.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Convert more locking code to btree_bkey_cached_common | Kent Overstreet | 2023-10-22 | 1 | -5/+6
  Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_bkey_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: btree_bkey_cached_common->cached | Kent Overstreet | 2023-10-22 | 1 | -2/+1
  Add a type descriptor to btree_bkey_cached_common - there's no reason not to since we've got padding that was otherwise unused, and this is a nice cleanup (and helpful in later patches).
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Fix six_lock_readers_add() | Kent Overstreet | 2023-10-22 | 1 | -2/+4
  Have to be careful with bit fields - when subtracting, this was overflowing into the write_locking bit.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: New locking functions | Kent Overstreet | 2023-10-22 | 1 | -1/+1
  In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node.

  This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions:
  - btree_node_lock_nopath, which may fail, returning a transaction restart, and
  - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks).
  Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
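  Hypothetical usage, per the description above (the exact prototypes are assumptions, not the committed ones):

      /* May fail with a transaction restart, which the caller must honor: */
      ret = btree_node_lock_nopath(trans, &b->c, SIX_LOCK_intent);
      if (ret)
              return ret;

      /* Only legal when we hold no other locks, so no cycle is possible: */
      btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_write);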
* bcachefs: Add persistent counters for all tracepoints | Kent Overstreet | 2023-10-22 | 1 | -6/+6
  Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* bcachefs: Improve bch2_btree_node_relock() | Kent Overstreet | 2023-10-22 | 1 | -7/+1
  This moves the IS_ERR_OR_NULL() check to the inline part, since that's a fast path event.
  Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>