linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	platform/x86: silicom-platform: Add missing "Description:" for power_cycle ↵	Hans de Goede	2024-01-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	sysfs attr The Documentation/ABI/testing/sysfs-platform-silicom entry for the power_cycle sysfs attr is missing the "Description:" keyword, add this. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240108140655.547261-1-hdegoede@redhat.com
*	platform/x86: intel-wmi-sbl-fw-update: Fix function name in error message	Armin Wolf	2024-01-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since when the driver was converted to use the bus-based WMI interface, the old GUID-based WMI functions are not used anymore. Update the error message to avoid confusing users. Compile-tested only. Fixes: 75c487fcb69c ("platform/x86: intel-wmi-sbl-fw-update: Use bus-based WMI interface") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240106224126.13803-1-W_Armin@gmx.de Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: p2sb: Use pci_resource_n() in p2sb_read_bar0()	Shin'ichiro Kawasaki	2024-01-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Accesses to resource[] member of struct pci_dev shall be wrapped with pci_resource_n() for future compatibility. Call the helper function in p2sb_read_bar0(). Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240108062059.3583028-3-shinichiro.kawasaki@wdc.com Tested-by Klara Modin <klarasmodin@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe	Shin'ichiro Kawasaki	2024-01-22	1	-41/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	p2sb_bar() unhides P2SB device to get resources from the device. It guards the operation by locking pci_rescan_remove_lock so that parallel rescans do not find the P2SB device. However, this lock causes deadlock when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan locks pci_rescan_remove_lock and probes PCI devices. When PCI devices call p2sb_bar() during probe, it locks pci_rescan_remove_lock again. Hence the deadlock. To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar(). Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources() for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(), refer the cache and return to the caller. Before operating the device at P2SB DEVFN for resource cache, check that its device class is PCI_CLASS_MEMORY_OTHER 0x0580 that PCH specifications define. This avoids unexpected operation to other devices at the same DEVFN. Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/ Fixes: 9745fb07474f ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support") Cc: stable@vger.kernel.org Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240108062059.3583028-2-shinichiro.kawasaki@wdc.com Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by Klara Modin <klarasmodin@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: intel-uncore-freq: Fix types in sysfs callbacks	Nathan Chancellor	2024-01-22	2	-57/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When booting a kernel with CONFIG_CFI_CLANG, there is a CFI failure when accessing any of the values under /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00: $ cat /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/max_freq_khz fish: Job 1, 'cat /sys/devices/system/cpu/int…' terminated by signal SIGSEGV (Address boundary error) $ sudo dmesg &\| grep 'CFI failure' [ 170.953925] CFI failure at kobj_attr_show+0x19/0x30 (target: show_max_freq_khz+0x0/0xc0 [intel_uncore_frequency_common]; expected type: 0xd34078c5 The sysfs callback functions such as show_domain_id() are written as if they are going to be called by dev_attr_show() but as the above message shows, they are instead called by kobj_attr_show(). kCFI checks that the destination of an indirect jump has the exact same type as the prototype of the function pointer it is called through and fails when they do not. These callbacks are called through kobj_attr_show() because uncore_root_kobj was initialized with kobject_create_and_add(), which means uncore_root_kobj has a ->sysfs_ops of kobj_sysfs_ops from kobject_create(), which uses kobj_attr_show() as its ->show() value. The only reason there has not been a more noticeable problem until this point is that 'struct kobj_attribute' and 'struct device_attribute' have the same layout, so getting the callback from container_of() works the same with either value. Change all the callbacks and their uses to be compatible with kobj_attr_show() and kobj_attr_store(), which resolves the kCFI failure and allows the sysfs files to work properly. Closes: https://github.com/ClangBuiltLinux/linux/issues/1974 Fixes: ae7b2ce57851 ("platform/x86/intel/uncore-freq: Use sysfs API to create attributes") Cc: stable@vger.kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20240104-intel-uncore-freq-kcfi-fix-v1-1-bf1e8939af40@kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: wmi: Fix wmi_dev_probe()	Dan Carpenter	2024-01-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This has a reversed if statement so it accidentally disables the wmi method before returning. Fixes: 704af3a40747 ("platform/x86: wmi: Remove chardev interface") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/9c81251b-bc87-4ca3-bb86-843dc85e5145@moroto.mountain Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: wmi: Fix notify callback locking	Armin Wolf	2024-01-22	1	-24/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an legacy WMI event handler is removed, an WMI event could have called the handler just before it was removed, meaning the handler could still be running after wmi_remove_notify_handler() returns. Something similar could also happens when using the WMI bus, as the WMI core might still call the notify() callback from an WMI driver even if its remove() callback was just called. Fix this by introducing a rw semaphore which ensures that the event state of a WMI device does not change while the WMI core is handling an event for it. Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731. Fixes: 1686f5444546 ("platform/x86: wmi: Incorporate acpi_install_notify_handler") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240103192707.115512-5-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: wmi: Decouple legacy WMI notify handlers from wmi_block_list	Armin Wolf	2024-01-22	1	-50/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now, legacy WMI notify handler functions where using the wmi_block_list, which did no refcounting on the returned WMI device. This meant that the WMI device could disappear at any moment, potentially leading to various errors. Fix this by using bus_find_device() which returns an actual reference to the found WMI device. Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240103192707.115512-4-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: wmi: Return immediately if an suitable WMI event is found	Armin Wolf	2024-01-22	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 58f6425eb92f ("WMI: Cater for multiple events with same GUID") allowed legacy WMI notify handlers to be installed for multiple WMI devices with the same GUID. However this is useless since the legacy GUID-based interface is blacklisted from seeing WMI devices with duplicated GUIDs. Return immediately if a suitable WMI event is found in wmi_install/remove_notify_handler() since searching for other suitable events is pointless. Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240103192707.115512-3-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	platform/x86: wmi: Fix error handling in legacy WMI notify handler functions	Armin Wolf	2024-01-22	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When wmi_install_notify_handler()/wmi_remove_notify_handler() are unable to enable/disable the WMI device, they unconditionally return an error to the caller. When registering legacy WMI notify handlers, this means that the callback remains registered despite wmi_install_notify_handler() having returned an error. When removing legacy WMI notify handlers, this means that the callback is removed despite wmi_remove_notify_handler() having returned an error. Fix this by only warning when the WMI device could not be enabled. This behaviour matches the bus-based WMI interface. Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731. Fixes: 58f6425eb92f ("WMI: Cater for multiple events with same GUID") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240103192707.115512-2-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com>
*	Linux 6.8-rc1v6.8-rc1	Linus Torvalds	2024-01-21	1	-2/+2
\|
*	Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs	Linus Torvalds	2024-01-21	78	-1426/+1629
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
\| *	bcachefs: Improve inode_to_text()	Kent Overstreet	2024-01-21	1	-7/+18
\| \| \| \| \| \| \| \| \| \| \| \|	Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: logged_ops_format.h	Kent Overstreet	2024-01-21	2	-27/+31
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: reflink_format.h	Kent Overstreet	2024-01-21	3	-47/+48
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs; extents_format.h	Kent Overstreet	2024-01-21	2	-279/+284
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: ec_format.h	Kent Overstreet	2024-01-21	2	-16/+20
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: subvolume_format.h	Kent Overstreet	2024-01-21	2	-32/+36
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: snapshot_format.h	Kent Overstreet	2024-01-21	2	-33/+37
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: alloc_background_format.h	Kent Overstreet	2024-01-21	2	-93/+94
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: xattr_format.h	Kent Overstreet	2024-01-21	2	-15/+20
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: dirent_format.h	Kent Overstreet	2024-01-21	2	-39/+43
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: inode_format.h	Kent Overstreet	2024-01-21	2	-164/+167
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs; quota_format.h	Kent Overstreet	2024-01-21	2	-42/+48
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: sb-counters_format.h	Kent Overstreet	2024-01-21	2	-95/+100
\| \| \| \| \| \| \| \| \| \| \| \|	bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: counters.c -> sb-counters.c	Kent Overstreet	2024-01-21	5	-8/+7
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: comment bch_subvolume	Kent Overstreet	2024-01-21	1	-0/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: bch_snapshot::btime	Kent Overstreet	2024-01-21	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: add missing __GFP_NOWARN	Kent Overstreet	2024-01-21	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: opts->compression can now also be applied in the background	Kent Overstreet	2024-01-21	11	-23/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Prep work for variable size btree node buffers	Kent Overstreet	2024-01-21	18	-97/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: grab s_umount only if snapshotting	Su Yue	2024-01-21	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] * DEADLOCK * [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit	Su Yue	2024-01-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree not kfree. Or umount will triger: [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008 [ 406.830676 ] #PF: supervisor read access in kernel mode [ 406.831643 ] #PF: error_code(0x0000) - not-present page [ 406.832487 ] PGD 0 P4D 0 [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90 [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 406.835796 ] RIP: 0010:kfree+0x62/0x140 [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6 [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286 [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4 [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000 [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001 [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80 [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000 [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000 [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0 [ 406.841464 ] Call Trace: [ 406.841583 ] <TASK> [ 406.841682 ] ? __die+0x1f/0x70 [ 406.841828 ] ? page_fault_oops+0x159/0x470 [ 406.842014 ] ? fixup_exception+0x22/0x310 [ 406.842198 ] ? exc_page_fault+0x1ed/0x200 [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30 [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs] [ 406.842842 ] ? kfree+0x62/0x140 [ 406.842988 ] ? kfree+0x104/0x140 [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs] [ 406.843390 ] kobject_put+0xb7/0x170 [ 406.843552 ] deactivate_locked_super+0x2f/0xa0 [ 406.843756 ] cleanup_mnt+0xba/0x150 [ 406.843917 ] task_work_run+0x59/0xa0 [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0 [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40 [ 406.844510 ] do_syscall_64+0x4e/0xf0 [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: bios must be 512 byte algined	Kent Overstreet	2024-01-21	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes: 023f9ac9f70f bcachefs: Delete dio read alignment check Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: remove redundant variable tmp	Colin Ian King	2024-01-21	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Improve trace_trans_restart_relock	Kent Overstreet	2024-01-21	5	-24/+44
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Fix excess transaction restarts in __bchfs_fallocate()	Kent Overstreet	2024-01-21	4	-16/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	drop_locks_do() should not be used in a fastpath without first trying the do in nonblocking mode - the unlock and relock will cause excessive transaction restarts and potentially livelocking with other threads that are contending for the same locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: extents_to_bp_state	Kent Overstreet	2024-01-21	1	-48/+41
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: bkey_and_val_eq()	Kent Overstreet	2024-01-21	1	-3/+8
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Better journal tracepoints	Kent Overstreet	2024-01-21	2	-60/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Print size of superblock with space allocated	Kent Overstreet	2024-01-21	1	-1/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Avoid flushing the journal in the discard path	Kent Overstreet	2024-01-21	1	-19/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When issuing discards, we may need to flush the journal if there's too many buckets that can't be discarded until a journal flush. But the heuristic was bad; we should be comparing the number of buckets that need to flushes against the number of free buckets, not the number of buckets we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Improve move_extent tracepoint	Kent Overstreet	2024-01-21	5	-7/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also print out the data_opts, so that we can see what specifically is being done to an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Add missing bch2_moving_ctxt_flush_all()	Kent Overstreet	2024-01-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a bug with rebalance IOs getting stuck with reads completed, but writes never being issued. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Re-add move_extent_write tracepoint	Kent Overstreet	2024-01-21	2	-23/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It appears this was accidentally deleted at some point - also, do a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount	Kent Overstreet	2024-01-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Drop t he loop in bch2_kthread_io_clock_wait(): this allows the code that uses it to be woken up for other reasons, and fixes a bug where rebalance wouldn't wake up when a scan was requested. This raises the possibility of spurious wakeups, but callers should always be able to handle that reasonably well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Add .val_to_text() for KEY_TYPE_cookie	Kent Overstreet	2024-01-21	1	-0/+9
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Don't pass memcmp() as a pointer	Kent Overstreet	2024-01-21	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some (buggy!) compilers have issues with this. Fixes: https://github.com/koverstreet/bcachefs/issues/625 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: Reduce would_deadlock restarts	Kent Overstreet	2024-01-21	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't have to take locks in any particular ordering - we'll make forward progress just fine - but if we try to stick to an ordering, it can help to avoid excessive would_deadlock transaction restarts. This tweaks the reflink path to take extents btree locks in the right order. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
\| *	bcachefs: bch2_trans_account_disk_usage_change()	Kent Overstreet	2024-01-21	3	-29/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The disk space accounting rewrite is splitting out accounting for each replicas set - those are moving to btree keys, instead of percpu counters. This breaks bch2_trans_fs_usage_apply() up, splitting out the part we will still need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>