summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cpufreq: scmi: Enable boost supportSibi Sankar2024-03-151-1/+19
| | | | | | | | | | | | | Certain platforms host a number of higher OPPs that are exclusive to CPUs within specific CPUfreq policies and not all CPUs within that CPUfreq policy are able to achieve those higher OPPs due to power constraints. These OPPs are marked as turbo in the freq_table and in the presence of such OPPs, let's enable boost by default. Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* firmware: arm_scmi: Add support for marking certain frequencies as turboSibi Sankar2024-03-151-0/+3
| | | | | | | | | | | All opps above the sustained frequency are treated as turbo, so mark them accordingly. Suggested-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* Merge branch 'opp/boost-data' into cpufreq/arm/linux-nextViresh Kumar2024-03-152-0/+3
|\
| * OPP: Extend dev_pm_opp_data with turbo supportSibi Sankar2024-03-112-0/+3
| | | | | | | | | | | | | | | | Let's extend the dev_pm_opp_data with a turbo variable, to allow users to specify if it's a boost frequency for a dynamically added OPP. Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: dt: always allocate zeroed cpumaskMarek Szyprowski2024-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0499a78369ad ("ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512") changed the handling of cpumasks on ARM 64bit, what resulted in the strange issues and warnings during cpufreq-dt initialization on some big.LITTLE platforms. This was caused by mixing OPPs between big and LITTLE cores, because OPP-sharing information between big and LITTLE cores is computed on cpumask, which in turn was not zeroed on allocation. Fix this by switching to zalloc_cpumask_var() call. Fixes: dc279ac6e5b4 ("cpufreq: dt: Refactor initialization to handle probe deferral properly") CC: stable@vger.kernel.org # v5.10+ Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: scmi: Set transition_delay_usPierre Gondois2024-03-061-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | Make use of the newly added callbacks: - rate_limit_get() - fast_switch_rate_limit() to populate policies's `transition_delay_us`, defined as the 'Preferred average time interval between consecutive invocations of the driver to set the frequency for this policy.' Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | firmware: arm_scmi: Populate fast channel rate_limitPierre Gondois2024-03-065-10/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arm SCMI spec. v3.2, s4.5.3.12 PERFORMANCE_DESCRIBE_FASTCHANNEL defines a per-domain rate_limit for performance requests: """ Rate Limit in microseconds, indicating the minimum time required between successive requests. A value of 0 indicates that this field is not applicable or supported on the platform. """" The field is first defined in SCMI v2.0. Add support to fetch this value and advertise it through a fast_switch_rate_limit() callback. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | firmware: arm_scmi: Populate perf commands rate_limitPierre Gondois2024-03-062-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arm SCMI spec. v3.2, s4.5.3.4 PERFORMANCE_DOMAIN_ATTRIBUTES defines a per-domain rate_limit for performance requests: """ Rate Limit in microseconds, indicating the minimum time required between successive requests. A value of 0 indicates that this field is not supported by the platform. This field does not apply to FastChannels. """" The field is first defined in SCMI v1.0. Add support to fetch this value and advertise it through a rate_limit_get() callback. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: qcom-hw: add CONFIG_COMMON_CLK dependencyArnd Bergmann2024-02-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is still possible to compile-test a kernel without CONFIG_COMMON_CLK for some ancient ARM boards or other architectures, but this causes a link failure in the qcom-cpufreq-hw driver: ERROR: modpost: "devm_clk_hw_register" [drivers/cpufreq/qcom-cpufreq-hw.ko] undefined! ERROR: modpost: "devm_of_clk_add_hw_provider" [drivers/cpufreq/qcom-cpufreq-hw.ko] undefined! ERROR: modpost: "of_clk_hw_onecell_get" [drivers/cpufreq/qcom-cpufreq-hw.ko] undefined! Add a Kconfig dependency here to make sure this always work. Apparently this bug has been in the kernel for a while without me running into it on randconfig builds as COMMON_CLK is almost always enabled. I have cross-checked by building an allmodconfig kernel with COMMON_CLK disabled, which showed no other driver having this problem. Fixes: 4370232c727b ("cpufreq: qcom-hw: Add CPU clock provider support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: dt-platdev: block SDM670 in cpufreq-dt-platdevRichard Acayan2024-02-151-0/+1
| | | | | | | | | | | | | | | | | | The Snapdragon 670 uses the Qualcomm driver for CPU frequency scaling. Block this driver from loading on it so the driver does not pollute dmesg with an error. Signed-off-by: Richard Acayan <mailingradian@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: mediatek-hw: Don't error out if supply is not foundNícolas F. R. A. Prado2024-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | devm_regulator_get_optional() returns -ENODEV if no supply can be found. By introducing its usage, commit 788715b5f21c ("cpufreq: mediatek-hw: Wait for CPU supplies before probing") caused the driver to fail probe if no supply was present in any of the CPU DT nodes. Use devm_regulator_get() instead since the CPUs do require supplies even if not described in the DT. It will gracefully return a dummy regulator if none is found in the DT node, allowing probe to succeed. Fixes: 788715b5f21c ("cpufreq: mediatek-hw: Wait for CPU supplies before probing") Reported-by: kernelci.org bot <bot@kernelci.org> Closes: https://linux.kernelci.org/test/case/id/65b0b169710edea22852a3fa/ Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | Documentation: power: Use kcalloc() instead of kzalloc()Erick Archer2024-01-232-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, in the example code use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. At the same time, modify the translations accordingly. Signed-off-by: Erick Archer <erick.archer@gmx.com> Reviewed-by: Hu Haowen <2023002089@link.tyut.edu.cn> Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Hu Haowen <2023002089@link.tyut.edu.cn> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: mediatek-hw: Wait for CPU supplies before probingNícolas F. R. A. Prado2024-01-231-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before proceeding with the probe and enabling frequency scaling for the CPUs, make sure that all supplies feeding the CPUs have probed. This fixes an issue observed on MT8195-Tomato where if the mediatek-cpufreq-hw driver enabled the hardware (by writing to REG_FREQ_ENABLE) before the SPMI controller driver (spmi-mtk-pmif), behind which lies the big CPU supply, probed the platform would hang shortly after with "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks" being printed in the log. Fixes: 4855e26bcf4d ("cpufreq: mediatek-hw: Add support for CPUFREQ HW") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: brcmstb-avs-cpufreq: add check for cpufreq_cpu_get's return valueAnastasia Belova2024-01-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | cpufreq_cpu_get may return NULL. To avoid NULL-dereference check it and return 0 in case of error. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: de322e085995 ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs") Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: imx6: use regmap to read ocotp registertianyu22024-01-231-30/+15
|/ | | | | | | | | Reading the ocotp register directly is unsafe and will cause the system to hang if its clock is not turned on in CCM. The regmap interface has clk enabled, which can solve this problem. Signed-off-by: tianyu2 <tianyu2@kernelsoft.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* Linux 6.8-rc1v6.8-rc1Linus Torvalds2024-01-211-2/+2
|
* Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefsLinus Torvalds2024-01-2178-1426/+1629
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
| * bcachefs: Improve inode_to_text()Kent Overstreet2024-01-211-7/+18
| | | | | | | | | | | | Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: logged_ops_format.hKent Overstreet2024-01-212-27/+31
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: reflink_format.hKent Overstreet2024-01-213-47/+48
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs; extents_format.hKent Overstreet2024-01-212-279/+284
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: ec_format.hKent Overstreet2024-01-212-16/+20
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: subvolume_format.hKent Overstreet2024-01-212-32/+36
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: snapshot_format.hKent Overstreet2024-01-212-33/+37
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: alloc_background_format.hKent Overstreet2024-01-212-93/+94
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: xattr_format.hKent Overstreet2024-01-212-15/+20
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: dirent_format.hKent Overstreet2024-01-212-39/+43
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: inode_format.hKent Overstreet2024-01-212-164/+167
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs; quota_format.hKent Overstreet2024-01-212-42/+48
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: sb-counters_format.hKent Overstreet2024-01-212-95/+100
| | | | | | | | | | | | bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: counters.c -> sb-counters.cKent Overstreet2024-01-215-8/+7
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: comment bch_subvolumeKent Overstreet2024-01-211-0/+3
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bch_snapshot::btimeKent Overstreet2024-01-212-0/+3
| | | | | | | | | | | | | | Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: add missing __GFP_NOWARNKent Overstreet2024-01-211-1/+1
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet2024-01-2111-23/+24
| | | | | | | | | | | | | | | | | | The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Prep work for variable size btree node buffersKent Overstreet2024-01-2118-97/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: grab s_umount only if snapshottingSu Yue2024-01-211-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exitSu Yue2024-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree not kfree. Or umount will triger: [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008 [ 406.830676 ] #PF: supervisor read access in kernel mode [ 406.831643 ] #PF: error_code(0x0000) - not-present page [ 406.832487 ] PGD 0 P4D 0 [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90 [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 406.835796 ] RIP: 0010:kfree+0x62/0x140 [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6 [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286 [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4 [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000 [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001 [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80 [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000 [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000 [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0 [ 406.841464 ] Call Trace: [ 406.841583 ] <TASK> [ 406.841682 ] ? __die+0x1f/0x70 [ 406.841828 ] ? page_fault_oops+0x159/0x470 [ 406.842014 ] ? fixup_exception+0x22/0x310 [ 406.842198 ] ? exc_page_fault+0x1ed/0x200 [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30 [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs] [ 406.842842 ] ? kfree+0x62/0x140 [ 406.842988 ] ? kfree+0x104/0x140 [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs] [ 406.843390 ] kobject_put+0xb7/0x170 [ 406.843552 ] deactivate_locked_super+0x2f/0xa0 [ 406.843756 ] cleanup_mnt+0xba/0x150 [ 406.843917 ] task_work_run+0x59/0xa0 [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0 [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40 [ 406.844510 ] do_syscall_64+0x4e/0xf0 [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bios must be 512 byte alginedKent Overstreet2024-01-211-0/+4
| | | | | | | | | | | | Fixes: 023f9ac9f70f bcachefs: Delete dio read alignment check Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: remove redundant variable tmpColin Ian King2024-01-211-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Improve trace_trans_restart_relockKent Overstreet2024-01-215-24/+44
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Fix excess transaction restarts in __bchfs_fallocate()Kent Overstreet2024-01-214-16/+35
| | | | | | | | | | | | | | | | | | drop_locks_do() should not be used in a fastpath without first trying the do in nonblocking mode - the unlock and relock will cause excessive transaction restarts and potentially livelocking with other threads that are contending for the same locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: extents_to_bp_stateKent Overstreet2024-01-211-48/+41
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: bkey_and_val_eq()Kent Overstreet2024-01-211-3/+8
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Better journal tracepointsKent Overstreet2024-01-212-60/+79
| | | | | | | | | | | | | | | | | | Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Print size of superblock with space allocatedKent Overstreet2024-01-211-1/+3
| | | | | | | | Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Avoid flushing the journal in the discard pathKent Overstreet2024-01-211-19/+41
| | | | | | | | | | | | | | | | | | | | | | When issuing discards, we may need to flush the journal if there's too many buckets that can't be discarded until a journal flush. But the heuristic was bad; we should be comparing the number of buckets that need to flushes against the number of free buckets, not the number of buckets we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Improve move_extent tracepointKent Overstreet2024-01-215-7/+48
| | | | | | | | | | | | | | Also print out the data_opts, so that we can see what specifically is being done to an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Add missing bch2_moving_ctxt_flush_all()Kent Overstreet2024-01-211-0/+1
| | | | | | | | | | | | | | This fixes a bug with rebalance IOs getting stuck with reads completed, but writes never being issued. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
| * bcachefs: Re-add move_extent_write tracepointKent Overstreet2024-01-212-23/+20
| | | | | | | | | | | | | | It appears this was accidentally deleted at some point - also, do a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>