| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3d32aaa7e66d5c1479a3c31d6c2c5d45dd0d3b89 upstream.
syzkaller found the following problematic rwsem locking (with write
lock already held):
down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
__dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537
In table_clear, it first acquires a write lock
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
down_write(&_hash_lock);
Then before the lock is released at L1539, there is a path shown above:
table_clear -> __dev_status -> dm_get_inactive_table -> down_read
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
down_read(&_hash_lock);
It tries to acquire the same read lock again, resulting in the deadlock
problem.
Fix this by moving table_clear()'s __dev_status() call to after its
up_write(&_hash_lock);
Cc: stable@vger.kernel.org
Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 98dba02d9a93eec11bffbb93c7c51624290702d2 upstream.
This command will crash with NULL pointer dereference:
dmsetup create flakey --table \
"0 `blockdev --getsize /dev/ram0` flakey /dev/ram0 0 0 1 2 corrupt_bio_byte 512"
Fix the crash by checking if arg_name is non-NULL before comparing it.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
| |
commit 6b79a428c02769f2a11f8ae76bf866226d134887 upstream.
Otherwise the journal_io_cache will leak if dm_register_target() fails.
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 26208a7cffd0c7cbf14237ccd20c7270b3ffeb7e ]
raid10_sync_request() will add 'r10bio->remaining' for both rdev and
replacement rdev. However, if the read io fails, recovery_request_write()
returns without issuing the write io, in this case, end_sync_request()
is only called once and 'remaining' is leaked, cause an io hang.
Fix the problem by decreasing 'remaining' according to if 'bio' and
'repl_bio' is valid.
Fixes: 24afd80d99f8 ("md/raid10: handle recovery of replacement devices.")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230310073855.1337560-5-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3bc57292278a0b6ac4656cad94c14f2453344b57 ]
slot_store() uses kstrtouint() to get a slot number, but stores the
result in an "int" variable (by casting a pointer).
This can result in a negative slot number if the unsigned int value is
very large.
A negative number means that the slot is empty, but setting a negative
slot number this way will not remove the device from the array. I don't
think this is a serious problem, but it could cause confusion and it is
best to fix it.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit fb294b1c0ba982144ca467a75e7d01ff26304e2b upstream.
The loop in dmcrypt_write may be running for unbounded amount of time,
thus we need cond_resched() in it.
This commit fixes the following warning:
[ 3391.153255][ C12] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [dmcrypt_write/2:2897]
...
[ 3391.387210][ C12] Call trace:
[ 3391.390338][ C12] blk_attempt_bio_merge.part.6+0x38/0x158
[ 3391.395970][ C12] blk_attempt_plug_merge+0xc0/0x1b0
[ 3391.401085][ C12] blk_mq_submit_bio+0x398/0x550
[ 3391.405856][ C12] submit_bio_noacct+0x308/0x380
[ 3391.410630][ C12] dmcrypt_write+0x1e4/0x208 [dm_crypt]
[ 3391.416005][ C12] kthread+0x130/0x138
[ 3391.419911][ C12] ret_from_fork+0x10/0x18
Reported-by: yangerkun <yangerkun@huawei.com>
Fixes: dc2676210c42 ("dm crypt: offload writes to thread")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d3aa3e060c4a80827eb801fc448debc9daa7c46b upstream.
Check alloc_precpu()'s return value and return an error from
dm_stats_init() if it fails. Update alloc_dev() to fail if
dm_stats_init() does.
Otherwise, a NULL pointer dereference will occur in dm_stats_cleanup()
even if dm-stats isn't being actively used.
Fixes: fd2ed4d25270 ("dm: add statistics support")
Cc: stable@vger.kernel.org
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit f50714b57aecb6b3dc81d578e295f86d9c73f078 upstream.
When we need to zero some range on a block device, the function
__blkdev_issue_zero_pages submits a write bio with the bio vector pointing
to the zero page. If we use dm-flakey with corrupt bio writes option, it
will corrupt the content of the zero page which results in crashes of
various userspace programs. Glibc assumes that memory returned by mmap is
zeroed and it uses it for calloc implementation; if the newly mapped
memory is not zeroed, calloc will return non-zeroed memory.
Fix this bug by testing if the page is equal to ZERO_PAGE(0) and
avoiding the corruption in this case.
Cc: stable@vger.kernel.org
Fixes: a00f5276e266 ("dm flakey: Properly corrupt multi-page bios.")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit aa56b9b75996ff4c76a0a4181c2fa0206c3d91cc upstream.
If "corrupt_bio_byte" is set to corrupt reads and corrupt_bio_flags is
used, dm-flakey would erroneously return all writes as errors. Likewise,
if "corrupt_bio_byte" is set to corrupt writes, dm-flakey would return
errors for all reads.
Fix the logic so that if fc->corrupt_bio_byte is non-zero, dm-flakey
will not abort reads on writes with an error.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 76227f6dc805e9e960128bcc6276647361e0827c ]
Otherwise on resource constrained systems these workqueues may be too
greedy.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e4f80303c2353952e6e980b23914e4214487f2a6 ]
Otherwise on resource constrained systems these workqueues may be too
greedy.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 0b22ff5360f5c4e11050b89206370fdf7dc0a226 ]
Commit acfe0ad74d2e1 ("dm: allocate a special workqueue for deferred
device removal") switched from using system workqueue to a single
workqueue local to DM. But it didn't eliminate the call to
flush_scheduled_work() that was introduced purely for the benefit of
deferred device removal with commit 2c140a246dc ("dm: allow remove to
be deferred").
Since DM core uses its own workqueue (and queue_work) there is no need
to call flush_scheduled_work() from local_exit(). local_exit()'s
destroy_workqueue(deferred_remove_workqueue) handles flushing work
started with queue_work().
Fixes: acfe0ad74d2e1 ("dm: allocate a special workqueue for deferred device removal")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 6b9973861cb2e96dcd0bb0f1baddc5c034207c5c upstream.
Otherwise the commit that will be aborted will be associated with the
metadata objects that will be torn down. Must write needs_check flag
to metadata with a reset block manager.
Found through code-inspection (and compared against dm-thin.c).
Cc: stable@vger.kernel.org
Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 6a459d8edbdbe7b24db42a5a9f21e6aa9e00c2aa upstream.
Dm_cache also has the same UAF problem when dm_resume()
and dm_destroy() are concurrent.
Therefore, cancelling timer again in destroy().
Cc: stable@vger.kernel.org
Fixes: c6b4fcbad044e ("dm: add cache target")
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 88430ebcbc0ec637b710b947738839848c20feff upstream.
When dm_resume() and dm_destroy() are concurrent, it will
lead to UAF, as follows:
BUG: KASAN: use-after-free in __run_timers+0x173/0x710
Write of size 8 at addr ffff88816d9490f0 by task swapper/0/0
<snip>
Call Trace:
<IRQ>
dump_stack_lvl+0x73/0x9f
print_report.cold+0x132/0xaa2
_raw_spin_lock_irqsave+0xcd/0x160
__run_timers+0x173/0x710
kasan_report+0xad/0x110
__run_timers+0x173/0x710
__asan_store8+0x9c/0x140
__run_timers+0x173/0x710
call_timer_fn+0x310/0x310
pvclock_clocksource_read+0xfa/0x250
kvm_clock_read+0x2c/0x70
kvm_clock_get_cycles+0xd/0x20
ktime_get+0x5c/0x110
lapic_next_event+0x38/0x50
clockevents_program_event+0xf1/0x1e0
run_timer_softirq+0x49/0x90
__do_softirq+0x16e/0x62c
__irq_exit_rcu+0x1fa/0x270
irq_exit_rcu+0x12/0x20
sysvec_apic_timer_interrupt+0x8e/0xc0
One of the concurrency UAF can be shown as below:
use free
do_resume |
__find_device_hash_cell |
dm_get |
atomic_inc(&md->holders) |
| dm_destroy
| __dm_destroy
| if (!dm_suspended_md(md))
| atomic_read(&md->holders)
| msleep(1)
dm_resume |
__dm_resume |
dm_table_resume_targets |
pool_resume |
do_waker #add delay work |
dm_put |
atomic_dec(&md->holders) |
| dm_table_destroy
| pool_dtr
| __pool_dec
| __pool_destroy
| destroy_workqueue
| kfree(pool) # free pool
time out
__do_softirq
run_timer_softirq # pool has already been freed
This can be easily reproduced using:
1. create thin-pool
2. dmsetup suspend pool
3. dmsetup resume pool
4. dmsetup remove_all # Concurrent with 3
The root cause of this UAF bug is that dm_resume() adds timer after
dm_destroy() skips cancelling the timer because of suspend status.
After timeout, it will call run_timer_softirq(), however pool has
already been freed. The concurrency UAF bug will happen.
Therefore, cancelling timer again in __pool_destroy().
Cc: stable@vger.kernel.org
Fixes: 991d9fa02da0d ("dm: add thin provisioning target")
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 7991dbff6849f67e823b7cc0c15e5a90b0549b9f upstream.
Recently we found a softlock up problem in dm thin pool btree lookup
code due to corrupted metadata:
Kernel panic - not syncing: softlockup: hung tasks
CPU: 7 PID: 2669225 Comm: kworker/u16:3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: dm-thin do_worker [dm_thin_pool]
Call Trace:
<IRQ>
dump_stack+0x9c/0xd3
panic+0x35d/0x6b9
watchdog_timer_fn.cold+0x16/0x25
__run_hrtimer+0xa2/0x2d0
</IRQ>
RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio]
__bufio_new+0x11f/0x4f0 [dm_bufio]
new_read+0xa3/0x1e0 [dm_bufio]
dm_bm_read_lock+0x33/0xd0 [dm_persistent_data]
ro_step+0x63/0x100 [dm_persistent_data]
btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data]
dm_btree_lookup+0x16f/0x210 [dm_persistent_data]
dm_thin_find_block+0x12c/0x210 [dm_thin_pool]
__process_bio_read_only+0xc5/0x400 [dm_thin_pool]
process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool]
process_one_work+0x3c5/0x730
Following process may generate a broken btree mixed with fresh and
stale btree nodes, which could get dm thin trapped in an infinite loop
while looking up data block:
Transaction 1: pmd->root = A, A->B->C // One path in btree
pmd->root = X, X->Y->Z // Copy-up
Transaction 2: X,Z is updated on disk, Y write failed.
// Commit failed, dm thin becomes read-only.
process_bio_read_only
dm_thin_find_block
__find_block
dm_btree_lookup(pmd->root)
The pmd->root points to a broken btree, Y may contain stale node
pointing to any block, for example X, which gets dm thin trapped into
a dead loop while looking up Z.
Fix this by setting pmd->root in __open_metadata(), so that dm thin
will use the last transaction's pmd->root if commit failed.
Fetch a reproducer in [Link].
Linke: https://bugzilla.kernel.org/show_bug.cgi?id=216790
Cc: stable@vger.kernel.org
Fixes: 991d9fa02da0 ("dm: add thin provisioning target")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 352b837a5541690d4f843819028cf2b8be83d424 upstream.
Same ABBA deadlock pattern fixed in commit 4b60f452ec51 ("dm thin: Fix
ABBA deadlock between shrink_slab and dm_pool_abort_metadata") to
DM-cache's metadata.
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 341097ee53573e06ab9fc675d96a052385b851fa upstream.
There's a crash in mempool_free when running the lvm test
shell/lvchange-rebuild-raid.sh.
The reason for the crash is this:
* super_written calls atomic_dec_and_test(&mddev->pending_writes) and
wake_up(&mddev->sb_wait). Then it calls rdev_dec_pending(rdev, mddev)
and bio_put(bio).
* so, the process that waited on sb_wait and that is woken up is racing
with bio_put(bio).
* if the process wins the race, it calls bioset_exit before bio_put(bio)
is executed.
* bio_put(bio) attempts to free a bio into a destroyed bio set - causing
a crash in mempool_free.
We fix this bug by moving bio_put before atomic_dec_and_test.
We also move rdev_dec_pending before atomic_dec_and_test as suggested by
Neil Brown.
The function md_end_flush has a similar bug - we must call bio_put before
we decrement the number of in-progress bios.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 11557f0067 P4D 11557f0067 PUD 0
Oops: 0002 [#1] PREEMPT SMP
CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Workqueue: kdelayd flush_expired_bios [dm_delay]
RIP: 0010:mempool_free+0x47/0x80
Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00
RSP: 0018:ffff88910036bda8 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8
RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900
R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000
R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05
FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0
Call Trace:
<TASK>
clone_endio+0xf4/0x1c0 [dm_mod]
clone_endio+0xf4/0x1c0 [dm_mod]
__submit_bio+0x76/0x120
submit_bio_noacct_nocheck+0xb6/0x2a0
flush_expired_bios+0x28/0x2f [dm_delay]
process_one_work+0x1b4/0x300
worker_thread+0x45/0x3e0
? rescuer_thread+0x380/0x380
kthread+0xc2/0x100
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd]
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit b611ad14006e5be2170d9e8e611bf49dff288911 ]
fail run raid1 array when we assemble array with the inactive disk only,
but the mdx_raid1 thread were not stop, Even if the associated resources
have been released. it will caused a NULL dereference when we do poweroff.
This causes the following Oops:
[ 287.587787] BUG: kernel NULL pointer dereference, address: 0000000000000070
[ 287.594762] #PF: supervisor read access in kernel mode
[ 287.599912] #PF: error_code(0x0000) - not-present page
[ 287.605061] PGD 0 P4D 0
[ 287.607612] Oops: 0000 [#1] SMP NOPTI
[ 287.611287] CPU: 3 PID: 5265 Comm: md0_raid1 Tainted: G U 5.10.146 #0
[ 287.619029] Hardware name: xxxxxxx/To be filled by O.E.M, BIOS 5.19 06/16/2022
[ 287.626775] RIP: 0010:md_check_recovery+0x57/0x500 [md_mod]
[ 287.632357] Code: fe 01 00 00 48 83 bb 10 03 00 00 00 74 08 48 89 ......
[ 287.651118] RSP: 0018:ffffc90000433d78 EFLAGS: 00010202
[ 287.656347] RAX: 0000000000000000 RBX: ffff888105986800 RCX: 0000000000000000
[ 287.663491] RDX: ffffc90000433bb0 RSI: 00000000ffffefff RDI: ffff888105986800
[ 287.670634] RBP: ffffc90000433da0 R08: 0000000000000000 R09: c0000000ffffefff
[ 287.677771] R10: 0000000000000001 R11: ffffc90000433ba8 R12: ffff888105986800
[ 287.684907] R13: 0000000000000000 R14: fffffffffffffe00 R15: ffff888100b6b500
[ 287.692052] FS: 0000000000000000(0000) GS:ffff888277f80000(0000) knlGS:0000000000000000
[ 287.700149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 287.705897] CR2: 0000000000000070 CR3: 000000000320a000 CR4: 0000000000350ee0
[ 287.713033] Call Trace:
[ 287.715498] raid1d+0x6c/0xbbb [raid1]
[ 287.719256] ? __schedule+0x1ff/0x760
[ 287.722930] ? schedule+0x3b/0xb0
[ 287.726260] ? schedule_timeout+0x1ed/0x290
[ 287.730456] ? __switch_to+0x11f/0x400
[ 287.734219] md_thread+0xe9/0x140 [md_mod]
[ 287.738328] ? md_thread+0xe9/0x140 [md_mod]
[ 287.742601] ? wait_woken+0x80/0x80
[ 287.746097] ? md_register_thread+0xe0/0xe0 [md_mod]
[ 287.751064] kthread+0x11a/0x140
[ 287.754300] ? kthread_park+0x90/0x90
[ 287.757974] ret_from_fork+0x1f/0x30
In fact, when raid1 array run fail, we need to do
md_unregister_thread() before raid1_free().
Signed-off-by: Jiang Li <jiang.li@ugreen.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 4fe1ec995483737f3d2a14c3fe1d8fe634972979 upstream.
__list_versions will first estimate the required space using the
"dm_target_iterate(list_version_get_needed, &needed)" call and then will
fill the space using the "dm_target_iterate(list_version_get_info,
&iter_info)" call. Each of these calls locks the targets using the
"down_read(&_lock)" and "up_read(&_lock)" calls, however between the first
and second "dm_target_iterate" there is no lock held and the target
modules can be loaded at this point, so the second "dm_target_iterate"
call may need more space than what was the first "dm_target_iterate"
returned.
The code tries to handle this overflow (see the beginning of
list_version_get_info), however this handling is incorrect.
The code sets "param->data_size = param->data_start + needed" and
"iter_info.end = (char *)vers+len" - "needed" is the size returned by the
first dm_target_iterate call; "len" is the size of the buffer allocated by
userspace.
"len" may be greater than "needed"; in this case, the code will write up
to "len" bytes into the buffer, however param->data_size is set to
"needed", so it may write data past the param->data_size value. The ioctl
interface copies only up to param->data_size into userspace, thus part of
the result will be truncated.
Fix this bug by setting "iter_info.end = (char *)vers + needed;" - this
guarantees that the second "dm_target_iterate" call will write only up to
the "needed" buffer and it will exit with "DM_BUFFER_FULL_FLAG" if it
overflows the "needed" space - in this case, userspace will allocate a
larger buffer and retry.
Note that there is also a bug in list_version_get_needed - we need to add
"strlen(tt->name) + 1" to the needed size, not "strlen(tt->name)".
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1727fd5015d8f93474148f94e34cda5aa6ad4a43 upstream.
Current code produces a warning as shown below when total characters
in the constituent block device names plus the slashes exceeds 200.
snprintf() returns the number of characters generated from the given
input, which could cause the expression “200 – len” to wrap around
to a large positive number. Fix this by using scnprintf() instead,
which returns the actual number of characters written into the buffer.
[ 1513.267938] ------------[ cut here ]------------
[ 1513.267943] WARNING: CPU: 15 PID: 37247 at <snip>/lib/vsprintf.c:2509 vsnprintf+0x2c8/0x510
[ 1513.267944] Modules linked in: <snip>
[ 1513.267969] CPU: 15 PID: 37247 Comm: mdadm Not tainted 5.4.0-1085-azure #90~18.04.1-Ubuntu
[ 1513.267969] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[ 1513.267971] RIP: 0010:vsnprintf+0x2c8/0x510
<-snip->
[ 1513.267982] Call Trace:
[ 1513.267986] snprintf+0x45/0x70
[ 1513.267990] ? disk_name+0x71/0xa0
[ 1513.267993] dump_zones+0x114/0x240 [raid0]
[ 1513.267996] ? _cond_resched+0x19/0x40
[ 1513.267998] raid0_run+0x19e/0x270 [raid0]
[ 1513.268000] md_run+0x5e0/0xc50
[ 1513.268003] ? security_capable+0x3f/0x60
[ 1513.268005] do_md_run+0x19/0x110
[ 1513.268006] md_ioctl+0x195e/0x1f90
[ 1513.268007] blkdev_ioctl+0x91f/0x9f0
[ 1513.268010] block_ioctl+0x3d/0x50
[ 1513.268012] do_vfs_ioctl+0xa9/0x640
[ 1513.268014] ? __fput+0x162/0x260
[ 1513.268016] ksys_ioctl+0x75/0x80
[ 1513.268017] __x64_sys_ioctl+0x1a/0x20
[ 1513.268019] do_syscall_64+0x5e/0x200
[ 1513.268021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 766038846e875 ("md/raid0: replace printk() with pr_*()")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74 ]
A complicated deadlock exists when using the journal and an elevated
group_thrtead_cnt. It was found with loop devices, but its not clear
whether it can be seen with real disks. The deadlock can occur simply
by writing data with an fio script.
When the deadlock occurs, multiple threads will hang in different ways:
1) The group threads will hang in the blk-wbt code with bios waiting to
be submitted to the block layer:
io_schedule+0x70/0xb0
rq_qos_wait+0x153/0x210
wbt_wait+0x115/0x1b0
io_schedule+0x70/0xb0
rq_qos_wait+0x153/0x210
wbt_wait+0x115/0x1b0
__rq_qos_throttle+0x38/0x60
blk_mq_submit_bio+0x589/0xcd0
wbt_wait+0x115/0x1b0
__rq_qos_throttle+0x38/0x60
blk_mq_submit_bio+0x589/0xcd0
__submit_bio+0xe6/0x100
submit_bio_noacct_nocheck+0x42e/0x470
submit_bio_noacct+0x4c2/0xbb0
ops_run_io+0x46b/0x1a30
handle_stripe+0xcd3/0x36b0
handle_active_stripes.constprop.0+0x6f6/0xa60
raid5_do_work+0x177/0x330
Or:
io_schedule+0x70/0xb0
rq_qos_wait+0x153/0x210
wbt_wait+0x115/0x1b0
__rq_qos_throttle+0x38/0x60
blk_mq_submit_bio+0x589/0xcd0
__submit_bio+0xe6/0x100
submit_bio_noacct_nocheck+0x42e/0x470
submit_bio_noacct+0x4c2/0xbb0
flush_deferred_bios+0x136/0x170
raid5_do_work+0x262/0x330
2) The r5l_reclaim thread will hang in the same way, submitting a
bio to the block layer:
io_schedule+0x70/0xb0
rq_qos_wait+0x153/0x210
wbt_wait+0x115/0x1b0
__rq_qos_throttle+0x38/0x60
blk_mq_submit_bio+0x589/0xcd0
__submit_bio+0xe6/0x100
submit_bio_noacct_nocheck+0x42e/0x470
submit_bio_noacct+0x4c2/0xbb0
submit_bio+0x3f/0xf0
md_super_write+0x12f/0x1b0
md_update_sb.part.0+0x7c6/0xff0
md_update_sb+0x30/0x60
r5l_do_reclaim+0x4f9/0x5e0
r5l_reclaim_thread+0x69/0x30b
However, before hanging, the MD_SB_CHANGE_PENDING flag will be
set for sb_flags in r5l_write_super_and_discard_space(). This
flag will never be cleared because the submit_bio() call never
returns.
3) Due to the MD_SB_CHANGE_PENDING flag being set, handle_stripe()
will do no processing on any pending stripes and re-set
STRIPE_HANDLE. This will cause the raid5d thread to enter an
infinite loop, constantly trying to handle the same stripes
stuck in the queue.
The raid5d thread has a blk_plug that holds a number of bios
that are also stuck waiting seeing the thread is in a loop
that never schedules. These bios have been accounted for by
blk-wbt thus preventing the other threads above from
continuing when they try to submit bios. --Deadlock.
To fix this, add the same wait_event() that is used in raid5_do_work()
to raid5d() such that if MD_SB_CHANGE_PENDING is set, the thread will
schedule and wait until the flag is cleared. The schedule action will
flush the plug which will allow the r5l_reclaim thread to continue,
thus preventing the deadlock.
However, md_check_recovery() calls can also clear MD_SB_CHANGE_PENDING
from the same thread and can thus deadlock if the thread is put to
sleep. So avoid waiting if md_check_recovery() is being called in the
loop.
It's not clear when the deadlock was introduced, but the similar
wait_event() call in raid5_do_work() was added in 2017 by this
commit:
16d997b78b15 ("md/raid5: simplfy delaying of writes while metadata
is updated.")
Link: https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e2eed85bc75138a9eeb63863d20f8904ac42a577 ]
When doing degrade/recover tests using the journal a kernel BUG
is hit at drivers/md/raid5.c:4381 in handle_parity_checks5():
BUG_ON(!test_bit(R5_UPTODATE, &dev->flags));
This was found to occur because handle_stripe_fill() was skipped
for stripes in the journal due to a condition in that function.
Thus blocks were not fetched and R5_UPTODATE was not set when
the code reached handle_parity_checks5().
To fix this, don't skip handle_stripe_fill() unless the stripe is
for read.
Fixes: 07e83364845e ("md/r5cache: shift complex rmw from read path to write path")
Link: https://lore.kernel.org/linux-raid/e05c4239-41a9-d2f7-3cfa-4aa9d2cea8c1@deltatee.com/
Suggested-by: Song Liu <song@kernel.org>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 0dd84b319352bb8ba64752d4e45396d8b13e6018 upstream.
From the link [1], we can see raid1d was running even after the path
raid_dtr -> md_stop -> __md_stop.
Let's stop write first in destructor to align with normal md-raid to
fix the KASAN issue.
[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a
Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 104212471b1c1817b311771d817fb692af983173 ]
In line 2884, "raid5_release_stripe(sh);" drops the reference to sh and
may cause sh to be released. However, sh is subsequently used in lines
2886 "if (sh->batch_head && sh != sh->batch_head)". This may result in an
use-after-free bug.
It can be fixed by moving "raid5_release_stripe(sh);" to the bottom of
the function.
Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1fbeea217d8f297fe0e0956a1516d14ba97d0396 upstream.
There is this warning when using a kernel with the address sanitizer
and running this testsuite:
https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid
==================================================================
BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid]
Read of size 4 at addr ffff888079d2c7e8 by task lvcreate/13319
CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x9c
print_address_description.constprop.0+0x1f/0x1e0
print_report.cold+0x55/0x244
kasan_report+0xc9/0x100
raid_status+0x1747/0x2820 [dm_raid]
dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod]
table_load+0x35c/0x630 [dm_mod]
ctl_ioctl+0x411/0x630 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
__x64_sys_ioctl+0x12a/0x1a0
do_syscall_64+0x5b/0x80
The warning is caused by reading conf->max_nr_stripes in raid_status. The
code in raid_status reads mddev->private, casts it to struct r5conf and
reads the entry max_nr_stripes.
However, if we have different raid type than 4/5/6, mddev->private
doesn't point to struct r5conf; it may point to struct r0conf, struct
r1conf, struct r10conf or struct mpconf. If we cast a pointer to one
of these structs to struct r5conf, we will be reading invalid memory
and KASAN warns about it.
Fix this bug by reading struct r5conf only if raid type is 4, 5 or 6.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 7dad24db59d2d2803576f2e3645728866a056dab upstream.
There is a KASAN warning in raid_resume when running the lvm test
lvconvert-raid.sh. The reason for the warning is that mddev->raid_disks
is greater than rs->raid_disks, so the loop touches one entry beyond
the allocated length.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e120a5f1e78fab6223544e425015f393d90d6f0d ]
Otherwise PR ops may be issued while the broader DM device is being
reconfigured, etc.
Fixes: 9c72bad1f31a ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d17f744e883b2f8d13cca252d71cfe8ace346f7d upstream.
There's a KASAN warning in raid10_remove_disk when running the lvm
test lvconvert-raid-reshape.sh. We fix this warning by verifying that the
value "number" is valid.
BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10]
Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682
CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x45/0x57a
? __lock_text_start+0x18/0x18
? raid10_remove_disk+0x61/0x2a0 [raid10]
kasan_report+0xa8/0xe0
? raid10_remove_disk+0x61/0x2a0 [raid10]
raid10_remove_disk+0x61/0x2a0 [raid10]
Buffer I/O error on dev dm-76, logical block 15344, async page read
? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0
remove_and_add_spares+0x367/0x8a0 [md_mod]
? super_written+0x1c0/0x1c0 [md_mod]
? mutex_trylock+0xac/0x120
? _raw_spin_lock+0x72/0xc0
? _raw_spin_lock_bh+0xc0/0xc0
md_check_recovery+0x848/0x960 [md_mod]
raid10d+0xcf/0x3360 [raid10]
? sched_clock_cpu+0x185/0x1a0
? rb_erase+0x4d4/0x620
? var_wake_function+0xe0/0xe0
? psi_group_change+0x411/0x500
? preempt_count_sub+0xf/0xc0
? _raw_spin_lock_irqsave+0x78/0xc0
? __lock_text_start+0x18/0x18
? raid10_sync_request+0x36c0/0x36c0 [raid10]
? preempt_count_sub+0xf/0xc0
? _raw_spin_unlock_irqrestore+0x19/0x40
? del_timer_sync+0xa9/0x100
? try_to_del_timer_sync+0xc0/0xc0
? _raw_spin_lock_irqsave+0x78/0xc0
? __lock_text_start+0x18/0x18
? _raw_spin_unlock_irq+0x11/0x24
? __list_del_entry_valid+0x68/0xa0
? finish_wait+0xa3/0x100
md_thread+0x161/0x260 [md_mod]
? unregister_md_personality+0xa0/0xa0 [md_mod]
? _raw_spin_lock_irqsave+0x78/0xc0
? prepare_to_wait_event+0x2c0/0x2c0
? unregister_md_personality+0xa0/0xa0 [md_mod]
kthread+0x148/0x180
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 124495:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x80/0xa0
setup_conf+0x140/0x5c0 [raid10]
raid10_run+0x4cd/0x740 [raid10]
md_run+0x6f9/0x1300 [md_mod]
raid_ctr+0x2531/0x4ac0 [dm_raid]
dm_table_add_target+0x2b0/0x620 [dm_mod]
table_load+0x1c8/0x400 [dm_mod]
ctl_ioctl+0x29e/0x560 [dm_mod]
dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
__do_compat_sys_ioctl+0xfa/0x160
do_syscall_64+0x90/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Last potentially related work creation:
kasan_save_stack+0x1e/0x40
__kasan_record_aux_stack+0x9e/0xc0
kvfree_call_rcu+0x84/0x480
timerfd_release+0x82/0x140
L __fput+0xfa/0x400
task_work_run+0x80/0xc0
exit_to_user_mode_prepare+0x155/0x160
syscall_exit_to_user_mode+0x12/0x40
do_syscall_64+0x42/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40
__kasan_record_aux_stack+0x9e/0xc0
kvfree_call_rcu+0x84/0x480
timerfd_release+0x82/0x140
__fput+0xfa/0x400
task_work_run+0x80/0xc0
exit_to_user_mode_prepare+0x155/0x160
syscall_exit_to_user_mode+0x12/0x40
do_syscall_64+0x42/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The buggy address belongs to the object at ffff889108f3d200
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 0 bytes to the right of
256-byte region [ffff889108f3d200, ffff889108f3d300)
The buggy address belongs to the physical page:
page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c
head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=2)
raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 617b365872a247480e9dcd50a32c8d1806b21861 upstream.
There's a KASAN warning in raid5_add_disk when running the LVM testsuite.
The warning happens in the test
lvconvert-raid-reshape-linear_to_raid6-single-type.sh. We fix the warning
by verifying that rdev->saved_raid_disk is within limits.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 332bd0778775d0cf105c4b9e03e460b590749916 upstream.
On dm-raid table load (using raid_ctr), dm-raid allocates an array
rs->devs[rs->raid_disks] for the raid device members. rs->raid_disks
is defined by the number of raid metadata and image tupples passed
into the target's constructor.
In the case of RAID layout changes being requested, that number can be
different from the current number of members for existing raid sets as
defined in their superblocks. Example RAID layout changes include:
- raid1 legs being added/removed
- raid4/5/6/10 number of stripes changed (stripe reshaping)
- takeover to higher raid level (e.g. raid5 -> raid6)
When accessing array members, rs->raid_disks must be used in control
loops instead of the potentially larger value in rs->md.raid_disks.
Otherwise it will cause memory access beyond the end of the rs->devs
array.
Fix this by changing code that is prone to out-of-bounds access.
Also fix validate_raid_redundancy() to validate all devices that are
added. Also, use braces to help clean up raid_iterate_devices().
The out-of-bounds memory accesses was discovered using KASAN.
This commit was verified to pass all LVM2 RAID tests (with KASAN
enabled).
Cc: stable@vger.kernel.org
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9ae6e8b1c9bbf6874163d1243e393137313762b7 upstream.
During postsuspend dm-era does the following:
1. Archives the current era
2. Commits the metadata, as part of the RPC call for archiving the
current era
3. Stops the worker
Until the worker stops, it might write to the metadata again. Moreover,
these writes are not flushed to disk immediately, but are cached by the
dm-bufio client, which writes them back asynchronously.
As a result, the committed metadata of a suspended dm-era device might
not be consistent with the in-core metadata.
In some cases, this can result in the corruption of the on-disk
metadata. Suppose the following sequence of events:
1. Load a new table, e.g. a snapshot-origin table, to a device with a
dm-era table
2. Suspend the device
3. dm-era commits its metadata, but the worker does a few more metadata
writes until it stops, as part of digesting an archived writeset
4. These writes are cached by the dm-bufio client
5. Load the dm-era table to another device.
6. The new instance of the dm-era target loads the committed, on-disk
metadata, which don't include the extra writes done by the worker
after the metadata commit.
7. Resume the new device
8. The new dm-era target instance starts using the metadata
9. Resume the original device
10. The destructor of the old dm-era target instance is called and
destroys the dm-bufio client, which results in flushing the cached
writes to disk
11. These writes might overwrite the writes done by the new dm-era
instance, hence corrupting its metadata.
Fix this by committing the metadata after the worker stops running.
stop_worker uses flush_workqueue to flush the current work. However, the
work item may re-queue itself and flush_workqueue doesn't wait for
re-queued works to finish.
This could result in the worker changing the metadata after they have
been committed, or writing to the metadata concurrently with the commit
in the postsuspend thread.
Use drain_workqueue instead, which waits until the work and all
re-queued works finish.
Fixes: eec40579d8487 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit ea23994edc4169bd90d7a9b5908c6ccefd82fa40 upstream.
The RAID0 layout is irrelevant if all members have the same size so the
array has only one zone. It is *also* irrelevant if the array has two
zones and the second zone has only one device, for example if the array
has two members of different sizes.
So in that case it makes sense to allow assembly even when the layout is
undefined, like what is done when the array has only one zone.
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 1e267742283a4b5a8ca65755c44166be27e9aa0f ]
Generally, the md_unregister_thread is called with reconfig_mutex, but
raid_message in dm-raid doesn't hold reconfig_mutex to unregister thread,
so md_unregister_thread can be called simulitaneously from two call sites
in theory.
Then after previous commit which remove the protection of reconfig_mutex
for md_unregister_thread completely, the potential issue could be worse
than before.
Let's take pers_lock at the beginning of function to ensure reentrancy.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 64c54d9244a4efe9bc6e9c98e13c4bbb8bb39083 upstream.
The bug is here:
if (!rdev || rdev->desc_nr != nr) {
The list iterator value 'rdev' will *always* be set and non-NULL
by rdev_for_each_rcu(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
found (In fact, it will be a bogus pointer to an invalid struct
object containing the HEAD). Otherwise it will bypass the check
and lead to invalid memory access passing the check.
To fix the bug, use a new variable 'iter' as the list iterator,
while using the original variable 'pdev' as a dedicated pointer to
point to the found element.
Cc: stable@vger.kernel.org
Fixes: 70bcecdb1534 ("md-cluster: Improve md_reload_sb to be less error prone")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit fc8738343eefc4ea8afb6122826dea48eacde514 upstream.
The bug is here:
if (!rdev)
The list iterator value 'rdev' will *always* be set and non-NULL
by rdev_for_each(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element found.
Otherwise it will bypass the NULL check and lead to invalid memory
access passing the check.
To fix the bug, use a new variable 'iter' as the list iterator,
while using the original variable 'rdev' as a dedicated pointer to
point to the found element.
Cc: stable@vger.kernel.org
Fixes: 2aa82191ac36 ("md-cluster: Perform a lazy update")
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 4caae58406f8ceb741603eee460d79bacca9b1b5 upstream.
The device-mapper framework provides a mechanism to mark targets as
immutable (and hence fail table reloads that try to change the target
type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's
feature flags to prevent switching the verity target with a different
target type.
Fixes: a4ffc152198e ("dm: add verity target")
Cc: stable@vger.kernel.org
Signed-off-by: Sarthak Kukreti <sarthakkukreti@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream.
dm-stats can be used with a very large number of entries (it is only
limited by 1/4 of total system memory), so add rescheduling points to
the loops that iterate over the entries.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 567dd8f34560fa221a6343729474536aa7ede4fd upstream.
The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to
report the current key to userspace. However, this is not a constant-time
operation and it may leak information about the key via timing, via cache
access patterns or via the branch predictor.
Change dm-crypt's key printing to use "%c" instead of "%02x". Also
introduce hex2asc() that carefully avoids any branching or memory
accesses when converting a number in the range 0 ... 15 to an ascii
character.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d3f2a14b8906df913cb04a706367b012db94a6e8 upstream.
The "r" variable shadows an earlier "r" that has function scope. It
means that we accidentally return success instead of an error code.
Smatch has a warning for this:
drivers/md/dm-integrity.c:4503 dm_integrity_ctr()
warn: missing error code 'r'
Fixes: 7eada909bfd7 ("dm: add integrity target")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9f6dc633761006f974701d4c88da71ab68670749 upstream.
Commit d208b89401e0 ("dm: fix mempool NULL pointer race when
completing IO") didn't go far enough.
When bio_end_io_acct ends the count of in-flight I/Os may reach zero
and the DM device may be suspended. There is a possibility that the
suspend races with dm_stats_account_io.
Fix this by adding percpu "pending_io" counters to track outstanding
dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also,
rename md_in_flight_bios() to dm_in_flight_bios() and update it to
iterate all pending_io counters.
Fixes: d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO")
Cc: stable@vger.kernel.org
Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d208b89401e073de986dc891037c5a668f5d5d95 upstream.
dm_io_dec_pending() calls end_io_acct() first and will then dec md
in-flight pending count. But if a task is swapping DM table at same
time this can result in a crash due to mempool->elements being NULL:
task1 task2
do_resume
->do_suspend
->dm_wait_for_completion
bio_endio
->clone_endio
->dm_io_dec_pending
->end_io_acct
->wakeup task1
->dm_swap_table
->__bind
->__bind_mempools
->bioset_exit
->mempool_exit
->free_io
[ 67.330330] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
......
[ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[ 67.330510] pc : mempool_free+0x70/0xa0
[ 67.330515] lr : mempool_free+0x4c/0xa0
[ 67.330520] sp : ffffff8008013b20
[ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
[ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
[ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
[ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
[ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
[ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
[ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
[ 67.330563] x15: 000000000093b41e x14: 0000000000000010
[ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
[ 67.330574] x11: 0000000000000001 x10: 0000000000000001
[ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
[ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
[ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
[ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
[ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
[ 67.330609] Call trace:
[ 67.330616] mempool_free+0x70/0xa0
[ 67.330627] bio_put+0xf8/0x110
[ 67.330638] dec_pending+0x13c/0x230
[ 67.330644] clone_endio+0x90/0x180
[ 67.330649] bio_endio+0x198/0x1b8
[ 67.330655] dec_pending+0x190/0x230
[ 67.330660] clone_endio+0x90/0x180
[ 67.330665] bio_endio+0x198/0x1b8
[ 67.330673] blk_update_request+0x214/0x428
[ 67.330683] scsi_end_request+0x2c/0x300
[ 67.330688] scsi_io_completion+0xa0/0x710
[ 67.330695] scsi_finish_command+0xd8/0x110
[ 67.330700] scsi_softirq_done+0x114/0x148
[ 67.330708] blk_done_softirq+0x74/0xd0
[ 67.330716] __do_softirq+0x18c/0x374
[ 67.330724] irq_exit+0xb4/0xb8
[ 67.330732] __handle_domain_irq+0x84/0xc0
[ 67.330737] gic_handle_irq+0x148/0x1b0
[ 67.330744] el1_irq+0xe8/0x190
[ 67.330753] lpm_cpuidle_enter+0x4f8/0x538
[ 67.330759] cpuidle_enter_state+0x1fc/0x398
[ 67.330764] cpuidle_enter+0x18/0x20
[ 67.330772] do_idle+0x1b4/0x290
[ 67.330778] cpu_startup_entry+0x20/0x28
[ 67.330786] secondary_start_kernel+0x160/0x170
Fix this by:
1) Establishing pointers to 'struct dm_io' members in
dm_io_dec_pending() so that they may be passed into end_io_acct()
_after_ free_io() is called.
2) Moving end_io_acct() after free_io().
Cc: stable@vger.kernel.org
Signed-off-by: Jiazi Li <lijiazi@xiaomi.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit cd9c88da171a62c4b0f1c70e50c75845969fbc18 ]
It appears like cmd could be a Spectre v1 gadget as it's supplied by a
user and used as an array index. Prevent the contents of kernel memory
from being leaked to userspace via speculative execution by using
array_index_nospec.
Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6fc51504388c1a1a53db8faafe9fff78fccc7c87 ]
Explicitly convert unsigned int in the right of the conditional
expression to int to match the left side operand and the return type,
fixing the following compiler warning:
drivers/md/dm-crypt.c:2593:43: warning: signed and unsigned
type in conditional expression [-Wsign-compare]
Fixes: c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service")
Signed-off-by: Aashish Sharma <shraash@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit cba23ac158db7f3cd48a923d6861bee2eb7a2978 ]
Corrupted metadata could warrant returning error from sm_ll_lookup_bitmap().
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 85bca3c05b6cca31625437eedf2060e846c4bbad ]
Corrupt metadata could trigger an out of bounds write.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1b8d2789dad0005fd5e7d35dab26a8e1203fb6da upstream.
Move dm_tm_unlock() after dm_tm_dec().
Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 7df835a32a8bedf7ce88efcfa7c9b245b52ff139 ]
Commit b0140891a8cea3 ("md: Fix race when creating a new md device.")
not only moved assigning mddev->gendisk before calling add_disk, which
fixes the races described in the commit log, but also added a
mddev->open_mutex critical section over add_disk and creation of the
md kobj. Adding a kobject after add_disk is racy vs deleting the gendisk
right after adding it, but md already prevents against that by holding
a mddev->active reference.
On the other hand taking this lock added a lock order reversal with what
is not disk->open_mutex (used to be bdev->bd_mutex when the commit was
added) for partition devices, which need that lock for the internal open
for the partition scan, and a recent commit also takes it for
non-partitioned devices, leading to further lockdep splatter.
Fixes: b0140891a8ce ("md: Fix race when creating a new md device.")
Fixes: d62633873590 ("block: support delayed holder registration")
Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 528b16bfc3ae5f11638e71b3b63a81f9999df727 upstream.
On systems with many cores using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the page allocation limit
for a given device is reached or close to be reached. This is due
to percpu_counter_compare() taking a spinlock to compute an exact
result on potentially many CPUs at the same time.
Switch to non-exact comparison of allocated and allowed pages by using
the value returned by percpu_counter_read_positive() to avoid taking
the percpu_counter spinlock.
This may over/under estimate the actual number of allocated pages by at
most (batch-1) * num_online_cpus().
Currently, batch is bounded by 32. The system on which this issue was
first observed has 256 CPUs and 512GB of RAM. With a 4k page size, this
change may over/under estimate by 31MB. With ~10G (2%) allowed dm-crypt
allocations, this seems an acceptable error. Certainly preferred over
running into the spinlock contention.
This behavior was reproduced on an EC2 c5.24xlarge instance with 96 CPUs
and 192GB RAM as follows, but can be provoked on systems with less CPUs
as well.
* Disable swap
* Tune vm settings to promote regular writeback
$ echo 50 > /proc/sys/vm/dirty_expire_centisecs
$ echo 25 > /proc/sys/vm/dirty_writeback_centisecs
$ echo $((128 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes
* Create 8 dmcrypt devices based on files on a tmpfs
* Create and mount an ext4 filesystem on each crypt devices
* Run stress-ng --hdd 8 within one of above filesystems
Total %system usage collected from sysstat goes to ~35%. Write throughput
on the underlying loop device is ~2GB/s. perf profiling an individual
kworker kcryptd thread shows the following profile, indicating spinlock
contention in percpu_counter_compare():
99.98% 0.00% kworker/u193:46 [kernel.kallsyms] [k] ret_from_fork
|
--ret_from_fork
kthread
worker_thread
|
--99.92%--process_one_work
|
|--80.52%--kcryptd_crypt
| |
| |--62.58%--mempool_alloc
| | |
| | --62.24%--crypt_page_alloc
| | |
| | --61.51%--__percpu_counter_compare
| | |
| | --61.34%--__percpu_counter_sum
| | |
| | |--58.68%--_raw_spin_lock_irqsave
| | | |
| | | --58.30%--native_queued_spin_lock_slowpath
| | |
| | --0.69%--cpumask_next
| | |
| | --0.51%--_find_next_bit
| |
| |--10.61%--crypt_convert
| | |
| | |--6.05%--xts_crypt
...
After applying this patch and running the same test, %system usage is
lowered to ~7% and write throughput on the loop device increases
to ~2.7GB/s. perf report shows mempool_alloc() as ~8% rather than ~62%
in the profile and not hitting the percpu_counter() spinlock anymore.
|--8.15%--mempool_alloc
| |
| |--3.93%--crypt_page_alloc
| | |
| | --3.75%--__alloc_pages
| | |
| | --3.62%--get_page_from_freelist
| | |
| | --3.22%--rmqueue_bulk
| | |
| | --2.59%--_raw_spin_lock
| | |
| | --2.57%--native_queued_spin_lock_slowpath
| |
| --3.05%--_raw_spin_lock_irqsave
| |
| --2.49%--native_queued_spin_lock_slowpath
Suggested-by: DJ Gregor <dj@corelight.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Arne Welzel <arne.welzel@corelight.com>
Fixes: 5059353df86e ("dm crypt: limit the number of allocated pages")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit b6e58b5466b2959f83034bead2e2e1395cca8aeb upstream.
remove_raw() in dm_btree_remove() may fail due to IO read error
(e.g. read the content of origin block fails during shadowing),
and the value of shadow_spine::root is uninitialized, but
the uninitialized value is still assign to new_root in the
end of dm_btree_remove().
For dm-thin, the value of pmd->details_root or pmd->root will become
an uninitialized value, so if trying to read details_info tree again
out-of-bound memory may occur as showed below:
general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
Hardware name: QEMU Standard PC
RIP: 0010:metadata_ll_load_ie+0x14/0x30
Call Trace:
sm_metadata_count_is_more_than_one+0xb9/0xe0
dm_tm_shadow_block+0x52/0x1c0
shadow_step+0x59/0xf0
remove_raw+0xb2/0x170
dm_btree_remove+0xf4/0x1c0
dm_pool_delete_thin_device+0xc3/0x140
pool_message+0x218/0x2b0
target_message+0x251/0x290
ctl_ioctl+0x1c4/0x4d0
dm_ctl_ioctl+0xe/0x20
__x64_sys_ioctl+0x7b/0xb0
do_syscall_64+0x40/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixing it by only assign new_root when removal succeeds
Signed-off-by: Hou Tao <houtao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|