summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* block/rnbd-srv: Remove force_close file after holding a lockGioh Kim2021-04-203-6/+7
| | | | | | | | | | | | | | | | | We changed the rnbd_srv_sess_dev_force_close to use try-lock because rnbd_srv_sess_dev_force_close and process_msg_close can generate a deadlock. Now rnbd_srv_sess_dev_force_close would do nothing if it fails to get the lock. So removing the force_close file should be moved to after the lock. Or the force_close file is removed but the others are not removed. Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20210419073722.15351-11-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd-srv: Prevent a deadlock generated by accessing sysfs in parallelGioh Kim2021-04-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We got a warning message below. When server tries to close one session by force, it locks the sysfs interface and locks the srv_sess lock. The problem is that client can send a request to close at the same time. By close request, server locks the srv_sess lock and locks the sysfs to remove the sysfs interfaces. The simplest way to prevent that situation could be just use mutex_trylock. [ 234.153965] ====================================================== [ 234.154093] WARNING: possible circular locking dependency detected [ 234.154219] 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Tainted: G O [ 234.154381] ------------------------------------------------------ [ 234.154531] kworker/1:1H/618 is trying to acquire lock: [ 234.154651] ffff8887a09db0a8 (kn->count#132){++++}, at: kernfs_remove_by_name_ns+0x40/0x80 [ 234.154819] but task is already holding lock: [ 234.154965] ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server] [ 234.155132] which lock already depends on the new lock. [ 234.155311] the existing dependency chain (in reverse order) is: [ 234.155462] -> #1 (&srv_sess->lock){+.+.}: [ 234.155614] __mutex_lock+0x134/0xcb0 [ 234.155761] rnbd_srv_sess_dev_force_close+0x36/0x50 [rnbd_server] [ 234.155889] rnbd_srv_dev_session_force_close_store+0x69/0xc0 [rnbd_server] [ 234.156042] kernfs_fop_write+0x13f/0x240 [ 234.156162] vfs_write+0xf3/0x280 [ 234.156278] ksys_write+0xba/0x150 [ 234.156395] do_syscall_64+0x62/0x270 [ 234.156513] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 234.156632] -> #0 (kn->count#132){++++}: [ 234.156782] __lock_acquire+0x129e/0x23a0 [ 234.156900] lock_acquire+0xf3/0x210 [ 234.157043] __kernfs_remove+0x42b/0x4c0 [ 234.157161] kernfs_remove_by_name_ns+0x40/0x80 [ 234.157282] remove_files+0x3f/0xa0 [ 234.157399] sysfs_remove_group+0x4a/0xb0 [ 234.157519] rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server] [ 234.157648] rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server] [ 234.157775] process_io_req+0x29a/0x6a0 [rtrs_server] [ 234.157924] __ib_process_cq+0x8c/0x100 [ib_core] [ 234.158709] ib_cq_poll_work+0x31/0xb0 [ib_core] [ 234.158834] process_one_work+0x4e5/0xaa0 [ 234.158958] worker_thread+0x65/0x5c0 [ 234.159078] kthread+0x1e0/0x200 [ 234.159194] ret_from_fork+0x24/0x30 [ 234.159309] other info that might help us debug this: [ 234.159513] Possible unsafe locking scenario: [ 234.159658] CPU0 CPU1 [ 234.159775] ---- ---- [ 234.159891] lock(&srv_sess->lock); [ 234.160005] lock(kn->count#132); [ 234.160128] lock(&srv_sess->lock); [ 234.160250] lock(kn->count#132); [ 234.160364] *** DEADLOCK *** [ 234.160536] 3 locks held by kworker/1:1H/618: [ 234.160677] #0: ffff8883ca1ed528 ((wq_completion)ib-comp-wq){+.+.}, at: process_one_work+0x40a/0xaa0 [ 234.160840] #1: ffff8883d2d5fe10 ((work_completion)(&cq->work)){+.+.}, at: process_one_work+0x40a/0xaa0 [ 234.161003] #2: ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server] [ 234.161168] stack backtrace: [ 234.161312] CPU: 1 PID: 618 Comm: kworker/1:1H Tainted: G O 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 [ 234.161490] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 [ 234.161643] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [ 234.161765] Call Trace: [ 234.161910] dump_stack+0x96/0xe0 [ 234.162028] check_noncircular+0x29e/0x2e0 [ 234.162148] ? print_circular_bug+0x100/0x100 [ 234.162267] ? register_lock_class+0x1ad/0x8a0 [ 234.162385] ? __lock_acquire+0x68e/0x23a0 [ 234.162505] ? trace_event_raw_event_lock+0x190/0x190 [ 234.162626] __lock_acquire+0x129e/0x23a0 [ 234.162746] ? register_lock_class+0x8a0/0x8a0 [ 234.162866] lock_acquire+0xf3/0x210 [ 234.162982] ? kernfs_remove_by_name_ns+0x40/0x80 [ 234.163127] __kernfs_remove+0x42b/0x4c0 [ 234.163243] ? kernfs_remove_by_name_ns+0x40/0x80 [ 234.163363] ? kernfs_fop_readdir+0x3b0/0x3b0 [ 234.163482] ? strlen+0x1f/0x40 [ 234.163596] ? strcmp+0x30/0x50 [ 234.163712] kernfs_remove_by_name_ns+0x40/0x80 [ 234.163832] remove_files+0x3f/0xa0 [ 234.163948] sysfs_remove_group+0x4a/0xb0 [ 234.164068] rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server] [ 234.164196] rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server] [ 234.164345] ? _raw_spin_unlock_irqrestore+0x43/0x50 [ 234.164466] ? lockdep_hardirqs_on+0x1a8/0x290 [ 234.164597] ? mlx4_ib_poll_cq+0x927/0x1280 [mlx4_ib] [ 234.164732] ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server] [ 234.164859] process_io_req+0x29a/0x6a0 [rtrs_server] [ 234.164982] ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server] [ 234.165130] __ib_process_cq+0x8c/0x100 [ib_core] [ 234.165279] ib_cq_poll_work+0x31/0xb0 [ib_core] [ 234.165404] process_one_work+0x4e5/0xaa0 [ 234.165550] ? pwq_dec_nr_in_flight+0x160/0x160 [ 234.165675] ? do_raw_spin_lock+0x119/0x1d0 [ 234.165796] worker_thread+0x65/0x5c0 [ 234.165914] ? process_one_work+0xaa0/0xaa0 [ 234.166031] kthread+0x1e0/0x200 [ 234.166147] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 234.166268] ret_from_fork+0x24/0x30 [ 234.251591] rnbd_server L243: </dev/loop1@close_device_session>: Device closed [ 234.604221] rnbd_server L264: RTRS Session close_device_session disconnected Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20210419073722.15351-10-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd-clt: Replace {NO_WAIT,WAIT} with RTRS_PERMIT_{WAIT,NOWAIT}Gioh Kim2021-04-203-30/+22
| | | | | | | | | | | | | | | | | | | | They are defined with the same value and similar meaning, let's remove one of them, then we can remove {WAIT,NOWAIT}. Also change the type of 'wait' from 'int' to 'enum wait_type' to make it clear. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-9-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd: Kill destroy_device_cbGuoqing Jiang2021-04-201-11/+4
| | | | | | | | | | | | | We can use destroy_device directly since destroy_device_cb is just the wrapper of destroy_device. Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210419073722.15351-8-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd: Kill rnbd_clt_destroy_default_groupGuoqing Jiang2021-04-203-7/+1
| | | | | | | | | | | | | | | | No need to have it since we can call sysfs_remove_group in the rnbd_clt_destroy_sysfs_files. Then rnbd_clt_destroy_sysfs_files is paired with it's counterpart rnbd_clt_create_sysfs_files. Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210419073722.15351-7-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd-clt: Move add_disk(dev->gd) to rnbd_clt_setup_gen_diskGuoqing Jiang2021-04-201-2/+1
| | | | | | | | | | | | | It makes more sense to add gendisk in rnbd_clt_setup_gen_disk, instead of do it in rnbd_clt_map_device. Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com> Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210419073722.15351-6-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd-clt: Remove some arguments from rnbd_client_setup_deviceGuoqing Jiang2021-04-201-4/+3
| | | | | | | | | | | | | Remove them since both sess and idx can be dereferenced from dev. And sess is not used in the function. Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com> Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210419073722.15351-5-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/rnbd-clt: Remove some arguments from insert_dev_if_not_exists_devpathGuoqing Jiang2021-04-201-5/+4
| | | | | | | | | | | | Remove 'pathname' and 'sess' since we can dereference it from 'dev'. Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com> Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210419073722.15351-4-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Documentation/sysfs-block-rnbd: Add descriptions for remap_device and resizeGioh Kim2021-04-201-0/+12
| | | | | | | | | Two sysfs entries, remap_device and resize, are missing. Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Link: https://lore.kernel.org/r/20210419073722.15351-3-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* MAINTAINERS: Change maintainer for rnbd moduleDanil Kipnis2021-04-201-2/+2
| | | | | | | | | | | | | Danil steps down, Haris will take over. Also update email address to ionos.com, the old cloud.ionos.com will still work for some time. Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Acked-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Link: https://lore.kernel.org/r/20210419073722.15351-2-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: remove redundant assignment to variable stColin Ian King2021-04-201-1/+0
| | | | | | | | | | | | | The variable st is being assigned a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Denis Efremov <efremov@linux.com> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20210415130020.1959951-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: cleanups: remove FLOPPY_SILENT_DCL_CLEAR undefDenis Efremov2021-04-201-2/+0
| | | | | | | | | | FLOPPY_SILENT_DCL_CLEAR is not defined anywhere and comes from pre-git era. Just drop this undef. There is FD_SILENT_DCL_CLEAR which is really used. Signed-off-by: Denis Efremov <efremov@linux.com> Link: https://lore.kernel.org/r/20210416083449.72700-6-efremov@linux.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: cleanups: use memcpy() to copy reply_bufferDenis Efremov2021-04-201-4/+1
| | | | | | | | | Use memcpy() in raw_cmd_done() to copy reply_buffer instead of a for loop. Signed-off-by: Denis Efremov <efremov@linux.com> Link: https://lore.kernel.org/r/20210416083449.72700-5-efremov@linux.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: cleanups: use memset() to zero reply_bufferDenis Efremov2021-04-201-3/+1
| | | | | | | | | Use memset() to zero reply buffer in raw_cmd_copyin() instead of a for loop. Signed-off-by: Denis Efremov <efremov@linux.com> Link: https://lore.kernel.org/r/20210416083449.72700-4-efremov@linux.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: cleanups: use ST0 as reply_buffer index 0Denis Efremov2021-04-201-6/+6
| | | | | | | | | Use ST0 as 0 index for reply_buffer array. get_fdc_version() is the only function that uses index 0 directly instead of the ST0 define. Signed-off-by: Denis Efremov <efremov@linux.com> Link: https://lore.kernel.org/r/20210416083449.72700-3-efremov@linux.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* floppy: cleanups: remove trailing whitespacesDenis Efremov2021-04-201-23/+23
| | | | | | | | Cleanup trailing whitespaces as checkpatch.pl suggests. Signed-off-by: Denis Efremov <efremov@linux.com> Link: https://lore.kernel.org/r/20210416083449.72700-2-efremov@linux.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'md-next' of ↵Jens Axboe2021-04-152-72/+69
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers Pull MD updates from Song: "1. mddev_find_or_alloc() clean up, from Christoph. 2. Fix NULL pointer deref with external bitmap, from Sudhakar." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/bitmap: wait for external bitmap writes to complete during tear down md: do not return existing mddevs from mddev_find_or_alloc md: refactor mddev_find_or_alloc md: factor out a mddev_alloc_unit helper from mddev_find
| * md/bitmap: wait for external bitmap writes to complete during tear downSudhakar Panneerselvam2021-04-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NULL pointer dereference was observed in super_written() when it tries to access the mddev structure. [The below stack trace is from an older kernel, but the problem described in this patch applies to the mainline kernel.] [ 1194.474861] task: ffff8fdd20858000 task.stack: ffffb99d40790000 [ 1194.488000] RIP: 0010:super_written+0x29/0xe1 [ 1194.499688] RSP: 0018:ffff8ffb7fcc3c78 EFLAGS: 00010046 [ 1194.512477] RAX: 0000000000000000 RBX: ffff8ffb7bf4a000 RCX: ffff8ffb78991048 [ 1194.527325] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ffb56b8a200 [ 1194.542576] RBP: ffff8ffb7fcc3c90 R08: 000000000000000b R09: 0000000000000000 [ 1194.558001] R10: ffff8ffb56b8a298 R11: 0000000000000000 R12: ffff8ffb56b8a200 [ 1194.573070] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1194.588117] FS: 0000000000000000(0000) GS:ffff8ffb7fcc0000(0000) knlGS:0000000000000000 [ 1194.604264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1194.617375] CR2: 00000000000002b8 CR3: 00000021e040a002 CR4: 00000000007606e0 [ 1194.632327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1194.647865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1194.663316] PKRU: 55555554 [ 1194.674090] Call Trace: [ 1194.683735] <IRQ> [ 1194.692948] bio_endio+0xae/0x135 [ 1194.703580] blk_update_request+0xad/0x2fa [ 1194.714990] blk_update_bidi_request+0x20/0x72 [ 1194.726578] __blk_end_bidi_request+0x2c/0x4d [ 1194.738373] __blk_end_request_all+0x31/0x49 [ 1194.749344] blk_flush_complete_seq+0x377/0x383 [ 1194.761550] flush_end_io+0x1dd/0x2a7 [ 1194.772910] blk_finish_request+0x9f/0x13c [ 1194.784544] scsi_end_request+0x180/0x25c [ 1194.796149] scsi_io_completion+0xc8/0x610 [ 1194.807503] scsi_finish_command+0xdc/0x125 [ 1194.818897] scsi_softirq_done+0x81/0xde [ 1194.830062] blk_done_softirq+0xa4/0xcc [ 1194.841008] __do_softirq+0xd9/0x29f [ 1194.851257] irq_exit+0xe6/0xeb [ 1194.861290] do_IRQ+0x59/0xe3 [ 1194.871060] common_interrupt+0x1c6/0x382 [ 1194.881988] </IRQ> [ 1194.890646] RIP: 0010:cpuidle_enter_state+0xdd/0x2a5 [ 1194.902532] RSP: 0018:ffffb99d40793e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff43 [ 1194.917317] RAX: ffff8ffb7fce27c0 RBX: ffff8ffb7fced800 RCX: 000000000000001f [ 1194.932056] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 [ 1194.946428] RBP: ffffb99d40793ea0 R08: 0000000000000004 R09: 0000000000002ed2 [ 1194.960508] R10: 0000000000002664 R11: 0000000000000018 R12: 0000000000000003 [ 1194.974454] R13: 000000000000000b R14: ffffffff925715a0 R15: 0000011610120d5a [ 1194.988607] ? cpuidle_enter_state+0xcc/0x2a5 [ 1194.999077] cpuidle_enter+0x17/0x19 [ 1195.008395] call_cpuidle+0x23/0x3a [ 1195.017718] do_idle+0x172/0x1d5 [ 1195.026358] cpu_startup_entry+0x73/0x75 [ 1195.035769] start_secondary+0x1b9/0x20b [ 1195.044894] secondary_startup_64+0xa5/0xa5 [ 1195.084921] RIP: super_written+0x29/0xe1 RSP: ffff8ffb7fcc3c78 [ 1195.096354] CR2: 00000000000002b8 bio in the above stack is a bitmap write whose completion is invoked after the tear down sequence sets the mddev structure to NULL in rdev. During tear down, there is an attempt to flush the bitmap writes, but for external bitmaps, there is no explicit wait for all the bitmap writes to complete. For instance, md_bitmap_flush() is called to flush the bitmap writes, but the last call to md_bitmap_daemon_work() in md_bitmap_flush() could generate new bitmap writes for which there is no explicit wait to complete those writes. The call to md_bitmap_update_sb() will return simply for external bitmaps and the follow-up call to md_update_sb() is conditional and may not get called for external bitmaps. This results in a kernel panic when the completion routine, super_written() is called which tries to reference mddev in the rdev that has been set to NULL(in unbind_rdev_from_array() by tear down sequence). The solution is to call md_super_wait() for external bitmaps after the last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there are no pending bitmap writes before proceeding with the tear down. Cc: stable@vger.kernel.org Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> Reviewed-by: Zhao Heming <heming.zhao@suse.com> Signed-off-by: Song Liu <song@kernel.org>
| * md: do not return existing mddevs from mddev_find_or_allocChristoph Hellwig2021-04-151-23/+23
| | | | | | | | | | | | | | | | | | Instead of returning an existing mddev, just for it to be discarded later directly return -EEXIST. Rename the function to mddev_alloc now that it doesn't find an existing mddev. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
| * md: refactor mddev_find_or_allocChristoph Hellwig2021-04-151-36/+24
| | | | | | | | | | | | | | | | Allocate the new mddev first speculatively, which greatly simplifies the code flow. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
| * md: factor out a mddev_alloc_unit helper from mddev_findChristoph Hellwig2021-04-151-20/+27
| | | | | | | | | | | | | | | | Split out a self contained helper to find a free minor for the md "unit" number. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
* | Merge tag 'nvme-5.13-2021-04-15' of git://git.infradead.org/nvme into ↵Jens Axboe2021-04-1511-575/+669
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for-5.13/drivers Pull NVMe updates from Christoph: "nvme updates for Linux 5.13 - refactor the ioctl code - fix a segmentation fault during io parsing error in nvmet-tcp (Elad Grupi) - fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store (Gopal Tiwari) - properly respect the sgl_threshold flag in nvme-pci (Niklas Cassel) - misc cleanups (Niklas Cassel, Amit Engel, Minwoo Im, Colin Ian King)" * tag 'nvme-5.13-2021-04-15' of git://git.infradead.org/nvme: nvme: fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store nvme: let namespace probing continue for unsupported features nvme: factor out nvme_ns_open and nvme_ns_release helpers nvme: move nvme_ns_head_ops to multipath.c nvme: factor out a nvme_tryget_ns_head helper nvme: move the ioctl code to a separate file nvme: don't bother to look up a namespace for controller ioctls nvme: simplify block device ioctl handling for the !multipath case nvme: simplify the compat ioctl handling nvme: factor out a nvme_ns_ioctl helper nvme: pass a user pointer to nvme_nvm_ioctl nvme: cleanup setting the disk name nvme: add a nvme_ns_head_multipath helper nvme: remove single trailing whitespace nvme-multipath: remove single trailing whitespace nvme-pci: remove single trailing whitespace nvme-pci: don't simple map sgl when sgls are disabled nvmet: fix a spelling mistake "nubmer" -> "number" nvmet-fc: simplify nvmet_fc_alloc_hostport nvmet-tcp: fix a segmentation fault during io parsing error
| * nvme: fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/storeGopal Tiwari2021-04-151-0/+2
| | | | | | | | | | | | | | | | | | | | Adding entry for dev_attr_fast_io_fail_tmo to avoid the kernel crash while reading and writing the fast_io_fail_tmo. Fixes: 09fbed636382 (nvme: export fast_io_fail_tmo to sysfs) Signed-off-by: Gopal Tiwari <gtiwari@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: let namespace probing continue for unsupported featuresChristoph Hellwig2021-04-152-3/+12
| | | | | | | | | | | | | | | | | | | | Instead of failing to scan the namespace entirely when unsupported features are detected, just mark the gendisk hidden but allow other access like the upcoming per-namespace character device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com>
| * nvme: factor out nvme_ns_open and nvme_ns_release helpersChristoph Hellwig2021-04-151-4/+12
| | | | | | | | | | | | | | | | | | These will be reused for the per-namespace character devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
| * nvme: move nvme_ns_head_ops to multipath.cChristoph Hellwig2021-04-153-29/+32
| | | | | | | | | | | | | | | | | | | | Move the multipath block_device_operations to multipath.c, where they belong. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
| * nvme: factor out a nvme_tryget_ns_head helperChristoph Hellwig2021-04-151-4/+7
| | | | | | | | | | | | | | | | | | | | Add a helper to avoid opencoding ns_head->ref manipulations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
| * nvme: move the ioctl code to a separate fileChristoph Hellwig2021-04-154-449/+468
| | | | | | | | | | | | | | | | | | | | Split out the ioctl code from core.c into a new file. Also update copyrights while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
| * nvme: don't bother to look up a namespace for controller ioctlsChristoph Hellwig2021-04-151-24/+42
| | | | | | | | | | | | | | | | | | | | Don't bother to look up a namespace just to drop if after retreiving the controller for the multipath case. Just look up a live controller for the subsystem directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com>
| * nvme: simplify block device ioctl handling for the !multipath caseChristoph Hellwig2021-04-151-36/+47
| | | | | | | | | | | | | | | | | | | | Only use the existing ioctl handler for the multipath case, and add a simpler one that reverts to the pre-multipath case for not shared use case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com>
| * nvme: simplify the compat ioctl handlingChristoph Hellwig2021-04-151-43/+26
| | | | | | | | | | | | | | | | | | | | Don't bother defining a separate compat_ioctl handler, and just handle the NVME_IOCTL_SUBMIT_IO32 case inline. Also only defined it for those ABIs (currently just i386 vs x86_64) that are affected. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com>
| * nvme: factor out a nvme_ns_ioctl helperChristoph Hellwig2021-04-151-21/+21
| | | | | | | | | | | | | | | | | | Factor out a helper for the namespace based ioctls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
| * nvme: pass a user pointer to nvme_nvm_ioctlChristoph Hellwig2021-04-153-7/+7
| | | | | | | | | | | | | | | | | | Pass the proper user pointer instead of the not all that useful integer representation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com>
| * nvme: cleanup setting the disk nameChristoph Hellwig2021-04-153-28/+27
| | | | | | | | | | | | | | | | | | | | | | Return false from nvme_set_disk_name and let the caller set the non-multipath name instead of duplicating the naming information in two places. Also remove the pointless local variables for the disk name and flags and the not needed ctrl argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com>
| * nvme: add a nvme_ns_head_multipath helperMinwoo Im2021-04-152-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add a new nvme_ns_head_multipath that uses it to check if a ns_head has a multipath device associated with it. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [hch: added the IS_ENABLED, converted a few existing users] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
| * nvme: remove single trailing whitespaceNiklas Cassel2021-04-151-1/+1
| | | | | | | | | | | | | | | | | | There is a single trailing whitespace in core.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-multipath: remove single trailing whitespaceNiklas Cassel2021-04-151-1/+1
| | | | | | | | | | | | | | | | | | There is a single trailing whitespace in multipath.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-pci: remove single trailing whitespaceNiklas Cassel2021-04-151-1/+1
| | | | | | | | | | | | | | | | | | There is a single trailing whitespace in pci.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-pci: don't simple map sgl when sgls are disabledNiklas Cassel2021-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | According to the module parameter description for sgl_threshold, a value of 0 means that SGLs are disabled. If SGLs are disabled, we should respect that, even for the case where the request is made up of a single physical segment. Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvmet: fix a spelling mistake "nubmer" -> "number"Colin Ian King2021-04-151-1/+1
| | | | | | | | | | | | | | | | There is a spelling mistake in a pr_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvmet-fc: simplify nvmet_fc_alloc_hostportAmit Engel2021-04-151-31/+46
| | | | | | | | | | | | | | | | | | | | | | Once a host is already created, avoid allocate additional hostports that will be thrown away. add an helper function to handle host search. Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvmet-tcp: fix a segmentation fault during io parsing errorElad Grupi2021-04-151-8/+31
|/ | | | | | | | | | | | | In case there is an io that contains inline data and it goes to parsing error flow, command response will free command and iov before clearing the data on the socket buffer. This will delay the command response until receive flow is completed. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Elad Grupi <elad.grupi@dell.com> Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15Christoph Hellwig2021-04-132-1/+5
| | | | | | | | | | | | | | | | | | Lightnvm was an innovative idea to expose more low-level control over SSDs. But it failed to get properly standardized and remains a non-standarized extension to NVMe that requires vendor specific quirks for a few now mostly obsolete SSD devices. The standardized ZNS command set for NVMe has take over a lot of the approaches and allows for fully standardized operation. Remove the Linux code to support open channel SSDs as the few production deployments of the above mentioned SSDs are using userspace driver stacks instead of the fairly limited Linux support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Link: https://lore.kernel.org/r/20210413105257.159260-5-matias.bjorling@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: remove duplicate include in lightnvm.hZhang Yunkai2021-04-132-3/+0
| | | | | | | | | | 'linux/blkdev.h' and 'uapi/linux/lightnvm.h' included in 'lightnvm.h' is duplicated.It is also included in the 5th and 7th line. Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: return the correct return valueTian Tao2021-04-131-1/+1
| | | | | | | | | | When memdup_user returns an error, memdup_user has two different return values, use PTR_ERR to get the correct return value. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Link: https://lore.kernel.org/r/20210413105257.159260-3-matias.bjorling@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: use kobj_to_dev()Chaitanya Kulkarni2021-04-131-1/+1
| | | | | | | | | | | | This fixs coccicheck warning: drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for kobj_to_dev() Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the -ERESTARTSYS handling in blkdev_get_by_devChristoph Hellwig2021-04-121-6/+0
| | | | | | | Now that md has been cleaned up we can get rid of this hack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* null_blk: add option for managing virtual boundaryMax Gurtovoy2021-04-122-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | This will enable changing the virtual boundary of null blk devices. For now, null blk devices didn't have any restriction on the scatter/gather elements received from the block layer. Add a module parameter and a configfs option that will control the virtual boundary. This will enable testing the efficiency of the block layer bounce buffer in case a suitable application will send discontiguous IO to the given device. Initial testing with patched FIO showed the following results (64 jobs, 128 iodepth, 1 nullb device): IO size READ (virt=false) READ (virt=true) Write (virt=false) Write (virt=true) ---------- ------------------- ----------------- ------------------- ------------------- 1k 10.7M 8482k 10.8M 8471k 2k 10.4M 8266k 10.4M 8271k 4k 10.4M 8274k 10.3M 8226k 8k 10.2M 8131k 9800k 7933k 16k 9567k 7764k 8081k 6828k 32k 8865k 7309k 5570k 5153k 64k 7695k 6586k 2682k 2617k 128k 5346k 5489k 1320k 1296k Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* gdrom: fix compilation errorChaitanya Kulkarni2021-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Use the right name for the struct request variable that removes the following compilation error :- make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc' 'HOSTCC=sccache gcc' In file included from /builds/linux/include/linux/scatterlist.h:9, from /builds/linux/include/linux/dma-mapping.h:10, from /builds/linux/drivers/cdrom/gdrom.c:16: /builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma': /builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared (first use in this function) 586 | __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio), | ^~ Fixes: 1d2c82001a5f ("gdrom: support highmem") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: fix a regression of code compiling failure in debug.cColy Li2021-04-111-1/+1
| | | | | | | | | | | | | | | | | | | The patch "bcache: remove PTR_CACHE" introduces a compiling failure in debug.c with following error message, In file included from drivers/md/bcache/bcache.h:182:0, from drivers/md/bcache/debug.c:9: drivers/md/bcache/debug.c: In function 'bch_btree_verify': drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in this function) bio_set_dev(bio, c->cache->bdev); ^ This patch fixes the regression by replacing c->cache->bdev by b->c-> cache->bdev. Signed-off-by: Coly Li <colyli@suse.de> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>