summaryrefslogtreecommitdiffstats
path: root/drivers/nvme
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-245-18/+31
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - NVMe pull request (Christoph): - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng) - Increase max blk-cgroup policy size, now that mq-deadline uses it too (Oleksandr) * tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-block: nvme: set the PRACT bit when using Write Zeroes with T10 PI nvme: fix nvme_setup_command metadata trace event nvme: fix refcounting imbalance when all paths are down nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING block: increase BLKCG_MAX_POLS
| * nvme: set the PRACT bit when using Write Zeroes with T10 PIChristoph Hellwig2021-07-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | When using Write Zeroes on a namespace that has protection information enabled they behavior without the PRACT bit counter-intuitive and will generally lead to validation failures when reading the written blocks. Fix this by always setting the PRACT bit that generates matching PI data on the fly. Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
| * nvme: fix nvme_setup_command metadata trace eventKeith Busch2021-07-211-3/+3
| | | | | | | | | | | | | | | | | | | | The metadata address is set after the trace event, so the trace is not capturing anything useful. Rather than logging the memory address, it's useful to know if the command carries a metadata payload, so change the trace event to log that true/false state instead. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: fix refcounting imbalance when all paths are downHannes Reinecke2021-07-213-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the last path to a ns_head drops the current code removes the ns_head from the subsystem list, but will only delete the disk itself if the last reference to the ns_head drops. This is causing an refcounting imbalance eg when applications have a reference to the disk, as then they'll never get notified that the disk is in fact dead. This patch moves the call 'del_gendisk' into nvme_mpath_check_last_path(), ensuring that the disk can be properly removed and applications get the appropriate notifications. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTINGZhihao Cheng2021-07-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Followling process: nvme_probe nvme_reset_ctrl nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) queue_work(nvme_reset_wq, &ctrl->reset_work) --------------> nvme_remove nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING) worker_thread process_one_work nvme_reset_work WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING) , which will trigger WARN_ON in nvme_reset_work(): [ 127.534298] WARNING: CPU: 0 PID: 139 at drivers/nvme/host/pci.c:2594 [ 127.536161] CPU: 0 PID: 139 Comm: kworker/u8:7 Not tainted 5.13.0 [ 127.552518] Call Trace: [ 127.552840] ? kvm_sched_clock_read+0x25/0x40 [ 127.553936] ? native_send_call_func_single_ipi+0x1c/0x30 [ 127.555117] ? send_call_function_single_ipi+0x9b/0x130 [ 127.556263] ? __smp_call_single_queue+0x48/0x60 [ 127.557278] ? ttwu_queue_wakelist+0xfa/0x1c0 [ 127.558231] ? try_to_wake_up+0x265/0x9d0 [ 127.559120] ? ext4_end_io_rsv_work+0x160/0x290 [ 127.560118] process_one_work+0x28c/0x640 [ 127.561002] worker_thread+0x39a/0x700 [ 127.561833] ? rescuer_thread+0x580/0x580 [ 127.562714] kthread+0x18c/0x1e0 [ 127.563444] ? set_kthread_struct+0x70/0x70 [ 127.564347] ret_from_fork+0x1f/0x30 The preceding problem can be easily reproduced by executing following script (based on blktests suite): test() { pdev="$(_get_pci_dev_from_blkdev)" sysfs="/sys/bus/pci/devices/${pdev}" for ((i = 0; i < 10; i++)); do echo 1 > "$sysfs/remove" echo 1 > /sys/bus/pci/rescan done } Since the device ctrl could be updated as an non-RESETTING state by repeating probe/remove in userspace (which is a normal situation), we can replace stack dumping WARN_ON with a warnning message. Fixes: 82b057caefaff ("nvme-pci: fix multiple ctrl removal schedulin") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
* | Merge tag 'block-5.14-2021-07-16' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-162-12/+59
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - NVMe fixes via Christoph: - fix various races in nvme-pci when shutting down just after probing (Casey Chen) - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha) - Fix regression in xen-blkfront by cleaning up the removal state machine (Christoph) - Fix tag_set and queue cleanup ordering regression in nbd (Wang) - Fix tag_set and queue cleanup ordering regression in pd (Guoqing) * tag 'block-5.14-2021-07-16' of git://git.kernel.dk/linux-block: xen-blkfront: sanitize the removal state machine nbd: fix order of cleaning up the queue and freeing the tagset pd: fix order of cleaning up the queue and freeing the tagset nvme-pci: do not call nvme_dev_remove_admin from nvme_remove nvme-pci: fix multiple races in nvme_setup_io_queues nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE
| * nvme-pci: do not call nvme_dev_remove_admin from nvme_removeCasey Chen2021-07-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_dev_remove_admin could free dev->admin_q and the admin_tagset while they are being accessed by nvme_dev_disable(), which can be called by nvme_reset_work via nvme_remove_dead_ctrl. Commit cb4bfda62afa ("nvme-pci: fix hot removal during error handling") intended to avoid requests being stuck on a removed controller by killing the admin queue. But the later fix c8e9e9b7646e ("nvme-pci: unquiesce admin queue on shutdown"), together with nvme_dev_disable(dev, true) right before nvme_dev_remove_admin() could help dispatch requests and fail them early, so we don't need nvme_dev_remove_admin() any more. Fixes: cb4bfda62afa ("nvme-pci: fix hot removal during error handling") Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-pci: fix multiple races in nvme_setup_io_queuesCasey Chen2021-07-131-8/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Below two paths could overlap each other if we power off a drive quickly after powering it on. There are multiple races in nvme_setup_io_queues() because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit. nvme_reset_work() nvme_remove() nvme_setup_io_queues() nvme_dev_disable() ... ... A1 clear NVMEQ_ENABLED bit for admin queue lock retry: B1 nvme_suspend_io_queues() A2 pci_free_irq() admin queue B2 nvme_suspend_queue() admin queue A3 pci_free_irq_vectors() nvme_pci_disable() A4 nvme_setup_irqs(); B3 pci_free_irq_vectors() ... unlock A5 queue_request_irq() for admin queue set NVMEQ_ENABLED bit ... nvme_create_io_queues() A6 result = queue_request_irq(); set NVMEQ_ENABLED bit ... fail to allocate enough IO queues: A7 nvme_suspend_io_queues() goto retry If B3 runs in between A1 and A2, it will crash if irqaction haven't been freed by A2. B2 is supposed to free admin queue IRQ but it simply can't fulfill the job as A1 has cleared NVMEQ_ENABLED bit. Fix: combine A1 A2 so IRQ get freed as soon as the NVMEQ_ENABLED bit gets cleared. After solved #1, A2 could race with B3 if A2 is freeing IRQ while B3 is checking irqaction. A3 also could race with B2 if B2 is freeing IRQ while A3 is checking irqaction. Fix: A2 and A3 take lock for mutual exclusion. A3 could race with B3 since they could run free_msi_irqs() in parallel. Fix: A3 takes lock for mutual exclusion. A4 could fail to allocate all needed IRQ vectors if A3 and A4 are interrupted by B3. Fix: A4 takes lock for mutual exclusion. If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL. They are just allocated by A5/A6. Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit. A7 could get chance to pci_free_irq() for certain IO queue while B3 is checking irqaction. Fix: A7 takes lock. nvme_dev->online_queues need to be protected by shutdown_lock. Since it is not atomic, both paths could modify it using its own copy. Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACEPrabhakar Kushwaha2021-07-131-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_get_by_name() finds network device by name but it also increases the reference count. If a nvme-tcp queue is present and the network device driver is removed before nvme_tcp, we will face the following continuous log: "kernel:unregister_netdevice: waiting for <eth> to become free. Usage count = 2" And rmmod further halts. Similar case arises during reboot/shutdown with nvme-tcp queue present and both never completes. To fix this, use __dev_get_by_name() which finds network device by name without increasing any reference counter. Fixes: 3ede8f72a9a2 ("nvme-tcp: allow selecting the network interface for connections") Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: remove the ->ndev member entirely] Signed-off-by: Christoph Hellwig <hch@lst.de>
* | Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-0911-61/+47
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: "A combination of changes that ended up depending on both the driver and core branch (and/or the IDE removal), and a few late arriving fixes. In detail: - Fix io ticks wrap-around issue (Chunguang) - nvme-tcp sock locking fix (Maurizio) - s390-dasd fixes (Kees, Christoph) - blk_execute_rq polling support (Keith) - blk-cgroup RCU iteration fix (Yu) - nbd backend ID addition (Prasanna) - Partition deletion fix (Yufen) - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph) - Removal of now dead block request types due to IDE removal (Christoph) - Loop probing and control device cleanups (Christoph) - Device uevent fix (Christoph) - Misc cleanups/fixes (Tetsuo, Christoph)" * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits) blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs block: fix the problem of io_ticks becoming smaller nvme-tcp: can't set sk_user_data without write_lock loop: remove unused variable in loop_set_status() block: remove the bdgrab in blk_drop_partitions block: grab a device refcount in disk_uevent s390/dasd: Avoid field over-reading memcpy() dasd: unexport dasd_set_target_state block: check disk exist before trying to add partition ubd: remove dead code in ubd_setup_common nvme: use return value from blk_execute_rq() block: return errors from blk_execute_rq() nvme: use blk_execute_rq() for passthrough commands block: support polling through blk_execute_rq block: remove REQ_OP_SCSI_{IN,OUT} block: mark blk_mq_init_queue_data static loop: rewrite loop_exit using idr_for_each_entry loop: split loop_lookup loop: don't allow deleting an unspecified loop device loop: move loop_ctl_mutex locking into loop_add ...
| * nvme-tcp: can't set sk_user_data without write_lockMaurizio Lombardi2021-07-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sk_user_data pointer is supposed to be modified only while holding the write_lock "sk_callback_lock", otherwise we could race with other threads and crash the kernel. we can't take the write_lock in nvmet_tcp_state_change() because it would cause a deadlock, but the release_work queue will set the pointer to NULL later so we can simply remove the assignment. Fixes: b5332a9f3f3d ("nvmet-tcp: fix incorrect locking in state_change sk callback") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: use return value from blk_execute_rq()Keith Busch2021-06-304-18/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't have an nvme status to report if the driver's .queue_rq() returns an error without dispatching the requested nvme command. Check the return value from blk_execute_rq() for all passthrough commands so the caller may know their command was not successful. If the command is from the target passthrough interface and fails to dispatch, synthesize the response back to the host as a internal target error. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-5-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: use blk_execute_rq() for passthrough commandsKeith Busch2021-06-308-47/+19
| | | | | | | | | | | | | | | | | | | | | | The generic blk_execute_rq() knows how to handle polled completions. Use that instead of implementing an nvme specific handler. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2021-07-021-1/+71
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, ibmvfc, megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with elx and mpi3mr being new drivers. The major core change is a rework to drop the status byte handling macros and the old bit shifted definitions and the rest of the updates are minor fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits) scsi: aha1740: Avoid over-read of sense buffer scsi: arcmsr: Avoid over-read of sense buffer scsi: ips: Avoid over-read of sense buffer scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe() scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame() scsi: elx: libefc: Fix less than zero comparison of a unsigned int scsi: elx: efct: Fix pointer error checking in debugfs init scsi: elx: efct: Fix is_originator return code type scsi: elx: efct: Fix link error for _bad_cmpxchg scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel() scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session() scsi: elx: efct: Fix error handling in efct_hw_init() scsi: elx: efct: Remove redundant initialization of variable lun scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected" scsi: lpfc: Fix build error in lpfc_scsi.c scsi: target: iscsi: Remove redundant continue statement scsi: qla4xxx: Remove redundant continue statement scsi: ppa: Switch to use module_parport_driver() scsi: imm: Switch to use module_parport_driver() scsi: mpt3sas: Fix error return value in _scsih_expander_add() ...
| * scsi: nvme: Added a new sysfs attribute appid_storeMuneendra Kumar2021-06-101-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new sysfs attribute, appid_store, which can be used to set the application identifier in the blkcg associated with a cgroup id. Below is the interface provided to set the app_id: echo "<cgroupid>:<appid>" >> /sys/class/fc/fc_udev_device/appid_store echo "457E:100000109b521d27" >> /sys/class/fc/fc_udev_device/appid_store Link: https://lore.kernel.org/r/20210608043556.274139-4-muneendra.kumar@broadcom.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-blockLinus Torvalds2021-06-3024-377/+1214
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: "Pretty calm round, mostly just NVMe and a bit of MD: - NVMe updates (via Christoph) - improve the APST configuration algorithm (Alexey Bogoslavsky) - look for StorageD3Enable on companion ACPI device (Mario Limonciello) - allow selecting the network interface for TCP connections (Martin Belanger) - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King, Christoph) - move the ACPI StorageD3 code to drivers/acpi/ and add quirks for certain AMD CPUs (Mario Limonciello) - zoned device support for nvmet (Chaitanya Kulkarni) - fix the rules for changing the serial number in nvmet (Noam Gottlieb) - various small fixes and cleanups (Dan Carpenter, JK Kim, Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert Uytterhoeven, Daniel Wagner) - MD updates (Via Song) - iostats rewrite (Guoqing Jiang) - raid5 lock contention optimization (Gal Ofri) - Fall through warning fix (Gustavo) - Misc fixes (Gustavo, Jiapeng)" * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits) nvmet: use NVMET_MAX_NAMESPACES to set nn value loop: Fix missing discard support when using LOOP_CONFIGURE nvme.h: add missing nvme_lba_range_type endianness annotations nvme: remove zeroout memset call for struct nvme-pci: remove zeroout memset call for struct nvmet: remove zeroout memset call for struct nvmet: add ZBD over ZNS backend support nvmet: add Command Set Identifier support nvmet: add nvmet_req_bio put helper for backends nvmet: add req cns error complete helper block: export blk_next_bio() nvmet: remove local variable nvmet: use nvme status value directly nvmet: use u32 type for the local variable nsid nvmet: use u32 for nvmet_subsys max_nsid nvmet: use req->cmd directly in file-ns fast path nvmet: use req->cmd directly in bdev-ns fast path nvmet: make ver stable once connection established nvmet: allow mn change if subsys not discovered nvmet: make sn stable once connection was established ...
| * | nvmet: use NVMET_MAX_NAMESPACES to set nn valueChaitanya Kulkarni2021-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For Spec regarding MNAN value:- If the controller supports Asymmetric Namespace Access Reporting, then this field shall be set to a non-zero value that is less than or equal to the NN value. Instead of using subsys->max_nsid that gets calculated dynamically, use NVMET_MAX_NAMESPACES value to report NN. This way we will maintain the MNAN value spec compliant with NN. Without this patch, code results in the following error :- [337976.409142] nvme nvme1: Invalid MNAN value 1024 Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: remove zeroout memset call for structChaitanya Kulkarni2021-06-171-13/+6
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variables to zero values so that we can remove zeroout memset calls in the host/core.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-pci: remove zeroout memset call for structChaitanya Kulkarni2021-06-171-16/+8
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variables to zero values so that we can remove zeroout memset calls in the host/pci.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: remove zeroout memset call for structChaitanya Kulkarni2021-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variables to zero values so that we can remove zeroout memset calls in the target/rdma.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: add ZBD over ZNS backend supportChaitanya Kulkarni2021-06-176-11/+707
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol-based controllers. NVMeOF already support the ZNS NVMe Protocol compliant devices on the target in the passthru mode. There are generic zoned block devices like Shingled Magnetic Recording (SMR) HDDs that are not based on the NVMe protocol. This patch adds ZNS backend support for non-ZNS zoned block devices as NVMeOF targets. This support includes implementing the new command set NVME_CSI_ZNS, adding different command handlers for ZNS command set such as NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With the new command set identifier, we also update the target command effects logs to reflect the ZNS compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: add Command Set Identifier supportChaitanya Kulkarni2021-06-173-18/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVMe TP 4056 allows controllers to support different command sets. NVMeoF target currently only supports namespaces that contain traditional logical blocks that may be randomly read and written. In some applications there is a value in exposing namespaces that contain logical blocks that have special access rules (e.g. sequentially write required namespace such as Zoned Namespace (ZNS)). In order to support the Zoned Block Devices (ZBD) backend, controllers need to have support for ZNS Command Set Identifier (CSI). In this preparation patch, we adjust the code such that it can now support the default command set identifier. We update the namespace data structure to store the CSI value which defaults to NVME_CSI_NVM that represents traditional logical blocks namespace type. The CSI support is required to implement the ZBD backend for NVMeOF with host side NVMe ZNS interface, since ZNS commands belong to the different command set than the default one. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: add nvmet_req_bio put helper for backendsChaitanya Kulkarni2021-06-173-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current code there exists two backends which are using inline bio optimization, that adds a duplicate code for freeing the bio. For Zoned Block Device backend we also use the same optimzation and it will lead to having duplicate code in the three backends: generic bdev, passsthru, and generic zns. Add a helper function to avoid duplicate code and update the respective backends. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: add req cns error complete helperChaitanya Kulkarni2021-06-172-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We report error and complete the request when identify cns value is not handled in nvmet_execute_identify(). This error reporting is also needed for Zone Block Device backend for NVMeOF target. Add a helper nvmet_req_cns_error_compplete() to report an error and complete the request when idenitfy command cns not handled value. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: remove local variableChaitanya Kulkarni2021-06-171-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In function errno_to_nvme_status() we store the value of the NVMe status into the local variable and don't do anything useful with that but just return. Remove the local variable and return the value directly from switch. This also removed extra break statements. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: use nvme status value directlyChaitanya Kulkarni2021-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no point in keeping the status variable that is used only once in the function nvmet_async_events_failall(). Remove the variable and use the value directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: use u32 type for the local variable nsidChaitanya Kulkarni2021-06-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In function nvmet_max_nsid() we calculate the max nsid by iterating over the XArray and store it in the variable nsid that has type of unsigned long. Since the value of this function is stored into the subsys->max_nsid which is of type u32, change the local variable nsid type and the return type of the same function to u32. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: use u32 for nvmet_subsys max_nsidChaitanya Kulkarni2021-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use u32 type for the nsid_max member of the nvmet_subsys structure. This avoids the type confusion when updating the subsys->nax_nsid from ns->nsid. This also matches the nvmet_ns->nsid member. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: use req->cmd directly in file-ns fast pathChaitanya Kulkarni2021-06-171-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function nvmet_file_parse_io_cmd() is called from the fast path. The local variable to that function cmd is only used once. Remove the local variable and use req->cmd directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: use req->cmd directly in bdev-ns fast pathChaitanya Kulkarni2021-06-171-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function nvmet_bdev_parse_io_cmd() is called from the fast path. The local variable to that function cmd is only used once. Remove the local variable and use req->cmd directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: make ver stable once connection establishedNoam Gottlieb2021-06-171-5/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | Once some host has connected to the nvmf target, make sure that the version number is stable and cannot be changed. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: allow mn change if subsys not discoveredNoam Gottlieb2021-06-174-40/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, once the subsystem's model_number is set for the first time there is no way to change it. However, as long as no connection was established to nvmf target, there is no reason for such restriction and we should allow to change the subsystem's model_number as many times as needed. In addition, in order to simplfy the changes and make the model number flow more similar to the rest of the attributes in the Identify Controller data structure, we set a default value for the model number at the initiation of the subsystem. Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: make sn stable once connection was establishedNoam Gottlieb2021-06-173-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | Once some host has connected to the target, make sure that the serial number is stable and cannot be changed. Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: change sn size and check validityNoam Gottlieb2021-06-175-18/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the NVM specification, the serial_number should be 20 bytes (bytes 23:04 of the Identify Controller data structure), and should contain only ASCII characters. In accordance, the serial_number size is changed to 20 bytes and before any attempt to store a new value in serial_number we check that the input is valid - i.e. contains only ASCII characters, is not empty and does not exceed 20 bytes. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()Hannes Reinecke2021-06-171-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing a request in nvmet_fc_handle_fcp_rqst() we should not check for invalid target ports; if we do the command is aborted from the fcp layer, causing the host to assume a transport error. Rather we should still forward this request to the nvmet layer, which will then correctly fail the command with an appropriate error status. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-fabrics: remove memset in connect io qChaitanya Kulkarni2021-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-fabrics: remove memset in connect admin qChaitanya Kulkarni2021-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-fabrics: remove memset in nvmf_reg_write32()Chaitanya Kulkarni2021-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-fabrics: remove memset in nvmf_reg_read64()Chaitanya Kulkarni2021-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-tcp: use ctrl sgl check helperChaitanya Kulkarni2021-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Use the helper to check NVMe controller's SGL support. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-pci: use ctrl sgl check helperChaitanya Kulkarni2021-06-171-2/+2
| | | | | | | | | | | | | | | | | | | | | Use the helper to check NVMe controller's SGL support. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-fc: use ctrl sgl check helperChaitanya Kulkarni2021-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Use the helper to check NVMe controller's SGL support. Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: add a helper to check ctrl sgl supportChaitanya Kulkarni2021-06-171-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For various transports such as fc/tcp/pci it is common to check if NVMe SGLs are supported or not by the controller. In this preparation patch we add a helper to avoid the open coding of such checks in the various transport. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-pci: remove trailing lines for helpersChaitanya Kulkarni2021-06-171-2/+0
| | | | | | | | | | | | | | | | | | | | | Remove the extra white line at the end of the functions. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-pci: fix var. type for increasing cq_headJK Kim2021-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvmeq->cq_head is compared with nvmeq->q_depth and changed the value and cq_phase for handling the next cq db. but, nvmeq->q_depth's type is u32 and max. value is 0x10000 when CQP.MSQE is 0xffff and io_queue_depth is 0x10000. current temp. variable for comparing with nvmeq->q_depth is overflowed when previous nvmeq->cq_head is 0xffff. in this case, nvmeq->cq_phase is not updated. so, fix data type for temp. variable to u32. Signed-off-by: JK Kim <jongkang.kim2@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-tcp: fix error codes in nvme_tcp_setup_ctrl()Dan Carpenter2021-06-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These error paths currently return success but they should return -EOPNOTSUPP. Fixes: 73ffcefcfca0 ("nvme-tcp: check sgl supported by target") Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: factor out a nvme_validate_passthru_nsid helperChaitanya Kulkarni2021-06-161-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper nvme_validate_passthru_nsid() to validate the nsid that removes the nsid validation and error message print code from nvme_user_cmd() and nvme_user_cmd64(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: fix grammar in the CONFIG_NVME_MULTIPATH kconfig help textGeert Uytterhoeven2021-06-161-1/+1
| | | | | | | | | | | | | | | | | | | | | Fix a singular/plural mismatch in the CONFIG_NVME_MULTIPATH help text. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: remove superfluous bio_set_dev in nvme_requeue_workDaniel Wagner2021-06-161-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ce86dad222e9 ("nvme-multipath: reset bdev to ns head when failover") moved the reset code where the bio is added to the requeue_list for the failover path. But it left the original bio_set_dev in nvme_requeue_work. There is a second path to nvme_requee_work. It is via nvme_ns_head_submit_bio. Though we don't have to set bio->bi_bdev for this path either, as it points to the correct bdev already. Let's remove the bio_set_dev. It's updating the bio->bi_bdev with the same pointer and thus it's unnecessary. Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: verify MNAN value if ANA is enabledDaniel Wagner2021-06-161-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The controller is required to have a non-zero MNAN value if it supports ANA: If the controller supports Asymmetric Namespace Access Reporting, then this field shall be set to a non-zero value that is less than or equal to the NN value. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>