| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull block fixes from Jens Axboe:
- null_blk zone fixes (Damien, Kanchan)
- NVMe pull request from Christoph:
- improve zone revalidation (Keith Busch)
- gracefully handle zero length messages in nvme-rdma (zhenwei pi)
- nvme-fc error handling fixes (James Smart)
- nvmet tracing NULL pointer dereference fix (Chaitanya Kulkarni)"
- xsysace platform fixes (Andy)
- scatterlist type cleanup (David)
- blk-cgroup memory fixes (Gabriel)
- nbd block size update fix (Ming)
- Flush completion state fix (Ming)
- bio_add_hw_page() iteration fix (Naohiro)
* tag 'block-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
blk-mq: mark flush request as IDLE in flush_end_io()
lib/scatterlist: use consistent sg_copy_buffer() return type
xsysace: use platform_get_resource() and platform_get_irq_optional()
null_blk: Fix locking in zoned mode
null_blk: Fix zone reset all tracing
nbd: don't update block size after device is started
block: advance iov_iter on bio_add_hw_page failure
null_blk: synchronization fix for zoned device
nvmet: fix a NULL pointer dereference when tracing the flush command
nvme-fc: remove nvme_fc_terminate_io()
nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery
nvme-fc: remove err_work work item
nvme-fc: track error_recovery while connecting
nvme-rdma: handle unexpected nvme completion data length
nvme: ignore zone validate errors on subsequent scans
blk-cgroup: Pre-allocate tree node on blkg_conf_prep
blk-cgroup: Fix memleak on error path
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When target side trace in turned on and flush command is issued from the
host it results in the following Oops.
[ 856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068
[ 856.790686] #PF: supervisor read access in kernel mode
[ 856.791262] #PF: error_code(0x0000) - not-present page
[ 856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0
[ 856.792527] Oops: 0000 [#1] SMP NOPTI
[ 856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G OE 5.9.0nvme-5.9+ #71
[ 856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
[ 856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet]
[ 856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0
[ 856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246
[ 856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000
[ 856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034
[ 856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe
[ 856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034
[ 856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440
[ 856.802667] FS: 00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[ 856.803635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0
[ 856.805283] Call Trace:
[ 856.805613] nvmet_req_init+0x27c/0x480 [nvmet]
[ 856.806200] nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop]
[ 856.806862] blk_mq_dispatch_rq_list+0x123/0x7b0
[ 856.807459] ? kvm_sched_clock_read+0x14/0x30
[ 856.808025] __blk_mq_sched_dispatch_requests+0xc7/0x170
[ 856.808708] blk_mq_sched_dispatch_requests+0x30/0x60
[ 856.809372] __blk_mq_run_hw_queue+0x70/0x100
[ 856.809935] __blk_mq_delay_run_hw_queue+0x156/0x170
[ 856.810574] blk_mq_run_hw_queue+0x86/0xe0
[ 856.811104] blk_mq_sched_insert_request+0xef/0x160
[ 856.811733] blk_execute_rq+0x69/0xc0
[ 856.812212] ? blk_mq_rq_ctx_init+0xd0/0x230
[ 856.812784] nvme_execute_passthru_rq+0x57/0x130 [nvme_core]
[ 856.813461] nvme_submit_user_cmd+0xeb/0x300 [nvme_core]
[ 856.814099] nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core]
[ 856.814752] blkdev_ioctl+0x1dc/0x2c0
[ 856.815197] block_ioctl+0x3f/0x50
[ 856.815606] __x64_sys_ioctl+0x84/0xc0
[ 856.816074] do_syscall_64+0x33/0x40
[ 856.816533] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 856.817168] RIP: 0033:0x7f6a222ed107
[ 856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8
[ 856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107
[ 856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003
[ 856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005
[ 856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3
[ 856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900
[ 856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel
Move the nvmet_req_init() tracepoint after we parse the command in
nvmet_req_init() so that we can get rid of the duplicate
nvmet_find_namespace() call.
Rename __assign_disk_name() -> __assign_req_name(). Now that we call
tracepoint after parsing the command simplify the newly added
__assign_req_name() which fixes this bug.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
__nvme_fc_terminate_io() is now called by only 1 place, in reset_work.
Consoldate and move the functionality of terminate_io into reset_work.
In reset_work, rather than calling the create_association directly,
schedule the connect work element to do its thing. After scheduling,
flush the connect work element to continue with semantic of not
returning until connect has been attempted at least once.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
nvme_fc_error_recovery() special cases handling when in CONNECTING state
and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
special cases CONNECTING state and calls the routine to abort outstanding
ios.
Simplify the sequence by putting the call to abort outstanding I/Os
directly in nvme_fc_error_recovery.
Move the location of __nvme_fc_abort_outstanding_ios(), and
nvme_fc_terminate_exchange() which is called by it, to avoid adding
function prototypes for nvme_fc_error_recovery().
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
err_work was created to handle errors (mainly I/O timeouts) while in
CONNECTING state. The flag for err_work_active is also unneeded.
Remove err_work_active and err_work. The actions to abort I/Os are moved
inline to nvme_error_recovery().
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Whenever there are errors during CONNECTING, the driver recovers by
aborting all outstanding ios and counts on the io completion to fail them
and thus the connection/association they are on. However, the connection
failure depends on a failure state from the core routines. Not all
commands that are issued by the core routine are guaranteed to cause a
failure of the core routine. They may be treated as a failure status and
the status is then ignored.
As such, whenever the transport enters error_recovery while CONNECTING,
it will set a new flag indicating an association failed. The
create_association routine which creates and initializes the controller,
will monitor the state of the flag as well as the core routine error
status and ensure the association fails if there was an error.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Receiving a zero length message leads to the following warnings because
the CQE is processed twice:
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28
RIP: 0010:refcount_warn_saturate+0xd9/0xe0
Call Trace:
<IRQ>
nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma]
__ib_process_cq+0x76/0x150 [ib_core]
...
Sanity check the received data length, to avoids this.
Thanks to Chao Leng & Sagi for suggestions.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Revalidating nvme zoned namespaces requires IO commands, and there are
controller states that prevent IO. For example, a sanitize in progress
is required to fail all IO, but we don't want to remove a namespace
we've previously added just because the controller is in such a state.
Suppress the error in this case.
Reported-by: Michael Nguyen <michael.nguyen@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().
In all cases rdma_connect() needs to hold the handler_mutex, but when
handler's are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.
Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had several complaints about a 10s reconnect delay (the default)
when there was an error while there is connectivity to a subsystem.
The max_reconnects and reconnect_delay are set in common code prior to
calling the transport to create the controller.
This change checks if the default reconnect delay is being used, and if
so, it adjusts it to a shorter period (2s) for the nvme-fc transport.
It does so by calculating the controller loss tmo window, changing the
value of the reconnect delay, and then recalculating the maximum number
of reconnect attempts allowed.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On reconnect, the code currently does not freeze the controller before
possibly updating the number hw queues for the controller.
Add the freeze before updating the number of hw queues. Note: the queues
are already started and remain started through the reconnect.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The loop that backs out of hw io queue creation continues through index
0, which corresponds to the admin queue as well.
Fix the loop so it only proceeds through indexes 1..n which correspond to
I/O queues.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, an I/O timeout unconditionally invokes
nvme_fc_error_recovery() which checks for LIVE or CONNECTING state. If
live, the routine resets the controller which initiates a reconnect -
which is valid. If CONNECTING, err_work is scheduled. Err_work then
calls the terminate_io routine, which also checks for CONNECTING and
noops any further action on outstanding I/O. The result is nothing
happened to the timed out io. As such, if the command was dropped on
the wire, it will never timeout / complete, and the connect process
will hang.
Change the behavior of the io timeout routine to unconditionally abort
the I/O. I/O completion handling will note that an io failed due to an
abort and will terminate the connection / association as needed. If the
abort was unable to happen, continue with a call to
nvme_fc_error_recovery(). To ensure something different happens in
nvme_fc_error_recovery() rework it so at it will abort all I/Os on the
association to force a failure.
As I/O aborts now may occur outside of delete_association, counting for
completion must be wary and only count those aborted during
delete_association when TERMIO is set on the controller.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, we set the passthru request allocation flag such that it
returns the error in the following code path and we fail the I/O when
BLK_MQ_REQ_NOWAIT is used for request allocation :-
nvme_alloc_request()
blk_mq_alloc_request()
blk_mq_queue_enter()
if (flag & BLK_MQ_REQ_NOWAIT)
return -EBUSY; <-- return if busy.
On some controllers using BLK_MQ_REQ_NOWAIT ends up in I/O error where
the controller is perfectly healthy and not in a degraded state.
Block layer request allocation does allow us to wait instead of
immediately returning the error when we BLK_MQ_REQ_NOWAIT flag is not
used. This has shown to fix the I/O error problem reported under
heavy random write workload.
Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation
which resolves this issue.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up some confusing elements of nvmet_passthru_map_sg() by returning
early if the request is greater than the maximum bio size. This allows
us to drop the sg_cnt variable.
This should not result in any functional change but makes the code
clearer and more understandable. The original code allocated a truncated
bio then would return EINVAL when bio_add_pc_page() filled that bio. The
new code just returns EINVAL early if this would happen.
Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Suggested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain
so the effective maximum transfer should also be limitted by
BIO_MAX_PAGES (presently this works out to 1MB).
For PCI passthru devices the max_sectors would typically be more
limitting than BIO_MAX_PAGES, but this may not be true for all passthru
devices.
Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When connecting a controller with a zero kato value using the following
command line
nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0
the warning below can be reproduced:
WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
with trace:
mod_delayed_work_on+0x59/0x90
nvmet_update_cc+0xee/0x100 [nvmet]
nvmet_execute_prop_set+0x72/0x80 [nvmet]
nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
...
This is caused by queuing up an uninitialized work. Althrough the
keep-alive timer is disabled during allocating the controller (fixed in
0d3b6a8d213a), ka_work still has a chance to run (called by
nvmet_start_ctrl).
Fixes: 0d3b6a8d213a ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like commit 5611ec2b9814 ("nvme-pci: prevent SK hynix PC400 from using
Write Zeroes command"), Sandisk Skyhawk has the same issue:
[ 6305.633887] blk_update_request: operation not supported error, dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
So also disable Write Zeroes command on Sandisk Skyhawk.
BugLink: https://bugs.launchpad.net/bugs/1899503
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
| |
The request's rq_disk isn't set for passthrough IO commands, so tracing
uses qid 0 for these which incorrectly decodes as an admin command. Use
the request_queue's queuedata instead since that value is always set for
the IO queues, and never set for the admin queue.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A crash happened due to injecting error test.
When a CQE has incorrect command id due do an error injection, the host
may find a request which is already freed. Dereferencing req->mr->rkey
causes a crash in nvme_rdma_process_nvme_rsp because the mr is already
freed.
Add a check for the mr to fix it.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A crash can happened when a connect is rejected. The host establishes
the connection after received ConnectReply, and then continues to send
the fabrics Connect command. If the controller does not receive the
ReadyToUse capsule, host may receive a ConnectReject reply.
Call nvme_rdma_destroy_queue_ib after the host received the
RDMA_CM_EVENT_REJECTED event. Then when the fabrics Connect command
times out, nvme_rdma_timeout calls nvme_rdma_complete_rq to fail the
request. A crash happenes due to use after free in
nvme_rdma_complete_rq.
nvme_rdma_destroy_queue_ib is redundant when handling the
RDMA_CM_EVENT_REJECTED event as nvme_rdma_destroy_queue_ib is already
called in connection failure handler.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
|
|
|
|
|
|
|
| |
Translate zoned resource errors to the appropriate blk_status_t.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull block driver updates from Jens Axboe:
"Here are the driver updates for 5.10.
A few SCSI updates in here too, in coordination with Martin as they
depend on core block changes for the shared tag bitmap.
This contains:
- NVMe pull requests via Christoph:
- fix keep alive timer modification (Amit Engel)
- order the PCI ID list more sensibly (Andy Shevchenko)
- cleanup the open by controller helper (Chaitanya Kulkarni)
- use an xarray for the CSE log lookup (Chaitanya Kulkarni)
- support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
- fix nvme_ns_report_zones (Christoph Hellwig)
- add a sanity check to nvmet-fc (James Smart)
- fix interrupt allocation when too many polled queues are
specified (Jeffle Xu)
- small nvmet-tcp optimization (Mark Wunderlich)
- fix a controller refcount leak on init failure (Chaitanya
Kulkarni)
- misc cleanups (Chaitanya Kulkarni)
- major refactoring of the scanning code (Christoph Hellwig)
- MD updates via Song:
- Bug fixes in bitmap code, from Zhao Heming
- Fix a work queue check, from Guoqing Jiang
- Fix raid5 oops with reshape, from Song Liu
- Clean up unused code, from Jason Yan
- Discard improvements, from Xiao Ni
- raid5/6 page offset support, from Yufen Yu
- Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
Hannes)
- null_blk open/active zone limit support (Niklas)
- Set of bcache updates (Coly, Dongsheng, Qinglang)"
* tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
md/raid5: fix oops during stripe resizing
md/bitmap: fix memory leak of temporary bitmap
md: fix the checking of wrong work queue
md/bitmap: md_bitmap_get_counter returns wrong blocks
md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
md/raid0: remove unused function is_io_in_chunk_boundary()
nvme-core: remove extra condition for vwc
nvme-core: remove extra variable
nvme: remove nvme_identify_ns_list
nvme: refactor nvme_validate_ns
nvme: move nvme_validate_ns
nvme: query namespace identifiers before adding the namespace
nvme: revalidate zone bitmaps in nvme_update_ns_info
nvme: remove nvme_update_formats
nvme: update the known admin effects
nvme: set the queue limits in nvme_update_ns_info
nvme: remove the 0 lba_shift check in nvme_update_ns_info
nvme: clean up the check for too large logic block sizes
nvme: freeze the queue over ->lba_shift updates
nvme: factor out a nvme_configure_metadata helper
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In nvme_set_queue_limits() we initialize vwc to false and later add
a condition to set vwc true. The value of the vwc can be declare
initialized which makes all the blk_queue_XXX() calls uniform.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In nvme_validate_ns() the exra variable ctrl is used only twice.
Using ns->ctrl directly still maintains the redability and original
length of the lines in the code. Get rid of the extra variable ctrl &
use ns->ctrl directly.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Just fold it into the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move the logic to revalidate the block_device size or remove the
namespace from the caller into nvme_validate_ns. This removes
the return value and thus the status code translation. Additionally
it also catches non-permanent errors from nvme_update_ns_info using
the existing logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move nvme_validate_ns just above its only remaining caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Check the namespace identifier list first thing when scanning namespaces.
This keeps the code to query the CSI common between the alloc and validate
path, and helps to structure the code better for multiple command set
support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| | |
Consolidate the two calls into a single place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that the queue is frozen before updating ->lba_shift we can't hit the
invalid references mentioned in the comment any more. More importantly
this code would not have helped us if the format was changed by another
controller or through implementation defined back channels.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A Format NVM command can change the capabilities of namespaces, while
Sanitize does change the Logical Block Content and must be serialized.
Also remove CSUPP bit for Format - it is not a mandatory command,
and we don't check for the bit anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Only set the queue limits once we have the real block size. This also
updates the limits on a rescan if needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We can no longer reach this code if Identify Namespace failed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Use a single statement to set both the capacity and fake block size
instead of two.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Ensure that there can't be any I/O in flight went we change the disk
geometry in nvme_update_ns_info, most notable the LBA size by lifting
the queue free from nvme_update_disk_info into the caller
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Factor out a helper from nvme_update_ns_info that configures the
per-namespaces metadata and PI settings. Also make sure the helpers
clear the flags explicitly instead of all of ->features to allow for
potentially reusing ->features for future non-metadata flags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Check if the namespace actually exists as the very first thing and don't
bother with any extra work if not. This should speed up and simplify
the sequential scanning for NVMe 1.0 devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move the check from the two callers into the common helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rename __nvme_revalidate_disk to nvme_update_ns_info and pass a
namespace instead of the gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rename _nvme_revalidate_disk to nvme_validate_ns to better describe
what the function does, and pass the struct nvme_ns instead of the
gendisk to better match the call chain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use a slightly more descriptive name to enable reusing nvme_validate_ns
in the next patch for a lower level function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The queue can trivially be derived from the nvme_ns structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The removal of the ->revalidate_disk method broke the initialization of
the zone bitmaps, as nvme_revalidate_disk now never gets called during
initialization.
Move the zone related code from nvme_revalidate_disk into a new helper in
zns.c, and call it from nvme_alloc_ns in addition to nvme_validate_ns to
ensure the zone bitmaps are initialized during probe.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The function nvme_init_ctrl() gets the ctrl reference & when it fails it
does put the ctrl reference in the error unwind code.
When creating loop ctrl in nvme_loop_create_ctrl() if nvme_init_ctrl()
returns non zero (i.e. error) value it jumps to the "out_put_ctrl" label
which calls nvme_put_ctrl(), that will lead to douple ctrl put in error
unwind path.
Update nvme_loop_create_ctrl() such that this patch removes the
"out_put_ctrl" label, add a new "out" label after nvme_put_ctrl() in
error unwind path and jump to newly added label when nvme_init_ctrl()
call retuns an error.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
One queue will be reserved for non-polled IO when nvme.poll_queues is
greater or equal than the number of IO queues that the nvme controller
can provide. Currently the reserved queue for non-polled IO will reuse
the interrupt used by admin queue in this case, e.g, vector 0.
This can work and the performance may not be an issue since the admin
queue is used unfrequently. However this behaviour may be inconsistent
with that when nvme.poll_queues is smaller than the number of IO
queues available.
Thus allocate separate interrupt for this reserved queue, and thus make
the behaviour consistent.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
[hch: minor cleanups, mostly to the pre-existing surrounding code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
nvme_submit_sync_cmd can return positive NVMe error codes in addition to
the negative Linux error code, which are currently ignored. Fix this
by removing __nvme_ns_report_zones and handling the errors from
nvme_submit_sync_cmd in the caller instead of multiplexing the return
value and the number of zones reported into a single return value.
Fixes: 240e6ee272c0 ("nvme: support for zoned namespaces")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A hostport port pointer is allowed to be NULL as it is not allocated if
the lldd does not support the new interfaces for NVME LS request support.
The hostport free routine validates the handle but forgot to validate the
hostport pointer.
Validate the hostport pointer before using it to validate the handle.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the default passthru implementation NVMeOF target passthru ctrl is
not capable of handling Zoned Namespaces (ZNS).
Update the nvmet_parse_pasthru_admin_cmd() to allow
NVME_ID_CNS_CS_CTRL/NVME_CIS_ZNS and NVME_ID_CNS_CS_NS/NVME_CIS_ZNS.
With this addition NVMeOF Passthru allows Zoned Namespaces.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A user may modify the kato by a set features cmd. To properly deal
with races or a kato value of 0 (no keep alive enabled) change
nvmet_set_feat_kato to first disable the timer, then set the value
and then re-enable the timer.
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|