| author | Chunguang Xu <chunguang.xu@shopee.com> | 2024-03-11 10:09:27 +0800 |
|---|---|---|
| committer | Keith Busch <kbusch@kernel.org> | 2024-03-14 11:32:39 -0700 |
| commit | de105068fead55ed5c07ade75e9c8e7f86a00d1d (patch) | |
| tree | 1762f5e1530e9b93e5a724eb3883f20228392a91 /drivers/nvme/host/trace.c | |
| parent | 2bc91743096756d7c97d10c7079617192211369b (diff) | |
nvme: fix reconnection fail due to reserved tag allocation
We found an issue in a production environment using NVMe over RDMA:
admin_q reconnection failed forever even though the remote target and
the network were fine. After digging into it, we found it was caused by
an ABBA deadlock in tag allocation. In our case, the tag was held by a
keep alive request waiting inside admin_q; since admin_q is quiesced
while the controller is reset, the request is marked idle and will not
be processed until the reset succeeds. Because fabric_q shares its
tagset with admin_q, reconnecting to the remote target requires a tag
for the connect command, but the only reserved tag was held by the keep
alive command waiting inside admin_q. As a result, admin_q could never
reconnect. To fix this issue, keep two reserved tags for the admin
queue.
Fixes: ed01fee283a0 ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Diffstat (limited to 'drivers/nvme/host/trace.c')
0 files changed, 0 insertions, 0 deletions