| field | value | date |
|---|---|---|
| author | Bob Pearson <rpearsonhpe@gmail.com> | 2022-06-30 14:04:22 -0500 |
| committer | Jason Gunthorpe <jgg@nvidia.com> | 2022-07-22 17:43:00 -0300 |
| commit | 445fd4f4fb76d513de6b05b08b3a4d0bb980fc80 | |
| tree | a3ca2186041f8be3d10592944b8b537e9d513d98 /drivers/infiniband/sw/rxe/rxe_comp.c | |
| parent | 930119a1720075d15e4c1e478b2b9412cd9eb6ad | |
RDMA/rxe: Fix rnr retry behavior
Currently, the completer tasklet sets the same flag (qp->req.need_retry)
whether the retransmit timer or the rnr timer fires, so either timer will
attempt to start a retry flow on the send queue. As a result, an RNR NAK
can be answered at the first retransmit timer event, possibly before the
requested rnr timeout has elapsed.
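To make the failure mode concrete, here is a minimal C model of the pre-patch behavior. It is not the rxe code: `struct old_qp`, the `old_*` functions and the bookkeeping members are hypothetical, and the completer's timeout handling is collapsed into `old_retransmit_timer()`, which (as described above) sets the same `need_retry` flag and lets the requester run. Only `need_retry` corresponds to a real field (`qp->req.need_retry`).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the pre-patch flow. Only
 * need_retry mirrors a real rxe field (qp->req.need_retry); the
 * other members are illustrative bookkeeping.
 */
struct old_qp {
	bool need_retry;        /* qp->req.need_retry */
	bool rnr_timer_expired; /* has the rnr nak timer fired yet? */
	bool retried_early;     /* retry began before the rnr timer fired */
	int  retries;           /* retry flows started so far */
};

/* The requester starts a retry flow whenever need_retry is set. */
static void old_run_requester(struct old_qp *qp)
{
	if (!qp->need_retry)
		return;
	if (!qp->rnr_timer_expired)
		qp->retried_early = true;
	qp->need_retry = false;
	qp->retries++;
}

/* Pre-patch COMPST_RNR_RETRY: the flag is set at once (timer arming omitted). */
static void old_on_rnr_nak(struct old_qp *qp)
{
	qp->need_retry = true;
}

/* Simplified completer timeout path: requests a retry via the same flag. */
static void old_retransmit_timer(struct old_qp *qp)
{
	qp->need_retry = true;
	old_run_requester(qp);
}

/* Simplified rnr nak timer: just lets the requester run. */
static void old_rnr_nak_timer(struct old_qp *qp)
{
	qp->rnr_timer_expired = true;
	old_run_requester(qp);
}

/* RNR NAK, then the retransmit timer fires before the rnr timer. */
static bool old_scenario_retries_early(void)
{
	struct old_qp qp = {0};

	old_on_rnr_nak(&qp);
	old_retransmit_timer(&qp);
	old_rnr_nak_timer(&qp);
	assert(qp.retries == 1); /* exactly one retry flow started */
	return qp.retried_early;
}
```

In this model `old_scenario_retries_early()` returns true: the retry flow starts on the retransmit timeout, before the rnr nak timer has fired.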
This patch adds a new flag (qp->req.wait_for_rnr_timer) which, while set,
prevents a retry flow from starting until the rnr nak timer fires.
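A matching sketch of the patched logic, under the same simplifications. `need_retry` and `wait_for_rnr_timer` mirror the `qp->req` fields named in this commit; the requester-side gate in `new_run_requester()` and the rnr timer handler that clears `wait_for_rnr_timer` are assumptions about code outside `rxe_comp.c`, which this diff does not show. All `new_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the patched flow. need_retry and
 * wait_for_rnr_timer mirror qp->req fields; the other members are
 * illustrative bookkeeping only.
 */
struct new_qp {
	bool need_retry;         /* qp->req.need_retry */
	bool wait_for_rnr_timer; /* qp->req.wait_for_rnr_timer */
	bool rnr_timer_expired;  /* has the rnr nak timer fired yet? */
	bool retried_early;      /* retry began before the rnr timer fired */
	int  retries;            /* retry flows started so far */
};

/* Assumed requester gate: no retry flow while waiting on the rnr timer. */
static void new_run_requester(struct new_qp *qp)
{
	if (!qp->need_retry || qp->wait_for_rnr_timer)
		return;
	if (!qp->rnr_timer_expired)
		qp->retried_early = true;
	qp->need_retry = false;
	qp->retries++;
}

/* Patched COMPST_RNR_RETRY: only record that we must wait. */
static void new_on_rnr_nak(struct new_qp *qp)
{
	qp->wait_for_rnr_timer = true;
}

/* Simplified completer timeout path: still requests a retry. */
static void new_retransmit_timer(struct new_qp *qp)
{
	qp->need_retry = true;
	new_run_requester(qp);
}

/* Assumed rnr timer handler: request the retry and stop waiting. */
static void new_rnr_nak_timer(struct new_qp *qp)
{
	qp->rnr_timer_expired = true;
	qp->need_retry = true;
	qp->wait_for_rnr_timer = false;
	new_run_requester(qp);
}

/* RNR NAK, then the retransmit timer fires before the rnr timer. */
static bool new_scenario_retries_early(void)
{
	struct new_qp qp = {0};

	new_on_rnr_nak(&qp);
	new_retransmit_timer(&qp);
	assert(qp.retries == 0); /* blocked: still waiting on the rnr timer */
	new_rnr_nak_timer(&qp);
	assert(qp.retries == 1); /* the retry starts only once it fires */
	return qp.retried_early;
}
```

Here the retransmit timeout still requests a retry, but the requester ignores it while `wait_for_rnr_timer` is set, so `new_scenario_retries_early()` returns false and exactly one retry flow starts, after the rnr timer fires.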
This patch fixes rnr retry errors that can be observed by running the
pyverbs test test_rdmacm_async_traffic_external_qp repeatedly. With this
patch applied, they no longer occur.
Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Diffstat (limited to 'drivers/infiniband/sw/rxe/rxe_comp.c')
-rw-r--r-- | drivers/infiniband/sw/rxe/rxe_comp.c | 8 |
1 files changed, 7 insertions, 1 deletions
```diff
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index da3a398053b8..4fc31bb7eee6 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -114,6 +114,8 @@ void retransmit_timer(struct timer_list *t)
 {
 	struct rxe_qp *qp = from_timer(qp, t, retrans_timer);
 
+	pr_debug("%s: fired for qp#%d\n", __func__, qp->elem.index);
+
 	if (qp->valid) {
 		qp->comp.timeout = 1;
 		rxe_run_task(&qp->comp.task, 1);
@@ -730,11 +732,15 @@ int rxe_completer(void *arg)
 			break;
 
 		case COMPST_RNR_RETRY:
+			/* we come here if we received an RNR NAK */
 			if (qp->comp.rnr_retry > 0) {
 				if (qp->comp.rnr_retry != 7)
 					qp->comp.rnr_retry--;
 
-				qp->req.need_retry = 1;
+				/* don't start a retry flow until the
+				 * rnr timer has fired
+				 */
+				qp->req.wait_for_rnr_timer = 1;
 				pr_debug("qp#%d set rnr nak timer\n",
 					 qp_num(qp));
 				mod_timer(&qp->rnr_nak_timer,
```
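The `rnr_retry` bookkeeping in the COMPST_RNR_RETRY case of the diff can be read in isolation. In InfiniBand, an RNR retry count of 7 means "retry indefinitely", which is why the counter is only decremented when it is not 7. The helper below is a hypothetical restatement of that branch: the name `consume_rnr_retry()` and the `RNR_RETRY_INFINITE` macro are illustrative, not rxe identifiers, and what the real completer does once the count is exhausted is not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name: in InfiniBand, an RNR retry count of 7 means
 * "retry indefinitely", so the count is never consumed. */
#define RNR_RETRY_INFINITE 7

/*
 * Hypothetical restatement of the rnr_retry bookkeeping in the
 * COMPST_RNR_RETRY case. Returns true if another RNR retry is
 * allowed, consuming one attempt unless the count is 7.
 */
static bool consume_rnr_retry(int *rnr_retry)
{
	/* valid RNR retry counts are 0..7 */
	assert(*rnr_retry >= 0 && *rnr_retry <= RNR_RETRY_INFINITE);

	if (*rnr_retry == 0)
		return false; /* exhausted: the driver's error path is not shown */

	if (*rnr_retry != RNR_RETRY_INFINITE)
		(*rnr_retry)--;
	return true;
}
```

With an initial count of 2, two calls succeed and the third returns false; with 7, every call succeeds and the count stays at 7.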