path: root/net/sunrpc/xprt.c
Commit message [Author, Age, Files, Lines]
* SUNRPC: Dequeue the request from the receive queue while we're re-encoding [Trond Myklebust, 2019-10-05, 1 file, -23/+31]
    commit cc204d01262a69218b2d0db5cdea371de85871d9 upstream.

    Ensure that we dequeue the request from the transport receive queue while we're re-encoding to prevent issues like use-after-free when we release the bvec.

    Fixes: 7536908982047 ("SUNRPC: Ensure the bvecs are reset when we re-encode...")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: stable@vger.kernel.org # v4.20+
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" [Trond Myklebust, 2019-09-06, 1 file, -7/+0]
    commit d5711920ec6e578f51db95caa6f185f5090b865e upstream.

    This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765.

    The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: stable@vger.kernel.org # v5.1
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request [Trond Myklebust, 2019-07-26, 1 file, -0/+2]
    commit 75369089820473eac45e9ddd970081901a373c08 upstream.

    The bvec tracks the list of pages, so if the number of pages changes due to a re-encode, we need to reset the bvec as well.

    Fixes: 277e4ab7d530 ("SUNRPC: Simplify TCP receive code by switching...")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: stable@vger.kernel.org # v4.20+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE" [Anna Schumaker, 2019-06-21, 1 file, -3/+1]
    Jon Hunter reports:

    "I have been noticing intermittent failures with a system suspend test on some of our machines that have a NFS mounted root file-system. Bisecting this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3 does appear to resolve the problem. The cause of the suspend failure appears to be a long delay observed sometimes when resuming from suspend, and this is causing our test to timeout."

    This reverts commit 431235818bc3a919ca7487500c67c3144feece80.

    Reported-by: Jon Hunter <jonathanh@nvidia.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* treewide: Add SPDX license identifier for missed files [Thomas Gleixner, 2019-05-21, 1 file, -0/+1]
    Add SPDX license identifiers to all files which:

    - Have no license information of any form
    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* SUNRPC: Update comments based on recent changes [Chuck Lever, 2019-04-25, 1 file, -2/+2]
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Start the first major timeout calculation at task creation [Trond Myklebust, 2019-04-25, 1 file, -10/+34]
    When calculating the major timeout for a new task, when we know that the connection has been broken, use the task->tk_start to ensure that we also take into account the time spent waiting for a slot or session slot. This ensures that we fail over soft requests relatively quickly once the connection has actually been broken, and the first requests have started to fail.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Ensure that the transport layer respect major timeouts [Trond Myklebust, 2019-04-25, 1 file, -4/+13]
    Ensure that when in the transport layer, we don't sleep past a major timeout.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Declare RPC timers as TIMER_DEFERRABLE [Trond Myklebust, 2019-04-25, 1 file, -1/+3]
    Don't wake idle CPUs only for the purpose of servicing an RPC queue timeout.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Add function rpc_sleep_on_timeout() [Trond Myklebust, 2019-04-25, 1 file, -15/+21]
    Clean up the RPC task sleep interfaces by replacing the task->tk_timeout 'hidden parameter' to rpc_sleep_on() with a new function that takes an absolute timeout.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
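The difference between a relative and an absolute timeout can be sketched in user-space C. This is only an illustration of the idea, not the kernel code: `ticks_t`, `ticks_after()`, `deadline_from()` and `ticks_remaining()` are hypothetical names standing in for a jiffies-style counter and its helpers. The deadline is computed once, and every later check compares against it in a wraparound-safe way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a jiffies-style tick counter. */
typedef unsigned long ticks_t;

/* Wraparound-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool ticks_after(ticks_t a, ticks_t b)
{
    return (long)(b - a) < 0;
}

/* Turn a relative timeout into an absolute deadline exactly once. */
static ticks_t deadline_from(ticks_t now, ticks_t relative)
{
    return now + relative;
}

/* Ticks left until the deadline, or 0 once the deadline has been reached. */
static ticks_t ticks_remaining(ticks_t now, ticks_t deadline)
{
    return ticks_after(deadline, now) ? deadline - now : 0;
}
```

Because the deadline is absolute, a caller that is woken early and goes back to sleep does not accidentally restart the full timeout interval.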
* SUNRPC: Refactor xprt_request_wait_receive() [Trond Myklebust, 2019-04-25, 1 file, -37/+42]
    Convert the transport callback to actually put the request to sleep instead of just setting a timeout. This is in preparation for rpc_sleep_on_timeout().

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Fix up task signalling [Trond Myklebust, 2019-04-25, 1 file, -0/+4]
    The RPC_TASK_KILLED flag should really not be set from another context because it can clobber data in the struct task when task->tk_flags is changed non-atomically. Let's therefore swap out RPC_TASK_KILLED with an atomic flag, and add a function to set that flag and safely wake up the task.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
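The hazard described above (a plain read-modify-write on a flags word racing with the owner's own updates) and the atomic alternative can be sketched with C11 atomics. This is a user-space model under assumptions: `struct task`, `TASK_SIGNALLED_BIT`, `task_signal()` and `task_is_signalled()` are hypothetical names, not the kernel's definitions, and the wake-up step is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number for a "task was signalled" state bit. */
enum { TASK_SIGNALLED_BIT = 3 };

struct task {
    unsigned int flags;     /* touched only by the owner: plain updates are fine */
    atomic_ulong runstate;  /* may be set from any context: atomic updates only */
};

/* Atomically set the bit. Returns true if this call is the one that set it. */
static bool task_signal(struct task *t)
{
    unsigned long mask = 1UL << TASK_SIGNALLED_BIT;
    unsigned long old = atomic_fetch_or(&t->runstate, mask);

    return !(old & mask);
}

static bool task_is_signalled(struct task *t)
{
    return (atomic_load(&t->runstate) & (1UL << TASK_SIGNALLED_BIT)) != 0;
}
```

Keeping the cross-context bit in its own atomically updated word means a signaller never rewrites the neighbouring bits that the task's owner is modifying.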
* SUNRPC: Use the ENOTCONN error on socket disconnect [Trond Myklebust, 2019-03-15, 1 file, -1/+1]
    When the socket is closed, we currently send an EAGAIN error to all pending requests in order to ask them to retransmit. Use ENOTCONN instead, to ensure that they try to reconnect before attempting to transmit. This also helps SOFTCONN tasks to behave correctly in this situation.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated [Trond Myklebust, 2019-03-01, 1 file, -0/+7]
    If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfs-rdma-for-5.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs [Trond Myklebust, 2019-02-25, 1 file, -4/+6]
|\
    NFSoRDMA client updates for 5.1

    New features:
    - Convert rpc auth layer to use xdr_streams
    - Config option to disable insecure enctypes
    - Reduce size of RPC receive buffers

    Bugfixes and cleanups:
    - Fix sparse warnings
    - Check inline size before providing a write chunk
    - Reduce the receive doorbell rate
    - Various tracepoint improvements

    [Trond: Fix up merge conflicts]

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * SUNRPC: Introduce trace points in rpc_auth_gss.ko [Chuck Lever, 2019-02-14, 1 file, -4/+6]
    Add infrastructure for trace points in the RPC_AUTH_GSS kernel module, and add a few sample trace points. These report exceptional or unexpected events, and observe the assignment of GSS sequence numbers.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | SUNRPC: Convert socket page send code to use iov_iter() [Trond Myklebust, 2019-02-20, 1 file, -0/+1]
    Simplify the page send code using iov_iter and bvecs.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Ensure rq_bytes_sent is reset before request transmission [Trond Myklebust, 2019-02-20, 1 file, -2/+0]
    When we resend a request, ensure that the 'rq_bytes_sent' is reset to zero.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs [Trond Myklebust, 2019-02-20, 1 file, -0/+3]
|/
    Set memalloc_nofs_save() on all the rpciod/xprtiod jobs so that we ensure memory allocations for asynchronous rpc calls don't ever end up recursing back to the NFS layer for memory reclaim.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Address Kerberos performance/behavior regression [Chuck Lever, 2019-01-15, 1 file, -1/+1]
    When using Kerberos with v4.20, I've observed frequent connection loss on heavy workloads. I traced it down to the client underrunning the GSS sequence number window -- NFS servers are required to drop the RPC with the low sequence number, and also drop the connection to signal that an RPC was dropped.

    Bisected to commit 918f3c1fe83c ("SUNRPC: Improve latency for interactive tasks").

    I've got a one-line workaround for this issue, which is easy to backport to v4.20 while a more permanent solution is being derived. Essentially, tk_owner-based sorting is disabled for RPCs that carry a GSS sequence number.

    Fixes: 918f3c1fe83c ("SUNRPC: Improve latency for interactive ... ")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Ensure rq_bytes_sent is reset before request transmission [Trond Myklebust, 2019-01-15, 1 file, -0/+1]
    When we resend a request, ensure that the 'rq_bytes_sent' is reset to zero.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Remove xprt_connect_status() [Trond Myklebust, 2018-12-18, 1 file, -31/+1]
    Over the years, xprt_connect_status() has been superseded by call_connect_status(), which now handles all the errors that xprt_connect_status() does and more. Since the latter converts all errors that it doesn't recognise to EIO, then it is time for it to be retired.

    Reported-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* SUNRPC: Fix disconnection races [Trond Myklebust, 2018-12-18, 1 file, -1/+4]
    When the socket is closed, we need to call xprt_disconnect_done() in order to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks.

    However, we also want to ensure that we don't wake them up before the socket is closed, since that would cause thundering herd issues with everyone piling up to retransmit before the TCP shutdown dance has completed. Only the task that holds XPRT_LOCKED needs to wake up early in order to allow the close to complete.

    Reported-by: Dave Wysochanski <dwysocha@redhat.com>
    Reported-by: Scott Mayhew <smayhew@redhat.com>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* SUNRPC: Fix a potential race in xprt_connect() [Trond Myklebust, 2018-12-02, 1 file, -2/+9]
    If an asynchronous connection attempt completes while another task is in xprt_connect(), then the call to rpc_sleep_on() could end up racing with the call to xprt_wake_pending_tasks(). So add a second test of the connection state after we've put the task to sleep and set the XPRT_CONNECTING flag, when we know that there can be no asynchronous connection attempts still in progress.

    Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Fix a memory leak in call_encode() [Trond Myklebust, 2018-12-02, 1 file, -0/+2]
    If we retransmit an RPC request, we currently end up clobbering the value of req->rq_rcv_buf.bvec that was allocated by the initial call to xprt_request_prepare(req).

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfs-rdma-for-4.20-1' of git://git.linux-nfs.org/projects/anna/linux-nfs [Trond Myklebust, 2018-10-18, 1 file, -10/+4]
|\
    NFS RDMA client updates for Linux 4.20

    Stable bugfixes:
    - Reset credit grant properly after a disconnect

    Other bugfixes and cleanups:
    - xprt_release_rqst_cong is called outside of transport_lock
    - Create more MRs at a time and toss out old ones during recovery
    - Various improvements to the RDMA connection and disconnection code:
    - Improve naming of trace events, functions, and variables
    - Add documenting comments
    - Fix metrics and stats reporting
    - Fix a tracepoint sparse warning

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * sunrpc: Fix connect metrics [Chuck Lever, 2018-10-02, 1 file, -10/+4]
    For TCP, the logic in xprt_connect_status is currently never invoked to record a successful connection. Commit 2a4919919a97 ("SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending") changed the way TCP xprt's are awoken after a connect succeeds.

    Instead, change connection-oriented transports to bump connect_count and compute connect_time the moment that XPRT_CONNECTED is set.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter() [Trond Myklebust, 2018-09-30, 1 file, -0/+17]
    Add a bvec array to struct xdr_buf, and have the client allocate it when we need to receive data into pages.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue [Trond Myklebust, 2018-09-30, 1 file, -17/+3]
    We no longer need priority semantics on the xprt->sending queue, because the order in which tasks are sent is now dictated by their position in the send queue. Note that the backlog queue remains a priority queue, meaning that slot resources are still managed in order of task priority.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Convert xprt receive queue to use an rbtree [Trond Myklebust, 2018-09-30, 1 file, -11/+82]
    If the server is slow, we can find ourselves with quite a lot of entries on the receive queue. Converting the search from an O(n) to O(log(n)) can make a significant difference, particularly since we have to hold a number of locks while searching.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
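The move from a linear scan to an ordered-tree lookup can be illustrated with a small binary search tree keyed by a request identifier. This is a deliberately simplified user-space sketch: the kernel uses its red-black tree, which also rebalances on insert, whereas the tree below does not rebalance, so it only reaches O(log n) lookups when the keys arrive in a reasonably mixed order. `struct req`, `recv_insert()` and `recv_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request record, ordered by its transaction ID. */
struct req {
    uint32_t xid;
    struct req *left, *right;
};

/* Three-way comparison: negative, zero, or positive. */
static int xid_cmp(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Insert a request; duplicate XIDs are treated as a caller bug. */
static void recv_insert(struct req **root, struct req *new)
{
    struct req **p = root;

    while (*p) {
        int c = xid_cmp(new->xid, (*p)->xid);

        assert(c != 0);
        p = c < 0 ? &(*p)->left : &(*p)->right;
    }
    new->left = new->right = NULL;
    *p = new;
}

/* Walk down one branch per step instead of scanning every entry. */
static struct req *recv_lookup(struct req *root, uint32_t xid)
{
    while (root) {
        int c = xid_cmp(xid, root->xid);

        if (c == 0)
            return root;
        root = c < 0 ? root->left : root->right;
    }
    return NULL;
}
```

Each lookup step discards one subtree, which is what shortens the time spent holding locks when many replies are outstanding.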
* | SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK [Trond Myklebust, 2018-09-30, 1 file, -2/+5]
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Cleanup: remove the unused 'task' argument from the request_send() [Trond Myklebust, 2018-09-30, 1 file, -1/+1]
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Clean up transport write space handling [Trond Myklebust, 2018-09-30, 1 file, -30/+47]
    Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Turn off throttling of RPC slots for TCP sockets [Trond Myklebust, 2018-09-30, 1 file, -14/+0]
    The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK [Trond Myklebust, 2018-09-30, 1 file, -2/+2]
    This no longer causes them to lose their place in the transmission queue.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue [Trond Myklebust, 2018-09-30, 1 file, -11/+60]
    Rather than forcing each and every RPC task to grab the socket write lock in order to send itself, we allow whichever task is holding the write lock to attempt to drain the entire transmit queue.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
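The "lock holder drains the queue" idea can be sketched as a loop that keeps sending queued requests until the queue is empty or the transport pushes back. The sketch is a user-space model under assumptions: `struct xmit_req`, `send_fn`, `xmit_drain()` and `send_with_budget()` are hypothetical, and the locking itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued request. */
struct xmit_req {
    int id;
    bool sent;
};

/* Hypothetical send hook: returns false when the transport cannot take more data now. */
typedef bool (*send_fn)(struct xmit_req *req, void *ctx);

/*
 * Called by whichever task currently holds the write lock: send queued
 * requests in order until the queue is drained or the transport blocks.
 * Returns the number of requests sent by this call.
 */
static size_t xmit_drain(struct xmit_req *queue, size_t len, send_fn send, void *ctx)
{
    size_t nsent = 0;

    for (size_t i = 0; i < len; i++) {
        if (queue[i].sent)
            continue;
        if (!send(&queue[i], ctx))
            break;
        queue[i].sent = true;
        nsent++;
    }
    return nsent;
}

/* Example hook: accept requests until a small budget is used up. */
static bool send_with_budget(struct xmit_req *req, void *ctx)
{
    int *budget = ctx;

    (void)req;
    if (*budget == 0)
        return false;
    (*budget)--;
    return true;
}
```

With this shape, tasks whose requests were sent by another lock holder never need to take the lock themselves.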
* | SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue [Trond Myklebust, 2018-09-30, 1 file, -0/+11]
    Avoid memory starvation by giving RPCs that are tagged with the RPC_TASK_SWAPPER flag the highest priority.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Support for congestion control when queuing is enabled [Trond Myklebust, 2018-09-30, 1 file, -36/+92]
    Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Improve latency for interactive tasks [Trond Myklebust, 2018-09-30, 1 file, -3/+24]
    One of the intentions with the priority queues was to ensure that no single process can hog the transport. The field task->tk_owner therefore identifies the RPC call's origin, and is intended to allow the RPC layer to organise queues for fairness. This commit therefore modifies the transmit queue to group requests by task->tk_owner, and ensures that we round robin among those groups.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
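The fairness effect of round-robin among owners can be modelled with a small array-backed queue. This is not the kernel's data structure, only a sketch of the resulting transmit order: a new request is placed in the "round" equal to the number of requests its owner already has queued, so an owner with a long backlog cannot push a newcomer's first request to the back. `struct xreq`, `struct xmit_queue`, `round_of()` and `xmit_enqueue()` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_REQS 16

/* Hypothetical request: 'owner' plays the role of task->tk_owner. */
struct xreq {
    int owner;
    int id;
};

struct xmit_queue {
    struct xreq slot[MAX_REQS]; /* slot[0] is transmitted first */
    int len;
};

/* Round of the entry at idx: how many earlier entries share its owner. */
static int round_of(const struct xmit_queue *q, int idx)
{
    int round = 0;

    for (int i = 0; i < idx; i++)
        if (q->slot[i].owner == q->slot[idx].owner)
            round++;
    return round;
}

/* Insert req behind every entry whose round is not later than its own. */
static void xmit_enqueue(struct xmit_queue *q, struct xreq req)
{
    int my_round = 0, pos = 0;

    assert(q->len < MAX_REQS);
    for (int i = 0; i < q->len; i++)
        if (q->slot[i].owner == req.owner)
            my_round++;
    while (pos < q->len && round_of(q, pos) <= my_round)
        pos++;
    memmove(&q->slot[pos + 1], &q->slot[pos],
            (size_t)(q->len - pos) * sizeof(q->slot[0]));
    q->slot[pos] = req;
    q->len++;
}
```

For example, if owner 1 queues three requests and owners 2 and 3 then queue one each, the newcomers are sent right after owner 1's first request rather than after all three.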
* | SUNRPC: Move RPC retransmission stat counter to xprt_transmit() [Trond Myklebust, 2018-09-30, 1 file, -7/+12]
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Simplify xprt_prepare_transmit() [Trond Myklebust, 2018-09-30, 1 file, -16/+7]
    Remove the checks for whether or not we need to transmit, and whether or not a reply has been received. Those are already handled in call_transmit() itself.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK [Trond Myklebust, 2018-09-30, 1 file, -14/+0]
    If the request is still on the queue, this will be incorrect behaviour.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Treat the task and request as separate in the xprt_ops->send_request() [Trond Myklebust, 2018-09-30, 1 file, -1/+1]
    When we shift to using the transmit queue, then the task that holds the write lock will not necessarily be the same as the one being transmitted.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Fix up the back channel transmit [Trond Myklebust, 2018-09-30, 1 file, -1/+26]
    Fix up the back channel code to recognise that it has already been transmitted, so does not need to be called again. Also ensure that we set req->rq_task.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Refactor RPC call encoding [Trond Myklebust, 2018-09-30, 1 file, -9/+13]
    Move the call encoding so that it occurs before the transport connection etc.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Add a transmission queue for RPC requests [Trond Myklebust, 2018-09-30, 1 file, -9/+75]
    Add the queue that will enforce the ordering of RPC task transmission.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Distinguish between the slot allocation list and receive queue [Trond Myklebust, 2018-09-30, 1 file, -6/+6]
    When storing a struct rpc_rqst on the slot allocation list, we currently use the same field 'rq_list' as we use to store the request on the receive queue. Since the structure is never on both lists at the same time, this is OK. However, for clarity, let's make that a union with different names for the different lists so that we can more easily distinguish between the two states.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
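The union described above can be pictured with a cut-down structure. The layout below is an assumption made for illustration: `struct list_node` and `struct rqst` are hypothetical stand-ins, and only the member names `rq_list` and `rq_recv` echo the commit's intent of giving the two states different names. The C11 anonymous union gives both names the same storage, which is safe only because a request is never on both lists at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked list node. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical cut-down request structure. */
struct rqst {
    unsigned int xid;
    union {
        struct list_node rq_list; /* valid while on the slot allocation list */
        struct list_node rq_recv; /* valid while on the receive queue */
    };
};
```

The two names cost no extra memory, and a reader can tell from the name at each use site which state the request is expected to be in.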
* | SUNRPC: Refactor xprt_transmit() to remove wait for reply code [Trond Myklebust, 2018-09-30, 1 file, -22/+52]
    Allow the caller in clnt.c to call into the code to wait for a reply after calling xprt_transmit(). Again, the reason is that the backchannel code does not need this functionality.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Refactor xprt_transmit() to remove the reply queue code [Trond Myklebust, 2018-09-30, 1 file, -44/+83]
    Separate out the action of adding a request to the reply queue so that the backchannel code can simply skip calling it altogether.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | SUNRPC: Rename xprt->recv_lock to xprt->queue_lock [Trond Myklebust, 2018-09-30, 1 file, -12/+12]
    We will use the same lock to protect both the transmit and receive queues.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>