linux.git - Linux kernel mainline tree

	Commit message	Author	Age	Files	Lines
*	Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs	Linus Torvalds	2019-09-26	29	-580/+835
\|\	Pull NFS client updates from Anna Schumaker:
	"Stable bugfixes:
	- Dequeue the request from the receive queue while we're re-encoding # v4.20+
	- Fix buffer handling of GSS MIC without slack # 5.1
	Features:
	- Increase xprtrdma maximum transport header and slot table sizes
	- Add support for nfs4_call_sync() calls using a custom rpc_task_struct
	- Optimize the default readahead size
	- Enable pNFS filelayout LAYOUTGET on OPEN
	Other bugfixes and cleanups:
	- Fix possible null-pointer dereferences and memory leaks
	- Various NFS over RDMA cleanups
	- Various NFS over RDMA comment updates
	- Don't receive TCP data into a reset request buffer
	- Don't try to parse incomplete RPC messages
	- Fix congestion window race with disconnect
	- Clean up pNFS return-on-close error handling
	- Fixes for NFS4ERR_OLD_STATEID handling"
	* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
	pNFS/filelayout: enable LAYOUTGET on OPEN
	NFS: Optimise the default readahead size
	NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
	NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
	NFSv4: Fix OPEN_DOWNGRADE error handling
	pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
	NFSv4: Add a helper to increment stateid seqids
	NFSv4: Handle RPC level errors in LAYOUTRETURN
	NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
	NFSv4: Clean up pNFS return-on-close error handling
	pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
	NFS: remove unused check for negative dentry
	NFSv3: use nfs_add_or_obtain() to create and reference inodes
	NFS: Refactor nfs_instantiate() for dentry referencing callers
	SUNRPC: Fix congestion window race with disconnect
	SUNRPC: Don't try to parse incomplete RPC messages
	SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
	SUNRPC: Fix buffer handling of GSS MIC without slack
	SUNRPC: RPC level errors should always set task->tk_rpc_status
	SUNRPC: Don't receive TCP data into a request buffer that has been reset
	...
\| *	pNFS/filelayout: enable LAYOUTGET on OPEN	Olga Kornievskaia	2019-09-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the flag to the filelayout driver to add LAYOUTGET to the OPEN compound. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Optimise the default readahead size	Trond Myklebust	2019-09-24	2	-9/+8
\|	In the years since the max readahead size was fixed in NFS, a number of things have happened: - Users can now set the value directly using /sys/class/bdi - NFS max supported block sizes have increased by several orders of magnitude from 64K to 1MB. - Disk access latencies are orders of magnitude faster due to SSD + NVME. In particular, note that if the server is advertising 1MB as the optimal read size, that will set the readahead size to 15MB. Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES. However, let's inform the VM about our preferred block size so that it can choose to round up in cases where that makes sense. Reported-by: Alkis Georgopoulos <alkisg@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU	Trond Myklebust	2019-09-20	1	-5/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a LOCKU request receives a NFS4ERR_OLD_STATEID, then bump the seqid before resending. Ensure we only bump the seqid by 1. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE	Trond Myklebust	2019-09-20	3	-21/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a CLOSE or OPEN_DOWNGRADE operation receives a NFS4ERR_OLD_STATEID then bump the seqid before resending. Ensure we only bump the seqid by 1. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Fix OPEN_DOWNGRADE error handling	Trond Myklebust	2019-09-20	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If OPEN_DOWNGRADE returns a state error, then we want to initiate state recovery in addition to marking the stateid as closed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid	Trond Myklebust	2019-09-20	3	-7/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a LAYOUTRETURN receives a reply of NFS4ERR_OLD_STATEID then assume we've missed an update, and just bump the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Add a helper to increment stateid seqids	Trond Myklebust	2019-09-20	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a helper function to increment stateid seqids according to the rules specified in RFC5661 Section 8.2.2. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
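
The increment rule can be sketched in userspace C as follows. This is only an illustration, assuming the RFC 5661 Section 8.2.2 convention that a seqid of zero is reserved, so a counter that overflows wraps to 1 rather than 0. The name `stateid_seqid_inc` and the host-endian `uint32_t` argument are hypothetical and are not taken from the patch itself.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for bumping a stateid sequence number.
 * Assumption: zero is reserved, so a wraparound skips it and lands on 1.
 */
static uint32_t stateid_seqid_inc(uint32_t seqid)
{
    seqid++;            /* unsigned overflow wraps to 0 */
    if (seqid == 0)
        seqid++;        /* skip the reserved value */
    return seqid;
}
```

With this rule, "bumping the seqid by 1" after NFS4ERR_OLD_STATEID never produces the reserved value, even at the top of the 32-bit range.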
\| *	NFSv4: Handle RPC level errors in LAYOUTRETURN	Trond Myklebust	2019-09-20	2	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Handle RPC level errors by assuming that the RPC call was successful. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close	Trond Myklebust	2019-09-20	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the server sends a NFS4ERR_DELAY, then allow the caller to retry. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Clean up pNFS return-on-close error handling	Trond Myklebust	2019-09-20	3	-56/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both close and delegreturn have identical code to handle pNFS return-on-close. This patch refactors that code and places it in pnfs.c Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors	Trond Myklebust	2019-09-20	1	-2/+7
\|	If the server rejected our layout return with a state error such as NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want to clear out all the remaining layout segments and mark that stateid as invalid. Fixes: 1c5bd76d17cca ("pNFS: Enable layoutreturn operation for...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: remove unused check for negative dentry	Benjamin Coddington	2019-09-20	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This check has been hanging out since we used to have parallel paths to add dentry in nfs_create(), but that hasn't been the case for some years. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv3: use nfs_add_or_obtain() to create and reference inodes	Benjamin Coddington	2019-09-20	1	-9/+36
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Refactor nfs_instantiate() for dentry referencing callers	Benjamin Coddington	2019-09-20	2	-14/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for directory inode"), nfs_instantiate() may succeed without actually instantiating the dentry that was passed in. That can be problematic for some callers in NFSv3, so this patch breaks things up so we can get the actual dentry obtained. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: Fix congestion window race with disconnect	Chuck Lever	2019-09-20	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the congestion window closes just as the transport disconnects, a reconnect is never driven because: 1. The XPRT_CONG_WAIT flag prevents tasks from taking the write lock 2. There's no wake-up of the first task on the xprt->sending queue To address this, clear the congestion wait flag as part of completing a disconnect. Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: Don't try to parse incomplete RPC messages	Trond Myklebust	2019-09-20	1	-1/+13
\|	If the copy of the RPC reply into our buffers did not complete, we could end up with a truncated message. In that case, just resend the call. Fixes: a0584ee9aed80 ("SUNRPC: Use struct xdr_stream when decoding...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic	Benjamin Coddington	2019-09-20	3	-24/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let the name reflect the single use. The function now assumes the GSS MIC is the last object in the buffer. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: Fix buffer handling of GSS MIC without slack	Benjamin Coddington	2019-09-20	1	-9/+18
\|	The GSS Message Integrity Check data for krb5i may lie partially in the XDR reply buffer's pages and tail. If so, we try to copy the entire MIC into free space in the tail. But as the estimations of the slack space required for authentication and verification have improved, there may be less free space in the tail to complete this copy -- see commit 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size"). In fact, there may only be room in the tail for a single copy of the MIC, and not part of the MIC and then another complete copy. The real world failure reported is that `ls` of a directory on NFS may sometimes return -EIO, which can be traced back to xdr_buf_read_netobj() failing to find available free space in the tail to copy the MIC. Fix this by checking for the case of the MIC crossing the boundaries of head, pages, and tail. If so, shift the buffer until the MIC is contained completely within the pages or tail. This allows the remainder of the function to create a sub buffer that directly addresses the complete MIC. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v5.1 Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: RPC level errors should always set task->tk_rpc_status	Trond Myklebust	2019-09-17	2	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that we set task->tk_rpc_status for all RPC level errors so that the caller can distinguish between those and server reply status errors. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: Don't receive TCP data into a request buffer that has been reset	Trond Myklebust	2019-09-17	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we've removed the request from the receive list, and have added it back after resetting the request receive buffer, then we should only receive message data if it is a new reply (i.e. if transport->recv.copied is zero). Fixes: 277e4ab7d530b ("SUNRPC: Simplify TCP receive code by switching...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	SUNRPC: Dequeue the request from the receive queue while we're re-encoding	Trond Myklebust	2019-09-17	3	-26/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that we dequeue the request from the transport receive queue while we're re-encoding to prevent issues like use-after-free when we release the bvec. Fixes: 7536908982047 ("SUNRPC: Ensure the bvecs are reset when we re-encode...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Send Queue size grows after a reconnect	Chuck Lever	2019-08-26	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eli Dorfman reports that after a series of idle disconnects, an RPC/RDMA transport becomes unusable (rdma_create_qp returns -ENOMEM). Problem was tracked down to increasing Send Queue size after each reconnect. The rdma_create_qp() API does not promise to leave its @qp_init_attr parameter unaltered. In fact, some drivers do modify one or more of its fields. Thus our calls to rdma_create_qp must use a fresh copy of ib_qp_init_attr each time. This fix is appropriate for kernels dating back to late 2007, though it will have to be adapted, as the connect code has changed over the years. Reported-by: Eli Dorfman <eli@vastdata.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
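
The failure mode described above can be illustrated with a small userspace C model. Everything here is hypothetical: `struct qp_init_attr`, `fake_create_qp()` and the "+16 headroom" adjustment stand in for a driver that writes back into its init-attr argument, and are not the real RDMA API or any real driver's behavior.

```c
/* Hypothetical, simplified stand-in for the QP init attributes. */
struct qp_init_attr {
    unsigned int max_send_wr;
};

/*
 * Hypothetical create call that, like some drivers, modifies the
 * caller's attributes (here: adds headroom and reports it back).
 */
static unsigned int fake_create_qp(struct qp_init_attr *attr)
{
    attr->max_send_wr += 16;
    return attr->max_send_wr;
}

/* Buggy pattern: the same attr struct is reused on every reconnect. */
static unsigned int reconnect_reusing_attr(struct qp_init_attr *saved, int n)
{
    unsigned int size = 0;

    for (int i = 0; i < n; i++)
        size = fake_create_qp(saved);   /* growth accumulates */
    return size;
}

/* Fixed pattern: each reconnect works on a fresh copy. */
static unsigned int reconnect_with_fresh_copy(const struct qp_init_attr *saved,
                                              int n)
{
    unsigned int size = 0;

    for (int i = 0; i < n; i++) {
        struct qp_init_attr attr = *saved;  /* pristine copy each time */
        size = fake_create_qp(&attr);
    }
    return size;
}
```

In this model the reused struct grows by the driver's adjustment on every reconnect, while the copy-per-call variant requests the same size each time.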
\| *	xprtrdma: Clear xprt->reestablish_timeout on close	Chuck Lever	2019-08-26	3	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that the re-establishment delay does not grow exponentially on each good reconnect. This probably should have been part of commit 675dd90ad093 ("xprtrdma: Modernize ops->connect"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Recycle MRs after disconnect	Chuck Lever	2019-08-26	3	-9/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a bit too optimistic. MRs left over after a reconnect still need to be recycled, not added back to the free list, since they could be in flight or actually fully registered. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Have nfs4_proc_get_lease_time() call nfs4_call_sync_custom()	Anna Schumaker	2019-08-22	1	-10/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes some code duplication, since both functions were doing the same thing. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Have nfs41_proc_secinfo_no_name() call nfs4_call_sync_custom()	Anna Schumaker	2019-08-22	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to use the custom rpc_task_setup here to set the RPC_TASK_NO_ROUND_ROBIN flag on the RPC call. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Have nfs41_proc_reclaim_complete() call nfs4_call_sync_custom()	Anna Schumaker	2019-08-22	1	-11/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An async call followed by an rpc_wait_for_completion() is basically the same as a synchronous call, so we can use nfs4_call_sync_custom() to keep our custom callback ops and the RPC_TASK_NO_ROUND_ROBIN flag. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Have _nfs4_proc_secinfo() call nfs4_call_sync_custom()	Anna Schumaker	2019-08-22	1	-8/+21
\|	We do this to set the RPC_TASK_NO_ROUND_ROBIN flag in the task_setup structure. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Have nfs4_proc_setclientid() call nfs4_call_sync_custom()	Anna Schumaker	2019-08-22	1	-8/+2
\|	Rather than running the task manually. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFS: Add an nfs4_call_sync_custom() function	Anna Schumaker	2019-08-22	1	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a few cases where we need to manually configure the rpc_task_setup structure to get the behavior we want. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	NFSv4: Fix a memory leak bug	Wenwen Wang	2019-08-21	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In nfs4_try_migration(), if nfs4_begin_drain_session() fails, the previously allocated 'page' and 'locations' are not deallocated, leading to memory leaks. To fix this issue, go to the 'out' label to free 'page' and 'locations' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
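
The single-exit cleanup pattern this fix relies on can be sketched in userspace C. The names below (`tracked_alloc`, `begin_drain`, `try_migration`) and the allocation sizes are hypothetical stand-ins, not the kernel functions; the allocation counter exists only so the absence of a leak can be checked.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

static int live_allocs;     /* outstanding allocations, for illustration */

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical step that may fail after the buffers were allocated. */
static int begin_drain(int should_fail)
{
    return should_fail ? -EAGAIN : 0;
}

static int try_migration(int drain_fails)
{
    int result = -ENOMEM;
    void *page = tracked_alloc(4096);
    void *locations = tracked_alloc(256);

    if (page == NULL || locations == NULL)
        goto out;

    result = begin_drain(drain_fails);
    if (result != 0)
        goto out;       /* an early "return result" here would leak */

    result = 0;         /* the real work would happen here */
out:
    tracked_free(page);
    tracked_free(locations);
    return result;
}
```

Routing every error path through one label keeps the frees in a single place, so a newly added failure branch cannot skip them.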
\| *	xprtrdma: Optimize rpcrdma_post_recvs()	Chuck Lever	2019-08-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Micro-optimization: In rpcrdma_post_recvs, since commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)"), the common case is to return without doing anything. Found with perf. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Inline XDR chunk encoder functions	Chuck Lever	2019-08-21	1	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Micro-optimization: Save the cost of three function calls during transport header encoding. These were "noinline" before to generate more meaningful call stacks during debugging, but this code is now pretty stable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Fix bc_max_slots return value	Chuck Lever	2019-08-21	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For the moment the returned value just happens to be correct because the current backchannel server implementation does not vary the number of credits it offers. The spec does permit this value to change during the lifetime of a connection, however. The actual maximum is fixed for all RPC/RDMA transports, because each transport instance has to pre-allocate the resources for processing BC requests. That's the value that should be returned. Fixes: 7402a4fedc2b ("SUNRPC: Fix up backchannel slot table ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Clean up xprt_rdma_set_connect_timeout()	Chuck Lever	2019-08-21	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up: The function name should match the documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Use an llist to manage free rpcrdma_reps	Chuck Lever	2019-08-21	2	-59/+53
\|	rpcrdma_rep objects are removed from their free list by only a single thread: the Receive completion handler. Thus that free list can be converted to an llist, where a single-threaded consumer and a multi-threaded producer (rpcrdma_buffer_put) can both access the llist without the need for any serialization. This eliminates spin lock contention between the Receive completion handler and rpcrdma_buffer_get, and makes the rep consumer wait-free. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
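
The general idea of a lock-free list with many producers and one consumer can be sketched with C11 atomics. This is a userspace approximation under stated assumptions, not the kernel's llist implementation: `struct node`, `struct lockless_list`, `list_add` and `list_del_first` are hypothetical names, and the removal routine is only safe when a single thread ever calls it.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
};

struct lockless_list {
    _Atomic(struct node *) first;
};

/* Safe for any number of concurrent producers. */
static void list_add(struct lockless_list *l, struct node *n)
{
    struct node *old = atomic_load(&l->first);

    do {
        n->next = old;  /* on CAS failure, old holds the new head */
    } while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/*
 * Assumption: only ONE consumer thread ever calls this. With several
 * consumers, old->next could be stale by the time the CAS succeeds.
 */
static struct node *list_del_first(struct lockless_list *l)
{
    struct node *old = atomic_load(&l->first);

    while (old != NULL &&
           !atomic_compare_exchange_weak(&l->first, &old, old->next))
        ;               /* retry with the refreshed head */
    return old;
}
```

Neither operation takes a lock, which is the property that removes contention between the producer side and the single consumer.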
\| *	xprtrdma: Remove rpcrdma_buffer::rb_mrlock	Chuck Lever	2019-08-21	3	-18/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up: Now that the free list is used sparingly, get rid of the separate spin lock protecting it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Cache free MRs in each rpcrdma_req	Chuck Lever	2019-08-21	5	-14/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of a globally-contended MR free list, cache MRs in each rpcrdma_req as they are released. This means acquiring and releasing an MR will be lock-free in the common case, even outside the transport send lock. The original idea of per-rpcrdma_req MR free lists was suggested by Shirley Ma <shirley.ma@oracle.com> several years ago. I just now figured out how to make that idea work with on-demand MR allocation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Ensure creating an MR does not trigger FS writeback	Chuck Lever	2019-08-20	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Probably would be good to also pass GFP flags to ib_alloc_mr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Move rpcrdma_mr_get out of frwr_map	Chuck Lever	2019-08-20	5	-44/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactor: Retrieve an MR and handle error recovery entirely in rpc_rdma.c, as this is not a device-specific function. Note that since commit 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue"), the xprt_transmit function handles the cond_resched. The transport no longer has to do this itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Combine rpcrdma_mr_put and rpcrdma_mr_unmap_and_put	Chuck Lever	2019-08-20	3	-28/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up. There is only one remaining rpcrdma_mr_put call site, and it can be directly replaced with unmap_and_put because mr->mr_dir is set to DMA_NONE just before the call. Now all the call sites do a DMA unmap, and we can just rename mr_unmap_and_put to mr_put, which nicely matches mr_get. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Simplify rpcrdma_mr_pop	Chuck Lever	2019-08-20	4	-21/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up: rpcrdma_mr_pop call sites check if the list is empty first. Let's replace the list_empty with less costly logic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Toggle XPRT_CONGESTED in xprtrdma's slot methods	Chuck Lever	2019-08-20	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 48be539dd44a ("xprtrdma: Introduce ->alloc_slot call-out for xprtrdma") added a separate alloc_slot and free_slot to the RPC/RDMA transport. Later, commit 75891f502f5f ("SUNRPC: Support for congestion control when queuing is enabled") modified the generic alloc/free_slot methods, but neglected the methods in xprtrdma. Found via code review. Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Rename rpcrdma_buffer::rb_all	Chuck Lever	2019-08-20	2	-19/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up: There are other "all" list heads. For code clarity distinguish this one as for use only for MRs by renaming it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Rename CQE field in Receive trace points	Chuck Lever	2019-08-20	2	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the field name the same for all trace points that handle pointers to struct rpcrdma_rep. That makes it easy to grep for matching rep points in trace output. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Boost client's max slot table size to match Linux server	Chuck Lever	2019-08-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've heard rumors of an NFS/RDMA server implementation that has a default credit limit of 1024. The client's default setting remains at 128. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Boost maximum transport header size	Chuck Lever	2019-08-20	2	-14/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although I haven't seen any performance results that justify it, I've received several complaints that NFS/RDMA no longer supports a maximum rsize and wsize of 1MB. These days it is somewhat smaller. To simplify the logic that determines whether a chunk list is necessary, the implementation uses a fixed maximum size of the transport header. Currently that maximum size is 256 bytes, one quarter of the default inline threshold size for RPC/RDMA v1. Since commit a78868497c2e ("xprtrdma: Reduce max_frwr_depth"), the size of chunks is also smaller to take advantage of inline page lists in device internal MR data structures. The combination of these two design choices has reduced the maximum NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing the maximum transport header size and the maximum number of RDMA segments it can contain increases the negotiated maximum rsize/wsize on common RNIC/HCAs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
\| *	xprtrdma: Fix calculation of ri_max_segs again	Chuck Lever	2019-08-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 302d3deb206 ("xprtrdma: Prevent inline overflow") added this calculation back in 2016, but got it wrong. I tested only the lower bound, which is why there is a max_t there. The upper bound should be rounded up too. Now, when using DIV_ROUND_UP, that takes care of the lower bound as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
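
The difference between the two roundings can be shown with a short C sketch. `DIV_ROUND_UP` is written out here in its usual form; the function names and the sample values are assumptions for illustration and are not the actual xprtrdma constants.

```c
/* Usual round-up integer division; d must be non-zero. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old style: truncating division, clamped to a lower bound of 1. */
static unsigned int segs_truncated(unsigned int pages, unsigned int depth)
{
    unsigned int segs = pages / depth;

    return segs < 1 ? 1 : segs;
}

/*
 * New style: rounding up covers the remainder, and for pages > 0 it
 * also yields at least 1, so no separate lower-bound clamp is needed.
 */
static unsigned int segs_rounded_up(unsigned int pages, unsigned int depth)
{
    return DIV_ROUND_UP(pages, depth);
}
```

For example, with 64 pages and a depth of 30, truncation gives 2 segments (covering only 60 pages) while rounding up gives 3; with a depth of 256, both give 1.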
\| *	xprtrdma: Update obsolete comment	Chuck Lever	2019-08-20	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Comment was made obsolete by commit 8cec3dba76a4 ("xprtrdma: rpcrdma_regbuf alignment"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>