summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2019-01-0263-1995/+1518
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ...
| * sunrpc: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li2019-01-021-16/+3
| | | | | | | | | | | | | | Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: Add xprt after nfs4_test_session_trunk()Santosh kumar pradhan2019-01-025-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multipathing: In case of NFSv3, rpc_clnt_test_and_add_xprt() adds the xprt to xprt switch (i.e. xps) if rpc_call_null_helper() returns success. But in case of NFSv4.1, it needs to do EXCHANGEID to verify the path along with check for session trunking. Add the xprt in nfs4_test_session_trunk() only when nfs4_detect_session_trunking() returns success. Also release refcount hold by rpc_clnt_setup_test_and_add_xprt(). Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com> Tested-by: Suresh Jayaraman <suresh.jayaraman@wdc.com> Reported-by: Aditya Agnihotri <aditya.agnihotri@wdc.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFSJ. Bruce Fields2019-01-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's OK to sleep here, we just don't want to recurse into the filesystem as a writeout could be waiting on this. Future work: the documentation for GFP_NOFS says "Please try to avoid using this flag directly and instead use memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't recurse into the FS layer with a short explanation why. All allocation requests will inherit GFP_NOFS implicitly." But I'm not sure where to do this. Should the workqueue be arranging that for us in the case of workqueues created with WQ_MEM_RECLAIM? Reported-by: Trond Myklebust <trondmy@hammer.space> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: handle ENOMEM in rpcb_getport_asyncJ. Bruce Fields2019-01-021-0/+8
| | | | | | | | | | | | | | | | If we ignore the error we'll hit a null dereference a little later. Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: remove unnecessary test for IS_ERR(cred)NeilBrown2019-01-021-5/+0
| | | | | | | | | | | | | | | | | | | | | | As gte_current_cred() cannot return an error, this test is not necessary. It hasn't been necessary for years, but it wasn't so obvious before. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Prevent leak of rpcrdma_rep objectsChuck Lever2019-01-022-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a reply has been processed but the RPC is later retransmitted anyway, the req->rl_reply field still contains the only pointer to the old rpcrdma rep. When the next reply comes in, the reply handler will stomp on the rl_reply field, leaking the old rep. A trace event is added to capture such leaks. This problem seems to be worsened by the restructuring of the RPC Call path in v4.20. Fully addressing this issue will require at least a re-architecture of the disconnect logic, which is not appropriate during -rc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFSv4.2 fix async copy reboot recoveryOlga Kornievskaia2019-01-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Original commit (e4648aa4f98a "NFS recover from destination server reboot for copies") used memcmp() and then it was changed to use nfs4_stateid_match_other() but that function returns opposite of memcmp. As the result, recovery can't find the copy leading to copy hanging. Fixes: 80f42368868e ("NFSv4: Split out NFS v4.2 copy completion functions") Fixes: cb7a8384dc02 ("NFS: Split out the body of nfs4_reclaim_open_state") Signed-of-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Don't leak freed MRsChuck Lever2019-01-021-12/+15
| | | | | | | | | | | | | | | | Defensive clean up. Don't set frwr->fr_mr until we know that the scatterlist allocation has succeeded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Add documenting comment for rpcrdma_buffer_destroyChuck Lever2019-01-021-0/+8
| | | | | | | | | | | | | | Make a note of the function's dependency on an earlier ib_drain_qp. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Replace outdated comment for rpcrdma_ep_postChuck Lever2019-01-021-3/+7
| | | | | | | | | | | | | | | | | | | | Since commit 7c8d9e7c8863 ("xprtrdma: Move Receive posting to Receive handler"), rpcrdma_ep_post is no longer responsible for posting Receive buffers. Update the documenting comment to reflect this change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Update comments in frwr_op_sendChuck Lever2019-01-021-2/+2
| | | | | | | | | | | | | | | | | | | | Commit f2877623082b ("xprtrdma: Chain Send to FastReg WRs") was written before commit ce5b37178283 ("xprtrdma: Replace all usage of "frmr" with "frwr""), but was merged afterwards. Thus it still refers to FRMR and MWs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Fix some kernel doc complaintsChuck Lever2019-01-024-4/+6
| | | | | | | | | | | | | | Clean up some warnings observed when building with "make W=1". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Simplify defining common RPC trace eventsChuck Lever2019-01-021-103/+69
| | | | | | | | | | | | | | Clean up, no functional change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Fix NFSv4 symbolic trace point outputChuck Lever2019-01-021-143/+313
| | | | | | | | | | | | | | | | | | These symbolic values were not being displayed in string form. TRACE_DEFINE_ENUM was missing in many cases. It also turns out that __print_symbolic wants an unsigned long in the first field... Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Trace mapping, alloc, and dereg failuresChuck Lever2019-01-024-10/+144
| | | | | | | | | | | | | | | | These are rare, but can be helpful at tracking down DMAR and other problems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Add trace points for calls to transport switch methodsChuck Lever2019-01-022-11/+17
| | | | | | | | | | | | | | | | | | Name them "trace_xprtrdma_op_*" so they can be easily enabled as a group. No trace point is added where the generic layer already has observability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Relocate the xprtrdma_mr_map trace pointsChuck Lever2019-01-021-1/+1
| | | | | | | | | | | | | | | | The mr_map trace points were capturing information about the previous use of the MR rather than about the segment that was just mapped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Clean up of xprtrdma chunk trace pointsChuck Lever2019-01-022-19/+29
| | | | | | | | | | | | | | | | | | | | | | The chunk-related trace points capture nearly the same information as the MR-related trace points. Also, rename them so globbing can be used to enable or disable these trace points more easily. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove unused fields from rpcrdma_iaChuck Lever2019-01-021-2/+0
| | | | | | | | | | | | | | | | Clean up. The last use of these fields was in commit 173b8f49b3af ("xprtrdma: Demote "connect" log messages") . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Cull dprintk() call sitesChuck Lever2019-01-024-68/+19
| | | | | | | | | | | | | | | | | | Clean up: Remove dprintk() call sites that report rare or impossible errors. Leave a few that display high-value low noise status information. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Simplify locking that protects the rl_allreqs listChuck Lever2019-01-023-35/+23
| | | | | | | | | | | | | | | | | | | | | | | | Clean up: There's little chance of contention between the use of rb_lock and rb_reqslock, so merge the two. This avoids having to take both in some (possibly future) cases. Transport tear-down is already serialized, thus there is no need for locking at all when destroying rpcrdma_reqs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Expose transport header errorsChuck Lever2019-01-021-1/+0
| | | | | | | | | | | | | | | | For better observability of parsing errors, return the error code generated in the decoders to the upper layer consumer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove request_module from backchannelChuck Lever2019-01-021-2/+0
| | | | | | | | | | | | | | | | | | | | Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma modules into one"), the forward and backchannel components are part of the same kernel module. A separate request_module() call in the backchannel code is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Recognize XDRBUF_SPARSE_PAGESChuck Lever2019-01-021-5/+6
| | | | | | | | | | | | | | | | | | | | Commit 431f6eb3570f ("SUNRPC: Add a label for RPC calls that require allocation on receive") didn't update similar logic in rpc_rdma.c. I don't think this is a bug, per-se; the commit just adds more careful checking for broken upper layer behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Make "port=" mount option optional for RDMA mountsChuck Lever2019-01-021-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having to specify "proto=rdma,port=20049" is cumbersome. RFC 8267 Section 6.3 requires NFSv4 clients to use "the alternative well-known port number", which is 20049. Make the use of the well- known port number automatic, just as it is for NFS/TCP and port 2049. For NFSv2/3, Section 4.2 allows clients to simply choose 20049 as the default or use rpcbind. I don't know of an NFS/RDMA server implementation that registers it's NFS/RDMA service with rpcbind, so automatically choosing 20049 seems like the better choice. The other widely-deployed NFS/RDMA client, Solaris, also uses 20049 as the default port. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)Chuck Lever2019-01-023-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Place the associated RPC transaction's XID in the upper 32 bits of each RDMA segment's rdma_offset field. There are two reasons to do this: - The R_key only has 8 bits that are different from registration to registration. The XID adds more uniqueness to each RDMA segment to reduce the likelihood of a software bug on the server reading from or writing into memory it's not supposed to. - On-the-wire RDMA Read and Write requests do not otherwise carry any identifier that matches them up to an RPC. The XID in the upper 32 bits will act as an eye-catcher in network captures. Suggested-by: Tom Talpey <ttalpey@microsoft.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove rpcrdma_memreg_opsChuck Lever2019-01-025-101/+116
| | | | | | | | | | | | | | | | | | Clean up: Now that there is only FRWR, there is no need for a memory registration switch. The indirect calls to the memreg operations can be replaced with faster direct calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove support for FMR memory registrationChuck Lever2019-01-024-359/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | FMR is not supported on most recent RDMA devices. It is also less secure than FRWR because an FMR memory registration can expose adjacent bytes to remote reading or writing. As discussed during the RDMA BoF at LPC 2018, it is time to remove support for FMR in the NFS/RDMA client stack. Note that NFS/RDMA server-side uses either local memory registration or FRWR. FMR is not used. There are a few Infiniband/RoCE devices in the kernel tree that do not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will not support client-side NFS/RDMA after this patch. These are: - mthca - qib - hns (RoCE) Users of these devices can use NFS/TCP on IPoIB instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Reduce max_frwr_depthChuck Lever2019-01-021-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some devices advertise a large max_fast_reg_page_list_len capability, but perform optimally when MRs are significantly smaller than that depth -- probably when the MR itself is no larger than a page. By default, the RDMA R/W core API uses max_sge_rd as the maximum page depth for MRs. For some devices, the value of max_sge_rd is 1, which is also not optimal. Thus, when max_sge_rd is larger than 1, use that value. Otherwise use the value of the max_fast_reg_page_list_len attribute. I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It reproducibly improves the throughput of large I/Os by several percent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Fix ri_max_segs and the result of ro_maxpagesChuck Lever2019-01-023-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With certain combinations of krb5i/p, MR size, and r/wsize, I/O can fail with EMSGSIZE. This is because the calculated value of ri_max_segs (the max number of MRs per RPC) exceeded RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to walk off the end of the transport header. Once that was addressed, the ro_maxpages result has to be corrected to account for the number of MRs needed for Reply chunks, which is 2 MRs smaller than a normal Read or Write chunk. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Don't wake pending tasks until disconnect is doneChuck Lever2019-01-025-17/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transport disconnect processing does a "wake pending tasks" at various points. Suppose an RPC Reply is being processed. The RPC task that Reply goes with is waiting on the pending queue. If a disconnect wake-up happens before reply processing is done, that reply, even if it is good, is thrown away, and the RPC has to be sent again. This window apparently does not exist for socket transports because there is a lock held while a reply is being received which prevents the wake-up call until after reply processing is done. To resolve this, all RPC replies being processed on an RPC-over-RDMA transport have to complete before pending tasks are awoken due to a transport disconnect. Callers that already hold the transport write lock may invoke ->ops->close directly. Others use a generic helper that schedules a close when the write lock can be taken safely. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: No qp_event disconnectChuck Lever2019-01-022-33/+0
| | | | | | | | | | | | | | | | | | | | After thinking about this more, and auditing other kernel ULP imple- mentations, I believe that a DISCONNECT cm_event will occur after a fatal QP event. If that's the case, there's no need for an explicit disconnect in the QP event handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueueChuck Lever2019-01-024-48/+44
| | | | | | | | | | | | | | | | | | | | To address a connection-close ordering problem, we need the ability to drain the RPC completions running on rpcrdma_receive_wq for just one transport. Give each transport its own RPC completion workqueue, and drain that workqueue when disconnecting the transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Refactor Receive accountingChuck Lever2019-01-025-39/+19
| | | | | | | | | | | | | | | | | | | | | | Clean up: Divide the work cleanly: - rpcrdma_wc_receive is responsible only for RDMA Receives - rpcrdma_reply_handler is responsible only for RPC Replies - the posted send and receive counts both belong in rpcrdma_ep Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Ensure MRs are DMA-unmapped when posting LOCAL_INV failsChuck Lever2019-01-021-2/+2
| | | | | | | | | | | | | | | | | | The recovery case in frwr_op_unmap_sync needs to DMA unmap each MR. frwr_release_mr does not DMA-unmap, but the recycle worker does. Fixes: 61da886bf74e ("xprtrdma: Explicitly resetting MRs is ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Yet another double DMA-unmapChuck Lever2019-01-022-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While chasing yet another set of DMAR fault reports, I noticed that the frwr recycler conflates whether or not an MR has been DMA unmapped with frwr->fr_state. Actually the two have only an indirect relationship. It's in fact impossible to guess reliably whether the MR has been DMA unmapped based on its fr_state field, especially as the surrounding code and its assumptions have changed over time. A better approach is to track the DMA mapping status explicitly so that the recycler is less brittle to unexpected situations, and attempts to DMA-unmap a second time are prevented. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: nfs_compare_mount_options always compare auth flavors.Chris Perl2018-12-211-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the check from nfs_compare_mount_options to see if a `sec' option was passed for the current mount before comparing auth flavors and instead just always compares auth flavors. Consider the following scenario: You have a server with the address 192.168.1.1 and two exports /export/a and /export/b. The first export supports `sys' and `krb5' security, the second just `sys'. Assume you start with no mounts from the server. The following results in EIOs being returned as the kernel nfs client incorrectly thinks it can share the underlying `struct nfs_server's: $ mkdir /tmp/{a,b} $ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a $ sudo mount -t nfs -o vers=3 192.168.1.1:/export/b /tmp/b $ df >/dev/null df: ‘/tmp/b’: Input/output error Signed-off-by: Chris Perl <cperl@janestreet.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC discard cr_uid from struct rpc_cred.NeilBrown2018-12-193-9/+6
| | | | | | | | | | | | | | Just use ->cr_cred->fsuid directly. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: simplify auth_unix.NeilBrown2018-12-192-70/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1/ discard 'struct unx_cred'. We don't need any data that is not already in 'struct rpc_cred'. 2/ Don't keep these creds in a hash table. When a credential is needed, simply allocate it. When not needed, discard it. This can easily be faster than performing a lookup on a shared hash table. As the lookup can happen during write-out, use a mempool to ensure forward progress. This means that we cannot compare two credentials for equality by comparing the pointers, but we never do that anyway. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: remove crbind rpc_cred operationNeilBrown2018-12-195-17/+1
| | | | | | | | | | | | | | | | This now always just does get_rpccred(), so we don't need an operation pointer to know to do that. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: remove generic cred code.NeilBrown2018-12-195-225/+2
| | | | | | | | | | | | | | This is no longer used. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.NeilBrown2018-12-1933-343/+261
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires. This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux. For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: struct nfs_open_dir_context: convert rpc_cred pointer to cred.NeilBrown2018-12-196-19/+35
| | | | | | | | | | | | | | Use the common 'struct cred' to pass credentials for readdir. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: change access cache to use 'struct cred'.NeilBrown2018-12-194-32/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than keying the access cache with 'struct rpc_cred', use 'struct cred'. Then use cred_fscmp() to compare credentials rather than comparing the raw pointer. A benefit of this approach is that in the common case we avoid the rpc_lookup_cred_nonblock() call which can be slow when the cred cache is large. This also keeps many fewer items pinned in the rpc cred cache, so the cred cache is less likely to get large. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: remove RPCAUTH_AUTH_NO_CRKEY_TIMEOUTNeilBrown2018-12-193-5/+0
| | | | | | | | | | | | | | This is no longer used. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: move credential expiry tracking out of SUNRPC into NFS.NeilBrown2018-12-197-124/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS needs to know when a credential is about to expire so that it can modify write-back behaviour to finish the write inside the expiry time. It currently uses functions in SUNRPC code which make use of a fairly complex callback scheme and flags in the generic credientials. As I am working to discard the generic credentials, this has to change. This patch moves the logic into NFS, in part by finding and caching the low-level credential in the open_context. We then make direct cred-api calls on that. This makes the code much simpler and removes a dependency on generic rpc credentials. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: add side channel to use non-generic cred for rpc call.NeilBrown2018-12-194-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The credential passed in rpc_message.rpc_cred is always a generic credential except in one instance. When gss_destroying_context() calls rpc_call_null(), it passes a specific credential that it needs to destroy. In this case the RPC acts *on* the credential rather than being authorized by it. This special case deserves explicit support and providing that will mean that rpc_message.rpc_cred is *always* generic, allowing some optimizations. So add "tk_op_cred" to rpc_task and "rpc_op_cred" to the setup data. Use this to pass the cred down from rpc_call_null(), and have rpcauth_bindcred() notice it and bind it in place. Credit to kernel test robot <fengguang.wu@intel.com> for finding a bug in earlier version of this patch. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: introduce RPC_TASK_NULLCREDS to request auth_noneNeilBrown2018-12-193-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In almost all cases the credential stored in rpc_message.rpc_cred is a "generic" credential. One of the two expections is when an AUTH_NULL credential is used such as for RPC ping requests. To improve consistency, don't pass an explicit credential in these cases, but instead pass NULL and set a task flag, similar to RPC_TASK_ROOTCREDS, which requests that NULL credentials be used by default. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred().NeilBrown2018-12-1910-69/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When NFS creates a machine credential, it is a "generic" credential, not tied to any auth protocol, and is really just a container for the princpal name. This doesn't get linked to a genuine credential until rpcauth_bindcred() is called. The lookup always succeeds, so various places that test if the machine credential is NULL, are pointless. As a step towards getting rid of generic credentials, this patch gets rid of generic machine credentials. The nfs_client and rpc_client just hold a pointer to a constant principal name. When a machine credential is wanted, a special static 'struct rpc_cred' pointer is used. rpcauth_bindcred() recognizes this, finds the principal from the client, and binds the correct credential. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>