summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* genericsTrond Myklebust2017-02-213-34/+12
| | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Add generic helpers for xdr_stream encode/decodeTrond Myklebust2017-02-211-0/+177
| | | | | | | | | Add some generic helpers for encoding/decoding opaque structures and basic u32/u64. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: silence uninitialized variable warningDan Carpenter2017-02-211-1/+3
| | | | | | | | | | kstrtouint() can return a couple different error codes so the check for "ret == -EINVAL" is wrong and static analysis tools correctly complain that we can use "num" without initializing it. It's not super harmful because we check the bounds. But it's also easy enough to fix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* nlm: Ensure callback code also checks that the files matchTrond Myklebust2017-02-131-1/+2
| | | | | | | | | | | | It is not sufficient to just check that the lock pids match when granting a callback, we also need to ensure that we're granting the callback on the right file. Reported-by: Pankaj Singh <psingh.ait@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: Allow xprt->ops->timer method to sleepChuck Lever2017-02-102-2/+2
| | | | | | | | | | | | | | | | | The transport lock is needed to protect the xprt_adjust_cwnd() call in xs_udp_timer, but it is not necessary for accessing the rq_reply_bytes_recvd or tk_status fields. It is correct to sublimate the lock into UDP's xs_udp_timer method, where it is required. The ->timer method has to take the transport lock if needed, but it can now sleep safely, or even call back into the RPC scheduler. This is more a clean-up than a fix, but the "issue" was introduced by my transport switch patches back in 2005. Fixes: 46c0ee8bc4ad ("RPC: separate xprt_timer implementations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Refactor management of mw_list fieldChuck Lever2017-02-105-24/+29
| | | | | | | Clean up some duplicate code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Handle stale connection rejectionChuck Lever2017-02-101-45/+21
| | | | | | | | | | | | | | | | | | A server rejects a connection attempt with STALE_CONNECTION when a client attempts to connect to a working remote service, but uses a QPN and GUID that corresponds to an old connection that was abandoned. This might occur after a client crashes and restarts. Fix rpcrdma_conn_upcall() to distinguish between a normal rejection and rejection of stale connection parameters. As an additional clean-up, remove the code that retries the connection attempt with different ORD/IRD values. Code audit of other ULP initiators shows no similar special case handling of initiator_depth or responder_resources. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Properly recover FRWRs with in-flight FASTREG WRsChuck Lever2017-02-102-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sriharsha (sriharsha.basavapatna@broadcom.com) reports an occasional double DMA unmap of an FRWR MR when a connection is lost. I see one way this can happen. When a request requires more than one segment or chunk, rpcrdma_marshal_req loops, invoking ->frwr_op_map for each segment (MR) in each chunk. Each call posts a FASTREG Work Request to register one MR. Now suppose that the transport connection is lost part-way through marshaling this request. As part of recovering and resetting that req, rpcrdma_marshal_req invokes ->frwr_op_unmap_safe, which hands all the req's registered FRWRs to the MR recovery thread. But note: FRWR registration is asynchronous. So it's possible that some of these "already registered" FRWRs are fully registered, and some are still waiting for their FASTREG WR to complete. When the connection is lost, the "already registered" frmrs are marked FRMR_IS_VALID, and the "still waiting" WRs flush. Then frwr_wc_fastreg marks these frmrs FRMR_FLUSHED_FR. But thanks to ->frwr_op_unmap_safe, the MR recovery thread is doing an unreg / alloc_mr, a DMA unmap, and marking each of these frwrs FRMR_IS_INVALID, at the same time frwr_wc_fastreg might be running. - If the recovery thread runs last, then the frmr is marked FRMR_IS_INVALID, and life continues. - If frwr_wc_fastreg runs last, the frmr is marked FRMR_FLUSHED_FR, but the recovery thread has already DMA unmapped that MR. When ->frwr_op_map later re-uses this frmr, it sees it is not marked FRMR_IS_INVALID, and tries to recover it before using it, resulting in a second DMA unmap of the same MR. The fix is to guarantee in-flight FASTREG WRs have flushed before MR recovery runs on those FRWRs. Thus we depend on ro_unmap_safe (called from xprt_rdma_send_request on retransmit, or from xprt_rdma_free) to clean up old registrations as needed. Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Shrink send SGEs arrayChuck Lever2017-02-101-4/+7
| | | | | | | | | | We no longer need to accommodate an xdr_buf whose pages start at an offset and cross extra page boundaries. If there are more partial or whole pages to send than there are available SGEs, the marshaling logic is now smart enough to use a Read chunk instead of failing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Reduce required number of send SGEsChuck Lever2017-02-103-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The MAX_SEND_SGES check introduced in commit 655fec6987be ("xprtrdma: Use gathered Send for large inline messages") fails for devices that have a small max_sge. Instead of checking for a large fixed maximum number of SGEs, check for a minimum small number. RPC-over-RDMA will switch to using a Read chunk if an xdr_buf has more pages than can fit in the device's max_sge limit. This is considerably better than failing all together to mount the server. This fix supports devices that have as few as three send SGEs available. Reported-by: Selvin Xavier <selvin.xavier@broadcom.com> Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Reported-by: Honggang Li <honli@redhat.com> Reported-by: Ram Amrani <Ram.Amrani@cavium.com> Fixes: 655fec6987be ("xprtrdma: Use gathered Send for large ...") Cc: stable@vger.kernel.org # v4.9+ Tested-by: Honggang Li <honli@redhat.com> Tested-by: Ram Amrani <Ram.Amrani@cavium.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Disable pad optimization by defaultChuck Lever2017-02-102-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d5440e27d3e5 ("xprtrdma: Enable pad optimization") made the Linux client omit XDR round-up padding in normal Read and Write chunks so that the client doesn't have to register and invalidate 3-byte memory regions that contain no real data. Unfortunately, my cheery 2014 assessment that this optimization "is supported now by both Linux and Solaris servers" was premature. We've found bugs in Solaris in this area since commit d5440e27d3e5 ("xprtrdma: Enable pad optimization") was merged (SYMLINK is the main offender). So for maximum interoperability, I'm disabling this optimization again. If a CM private message is exchanged when connecting, the client recognizes that the server is Linux, and enables the optimization for that connection. Until now the Solaris server bugs did not impact common operations, and were thus largely benign. Soon, less capable devices on Linux NFS/RDMA clients will make use of Read chunks more often, and these Solaris bugs will prevent interoperation in more cases. Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Per-connection pad optimizationChuck Lever2017-02-103-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pad optimization is changed by echoing into /proc/sys/sunrpc/rdma_pad_optimize. This is a global setting, affecting all RPC-over-RDMA connections to all servers. The marshaling code picks up that value and uses it for decisions about how to construct each RPC-over-RDMA frame. Having it change suddenly in mid-operation can result in unexpected failures. And some servers a client mounts might need chunk round-up, while others don't. So instead, copy the pad_optimize setting into each connection's rpcrdma_ia when the transport is created, and use the copy, which can't change during the life of the connection, instead. This also removes a hack: rpcrdma_convert_iovs was using the remote-invalidation-expected flag to predict when it could leave out Write chunk padding. This is because the Linux server handles implicit XDR padding on Write chunks correctly, and only Linux servers can set the connection's remote-invalidation-expected flag. It's more sensible to use the pad optimization setting instead. Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix Read chunk paddingChuck Lever2017-02-101-6/+4
| | | | | | | | | | | | | | | | | | | | | | When pad optimization is disabled, rpcrdma_convert_iovs still does not add explicit XDR round-up padding to a Read chunk. Commit 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") incorrectly short-circuited the test for whether round-up padding is needed that appears later in rpcrdma_convert_iovs. However, if this is indeed a regular Read chunk (and not a Position-Zero Read chunk), the tail iovec _always_ contains the chunk's padding, and never anything else. So, it's easy to just skip the tail when padding optimization is enabled, and add the tail in a subsequent Read chunk segment, if disabled. Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: Set the connection timeout to match the lease periodTrond Myklebust2017-02-093-7/+10
| | | | | | | | Set the timeout for TCP connections to be 1 lease period to ensure that we don't lose our lease due to a faulty TCP connection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Allow changing of the TCP timeout parameters on the flyTrond Myklebust2017-02-094-11/+77
| | | | | | | | | When the NFSv4 server tells us the lease period, we usually want to adjust down the timeout parameters on the TCP connection to ensure that we don't miss lease renewals due to a faulty connection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Refactor TCP socket timeout code into a helper functionTrond Myklebust2017-02-091-19/+26
| | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Remove unused function rpc_get_timeout()Trond Myklebust2017-02-092-16/+0
| | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: Fix memory and state leak in _nfs4_open_and_get_stateTrond Myklebust2017-02-081-1/+1
| | | | | | | | | | | If we exit because the file access check failed, we currently leak the struct nfs4_state. We need to attach it to the open context before returning. Fixes: 3efb9722475e ("NFSv4: Refactor _nfs4_open_and_get_state..") Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: use simple_read_from_buffer for reading cache flushKinglong Mee2017-02-081-12/+3
| | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: record rpc client pointer in seq->private directlyKinglong Mee2017-02-081-25/+10
| | | | | | | pos in rpc_clnt_iter is useless, drop it and record clnt in seq_private. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: update the comments of sunrpc proc pathKinglong Mee2017-02-081-2/+2
| | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: remove dead codes of cr_magic in rpc_credKinglong Mee2017-02-083-11/+0
| | | | | | | Don't found any place using the cr_magic. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: rename NFS_NGROUPS to UNX_NGROUPS for auth unixKinglong Mee2017-02-083-12/+11
| | | | | | | NFS_NGROUPS has been move to sunrpc, rename to UNX_NGROUPS. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc/nfs: cleanup procfs/pipefs entry in cache_detailKinglong Mee2017-02-083-46/+21
| | | | | | | Record flush/channel/content entries is useless, remove them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: error out if register_shrinker failKinglong Mee2017-02-081-1/+5
| | | | | | | register_shrinker may return error when register fail, error out. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* nfs: no PG_private waiters remain, remove wakerNicholas Piggin2017-02-081-2/+0
| | | | | | | | | | Since commit 4f52b6bb ("NFS: Don't call COMMIT in ->releasepage()"), no tasks wait on PagePrivate, so the wake introduced in commit 95905446 ("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") can be removed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: nfs_rename() handle -ERESTARTSYS dentry left behindBenjamin Coddington2017-02-081-11/+25
| | | | | | | | | | An interrupted rename will leave the old dentry behind if the rename succeeds. Fix this by moving the final local work of the rename to rpc_call_done so that the results of the RENAME can always be handled, even if the original process has already returned with -ERESTARTSYS. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: Fix warning for using 0 as NULLWei Yongjun2017-01-301-1/+1
| | | | | | | | | Fixes the following sparse warning: fs/nfs/nfs4state.c:862:60: warning: Using plain integer as NULL pointer Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS/flexfiles: Make local symbol layoutreturn_ops staticWei Yongjun2017-01-301-1/+1
| | | | | | | | | | Fixes the following sparse warning: fs/nfs/flexfilelayout/flexfilelayout.c:2114:34: warning: symbol 'layoutreturn_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Return the comparison result directly in nfs41_match_stateid()Anna Schumaker2017-01-301-3/+1
| | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Clean up nfs41_same_server_scope()Anna Schumaker2017-01-301-5/+3
| | | | | | | The function is cleaner this way, since we can use the result of memcmp() directly Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: No need to set and return status in nfs41_lock_expired()Anna Schumaker2017-01-301-2/+1
| | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Remove unnecessary goto in nfs4_lookup_root_sec()Anna Schumaker2017-01-301-8/+3
| | | | | | Once again, it's easier and cleaner just to return the error directly. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Remove nfs4_recover_expired_lease()Anna Schumaker2017-01-301-6/+1
| | | | | | | This function doesn't add much, since all it does is access the server's nfs_client variable. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Remove an extra if in _nfs4_recover_proc_open()Anna Schumaker2017-01-301-4/+1
| | | | | | It's simpler just to return the status unconditionally Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Return errors directly in _nfs4_opendata_reclaim_to_nfs4_state()Anna Schumaker2017-01-301-8/+3
| | | | | | | There is no need for a goto just to return an error code without any cleanup. Returning the error directly helps to clean up the code. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Remove nfs4_wait_for_completion_rpc_task()Anna Schumaker2017-01-301-15/+7
| | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Clean up _nfs4_is_integrity_protected()Anna Schumaker2017-01-301-6/+1
| | | | | | | We can cut out the if statement and return the results of the comparison directly. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Fix inconsistent indentation in nfs4proc.cAnna Schumaker2017-01-301-28/+28
| | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Make trace_nfs4_setup_sequence() available to NFS v4.0Anna Schumaker2017-01-303-35/+35
| | | | | | | | | This tracepoint displays information about the slot that was chosen for the RPC, in addition to session information. This could be useful information for debugging, and we can set the session id hash to 0 to indicate that there is no session. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Merge the remaining setup_sequence functionsAnna Schumaker2017-01-301-67/+20
| | | | | | | This creates a single place for all the work to happen, using the presence of a session to determine if extra values need to be set. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Check if the slot table is draining from nfs4_setup_sequence()Anna Schumaker2017-01-301-14/+9
| | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Handle setup sequence task rescheduling in a single placeAnna Schumaker2017-01-301-22/+15
| | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Lock the slot table from a single place during setup sequenceAnna Schumaker2017-01-301-13/+12
| | | | | | Rather than implementing this twice for NFS v4.0 and v4.1 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Move slot-already-allocated check into nfs_setup_sequence()Anna Schumaker2017-01-301-10/+9
| | | | | | | This puts the check in a single place, rather than needing to implement it twice for v4.0 and v4.1. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Create a single nfs4_setup_sequence() functionAnna Schumaker2017-01-301-47/+29
| | | | | | | The inline ifdef lets us put everything in a single place, rather than having two (very similar) versions of this function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Use nfs4_setup_sequence() everywhereAnna Schumaker2017-01-305-52/+33
| | | | | | | This does the right thing depending on if we have a session, rather than needing to handle this manually in multiple places. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Change nfs4_setup_sequence() to take an nfs_client structureAnna Schumaker2017-01-301-16/+15
| | | | | | | | I want to have all callers use this function, rather than calling the NFS v4.0 and v4.1 versions directly. This includes pNFS, which only has access to the nfs_client structure in some places. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Change nfs4_get_session() to take an nfs_client structureAnna Schumaker2017-01-303-10/+9
| | | | | | | | pNFS only has access to the nfs_client structure, and not the nfs_server, so we need to make this change so the function can be used by pNFS as well. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Move nfs4_get_session() into nfs4_session.hAnna Schumaker2017-01-303-10/+6
| | | | | | | | This puts session related functions together in the same space. I only keep one version of this function, since this variable will always be NULL when using NFS v4.0. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>