path: root/fs/nfs/pnfs.c
Commit message | Author | Age | Files | Lines
* nfs: fix spellint typo in pnfs.cWang Qing2020-09-241-1/+1
| | | | | | | Change the comment typo: "manger" -> "manager". Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2020-08-231-1/+1
| | | | | | | | | | Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
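As a rough illustration of the change described above, the sketch below shows a switch statement that marks intentional fall-through with a `fallthrough;` statement instead of a comment. The fallback macro definition is an assumption made so that the example builds in userspace; the function `cleanup_steps()` and its stages are hypothetical and do not come from pnfs.c.

```c
/*
 * Userspace stand-in for the pseudo-keyword (assumption: we are not
 * building inside the kernel tree, which supplies its own definition).
 */
#ifndef fallthrough
# if defined(__has_attribute)
#  if __has_attribute(__fallthrough__)
#   define fallthrough __attribute__((__fallthrough__))
#  endif
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical helper: count the cleanup steps that run from a given stage. */
int cleanup_steps(int stage)
{
    int steps = 0;

    switch (stage) {
    case 2:
        steps++;        /* undo stage 2, then continue with stage 1 */
        fallthrough;
    case 1:
        steps++;        /* undo stage 1, then continue with stage 0 */
        fallthrough;
    case 0:
        steps++;        /* undo stage 0 */
        break;
    default:
        break;          /* unknown stage: nothing to undo */
    }
    return steps;
}
```

Calling `cleanup_steps(2)` walks through all three cases, while `cleanup_steps(0)` only runs the last one.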
* NFS: Fix flexfiles read failoverTrond Myklebust2020-08-121-1/+3
| | | | | | | | | | | | | | | The current mirrored read failover code is correctly resetting the mirror index between failed reads, however it is not able to actually flip the RPC call over to the next RPC client. The end result is that we keep resending the RPC call to the same client over and over. The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a new RPC call, but we need to add the ability to pass in a mirror index so that we always retry the next mirror in the list. Fixes: 166bd5b889ac ("pNFS/flexfiles: Fix layoutstats handling during read failovers") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
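The idea of passing a mirror index so that the retry goes to the next mirror in the list can be sketched as below. This is a simplified model, not the kernel code: the function name `next_read_mirror()` is hypothetical, and returning -1 once the list is exhausted is an assumption, since the commit message does not say what happens after the last mirror.

```c
/*
 * Hypothetical helper: given the index of the mirror whose read failed
 * and the number of mirrors, return the index to retry against, or -1
 * when there is no further mirror (assumed behaviour for this sketch).
 */
int next_read_mirror(unsigned int failed_idx, unsigned int mirror_count)
{
    /* Reject an out-of-range index; this also covers mirror_count == 0. */
    if (failed_idx >= mirror_count)
        return -1;
    /* The failed mirror was the last one in the list. */
    if (failed_idx + 1 >= mirror_count)
        return -1;
    return (int)(failed_idx + 1);
}
```

The point of the sketch is that the resend is keyed on an explicit index, so a new RPC call can be directed at a different mirror instead of repeating the call against the same client.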
* NFS: Don't return layout segments that are in useTrond Myklebust2020-08-121-19/+15
| | | | | | | | | | If the NFS_LAYOUT_RETURN_REQUESTED flag is set, we want to return the layout as soon as possible, meaning that the affected layout segments should be marked as invalid, and should no longer be in use for I/O. Fixes: f0b429819b5f ("pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Don't move layouts to plh_return_segs list while in useTrond Myklebust2020-08-121-11/+1
| | | | | | | | | | If the layout segment is still in use for a read or a write, we should not move it to the layout plh_return_segs list. If we do, we can end up returning the layout while I/O is still in progress. Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()Trond Myklebust2020-08-051-1/+1
| | | | | | | Ensure we correctly report the stateid and status in the layoutreturn on close tracepoint. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()Trond Myklebust2020-04-191-5/+2
| | | | | | | | | | If the credential returned by pnfs_prepare_layoutreturn() does not match the credential of the RPC call, then we do end up calling pnfs_send_layoutreturn() with that credential, so don't free it! Fixes: 44ea8dfce021 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completionTrond Myklebust2020-04-191-1/+3
| | | | | | | We require that any outstanding layout return completes before we can free up the inode so that the layout itself can be freed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Fix an ABBA spinlock issue in pnfs_update_layout()Trond Myklebust2020-04-131-1/+2
| | | | | | | | | | | We need to drop the inode spinlock while calling nfs4_select_rw_stateid(), since nfs4_copy_delegation_stateid() could take the delegation lock. Note that it is safe to do this, since all other calls to pnfs_update_layout() for that inode will find themselves blocked by the lock we hold on NFS_LAYOUT_FIRST_LAYOUTGET. Fixes: fc51b1cf391d ("NFS: Beware when dereferencing the delegation cred") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()Trond Myklebust2020-04-031-19/+33
| | | | | | | When we're sending a layoutreturn, ensure that we reference the layout cred atomically with the copy of the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()Trond Myklebust2020-04-031-1/+1
| | | | | | | Ensure that the dereference of the layout cred is atomic with the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS/flexfiles: Check the layout segment range before doing I/OTrond Myklebust2020-03-271-1/+2
| | | | | | | When starting to read or write with a layout segment, check that the range matches our request. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Add infrastructure for cleaning up per-layout commit structuresTrond Myklebust2020-03-271-0/+1
| | | | | | | Ensure that both the file and flexfiles layout types clean up when freeing the layout segments. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Add support for CB_RECALL_ANY for flexfiles layoutsTrond Myklebust2020-03-161-12/+136
| | | | | | | | | When we receive a CB_RECALL_ANY that asks us to return flexfiles layouts, we iterate through all the layouts and look at whether or not there are active open file descriptors that might need them for I/O. If there are no such descriptors, we return the layouts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Ensure layout headers are RCU safeTrond Myklebust2020-03-161-6/+6
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Avoid unnecessary credential references in layoutgetTrond Myklebust2020-03-161-2/+1
| | | | | | Layoutget is just using the credential attached to the open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/pnfs: pnfs_set_layout_stateid() should update the layout credTrond Myklebust2020-03-161-4/+16
| | | | | | | | If the cred assigned to the layout that we're updating differs from the one used to retrieve the new layout segment, then we need to update the layout plh_lc_cred field. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: pnfs_roc() must use cred_fscmp() to compare credsTrond Myklebust2020-02-031-1/+1
| | | | | | | | | When comparing two 'struct cred' for equality w.r.t. behaviour under filesystem access, we need to use cred_fscmp(). Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4.x recover from pre-mature loss of openstateidOlga Kornievskaia2020-01-151-2/+0
| | | | | | | | | | | | | | | | Ever since the commit 0e0cb35b417f, it's possible to lose an open stateid while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens, operations that require openstateid fail with EAGAIN which is propagated to the application then tests like generic/446 and generic/168 fail with "Resource temporarily unavailable". Instead of returning this error, initiate state recovery when possible to recover the open stateid and then try calling nfs4_select_rw_stateid() again. Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: add declaration of current_stateidBen Dooks2019-11-181-2/+0
| | | | | | | | | | | The current_stateid is exported from nfs4state.c but not declared in any of the headers. Add to nfs4_fs.h to remove the following warning: fs/nfs/nfs4state.c:80:20: warning: symbol 'current_stateid' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqidTrond Myklebust2019-09-201-4/+14
| | | | | | | | If a LAYOUTRETURN receives a reply of NFS4ERR_OLD_STATEID then assume we've missed an update, and just bump the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
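"Bumping" the stateid here means advancing its sequence counter. A minimal sketch follows, assuming the NFSv4.1 convention that the seqid is a 32-bit counter in which zero is reserved and therefore skipped on wraparound. The name `sketch_next_seqid()` is hypothetical and is not the helper used in pnfs.c.

```c
#include <stdint.h>

/* Return the seqid that follows `seqid`, skipping zero on wraparound. */
uint32_t sketch_next_seqid(uint32_t seqid)
{
    seqid++;            /* unsigned arithmetic wraps 0xffffffff to 0 */
    if (seqid == 0)
        seqid = 1;      /* zero is assumed reserved, so skip it */
    return seqid;
}
```

Under these assumptions, a client that sees NFS4ERR_OLD_STATEID would resend the LAYOUTRETURN with the advanced seqid rather than treating the error as fatal.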
* NFSv4: Handle RPC level errors in LAYOUTRETURNTrond Myklebust2019-09-201-0/+15
| | | | | | | Handle RPC level errors by assuming that the RPC call was successful. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: Handle NFS4ERR_DELAY correctly in return-on-closeTrond Myklebust2019-09-201-0/+4
| | | | | | | If the server sends a NFS4ERR_DELAY, then allow the caller to retry. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: Clean up pNFS return-on-close error handlingTrond Myklebust2019-09-201-0/+27
| | | | | | | | | Both close and delegreturn have identical code to handle pNFS return-on-close. This patch refactors that code and places it in pnfs.c Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Ensure we do clear the return-on-close layout stateid on fatal errorsTrond Myklebust2019-09-201-2/+7
| | | | | | | | | If the server rejected our layout return with a state error such as NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want to clear out all the remaining layout segments and mark that stateid as invalid. Fixes: 1c5bd76d17cca ("pNFS: Enable layoutreturn operation for...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFSv4: Report the error from nfs4_select_rw_stateid()Trond Myklebust2019-08-041-6/+1
| | | | | | | | In pnfs_update_layout() ensure that we do report any fatal errors from nfs4_select_rw_stateid(). Fixes: d9aba2b40de6 ("NFSv4: Don't use the zero stateid with layoutget") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDSTrond Myklebust2019-07-181-0/+2
| | | | | | | Add tracepoints to allow debugging of the event chain leading to a pnfs fallback to doing I/O through the MDS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Fix a problem where we gratuitously start doing I/O through the MDSTrond Myklebust2019-07-181-1/+1
| | | | | | | | | | If the client has to stop in pnfs_update_layout() to wait for another layoutget to complete, it currently exits and defaults to I/O through the MDS if the layoutget was successful. Fixes: d03360aaf5cc ("pNFS: Ensure we return the error if someone kills...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+
* NFSv4: Don't use the zero stateid with layoutgetTrond Myklebust2019-07-181-3/+11
| | | | | | | | | The NFSv4.1 protocol explicitly forbids us from using the zero stateid together with layoutget, so when we see that nfs4_select_rw_stateid() is unable to return a valid delegation, lock or open stateid, then we should initiate recovery and retry. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Clean up writeback codeTrond Myklebust2019-07-061-1/+1
| | | | | | | | Now that the VM promises never to recurse back into the filesystem layer on writeback, remove all the GFP_NOFS references etc from the generic writeback code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Add a helper to return a pointer to the open context of a struct nfs_pageTrond Myklebust2019-04-251-2/+2
| | | | | | | | Add a helper for when we remove the explicit pointer to the open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Fix a typo in pnfs_update_layoutTrond Myklebust2019-03-121-1/+1
| | | | | | | | | We're supposed to wait for the outstanding layout count to go to zero, but that got lost somehow. Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...") Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umountTrond Myklebust2019-02-231-10/+23
| | | | | | | | | If a bulk layout recall or a metadata server reboot coincides with a umount, then holding a reference to an inode is unsafe unless we also hold a reference to the super block. Fixes: fd9a8d7160937 ("NFSv4.1: Fix bulk recall and destroy of layouts") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.NeilBrown2018-12-191-7/+7
| | | | | | | | | | | | | | | | | | | | | | SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires. This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux. For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Convert lookups of the open context to RCUTrond Myklebust2018-09-301-1/+4
| | | | | | | Reduce contention on the inode->i_lock by ensuring that we use RCU when looking up the NFS open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Don't allocate more pages than we need to fit a layoutget responseTrond Myklebust2018-09-301-0/+7
| | | | | | | | For the 'files' and 'flexfiles' layout types, we do not expect the reply to be any larger than 4k. The block and scsi layout types are a little more greedy, so we keep allocating the maximum response size for now. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
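The sizing described above amounts to rounding the expected reply size up to a whole number of pages. The sketch below shows that arithmetic under stated assumptions: a 4096-byte page size, and a hypothetical function name `pages_for_response()`. It is not the allocation code from pnfs.c.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this example */

/* Number of whole pages needed to hold a reply of max_resp_bytes bytes. */
size_t pages_for_response(size_t max_resp_bytes)
{
    return (max_resp_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With these assumptions, a reply capped at 4k fits in a single page, whereas a layout type with a larger maximum response would still need correspondingly more pages.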
* pNFS: Don't zero out the array in nfs4_alloc_pages()Trond Myklebust2018-09-301-2/+2
| | | | | | We don't need a zeroed out array, since it is immediately being filled. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Ensure we return the error if someone kills a waiting layoutgetTrond Myklebust2018-09-141-10/+16
| | | | | | | | If someone interrupts a wait on one or more outstanding layoutgets in pnfs_update_layout() then return the ERESTARTSYS/EINTR error. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Remove unwanted optimisation of layoutgetTrond Myklebust2018-08-211-6/+0
| | | | | | | | | | If we knew that the file was empty, we wouldn't be asking for a layout. Any optimisation here is already done before calling pnfs_update_layout(). As it stands, we sometimes end up doing an unnecessary inband read to the MDS even when holding a layout. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Treat RECALLCONFLICT like DELAY...Trond Myklebust2018-08-161-9/+0
| | | | | | | | Yes, it is possible to get trapped in a loop, but the server should be administratively revoking the recalled layout if it never gets returned. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: When updating the stateid in layoutreturn, also update the recall rangeTrond Myklebust2018-08-161-1/+4
| | | | | | | | | | When we update the layout stateid in nfs4_layoutreturn_refresh_stateid, we should also update the range in order to let the server know we're actually returning everything. Fixes: 16c278dbfa63 ("pnfs: Fix handling of NFS4ERR_OLD_STATEID replies...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pnfs: Use true and false for boolean valuesGustavo A. R. Silva2018-08-081-1/+1
| | | | | | | | | | Return statements in functions returning bool should use true or false instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pnfs: pnfs_find_lseg() should not check NFS_LSEG_LAYOUTRETURNTrond Myklebust2018-08-081-1/+0
| | | | | | | | | Layout segment validity is determined only by the NFS_LSEG_VALID flag. If it is set, the layout segment is findable. As it is, when the flexfiles driver sets NFS_LSEG_LAYOUTRETURN to indicate that we cannot discard the layout segment, but that it must be returned, then this can result in an unnecessary layoutget storm. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pnfs: Fix handling of NFS4ERR_OLD_STATEID replies to layoutreturnTrond Myklebust2018-08-081-3/+14
| | | | | | | If the server tells us that our layoutreturn raced with another layout update, then we must ensure that the new layout segments are not in use before we resend with an updated layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Parse the results of layoutget on open even if permissions checks failTrond Myklebust2018-07-261-4/+0
| | | | | | | | | | Even if the results of the permissions checks failed, we should parse the results of the layout on open call so that we can return the layout if required. Note that we also want to ignore the sequence counter for whether or not a layout recall occurred. If the recall pertained to our OPEN, then the callback will know, and will attempt to wait for us to finish processing anyway. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()Trond Myklebust2018-07-261-5/+31
| | | | | | | | If the old layout was recalled, and we returned NFS4ERR_NOMATCHINGLAYOUT then we need to wait for all outstanding layoutget calls to complete before we can send a new one. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()Trond Myklebust2018-07-261-1/+10
| | | | | | | If a layout has been recalled, then we should fire off a layoutreturn as soon as all the layout segments that match the recall have been retired. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Don't discard layout segments that are marked for returnTrond Myklebust2018-07-261-7/+27
| | | | | | | | | If there are layout segments that are marked for return, then we need to ensure that pnfs_mark_matching_lsegs_return() does not just silently discard them, but it should tell the caller that there is a layoutreturn scheduled. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* skip LAYOUTRETURN if layout is invalidOlga Kornievskaia2018-06-121-2/+4
| | | | | | | | | | | | | | | Currently, when IO to DS fails, client returns the layout and retries against the MDS. However, then on umounting (inode eviction) it returns the layout again. This is because pnfs_return_layout() was changed in commit d78471d32bb6 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR") to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned the layout, it will be returned again. Instead, let's also check if we have already marked the layout invalid. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Don't call commit on failed layoutget-on-openTrond Myklebust2018-05-311-6/+1
| | | | | | | If the layoutget on open call failed, we can't really commit the inode, so don't bother calling it. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>