summaryrefslogtreecommitdiffstats
path: root/fs/nfs
Commit message (Collapse)AuthorAgeFilesLines
* pNFS/NFSv4: Improve rejection of out-of-order layoutsTrond Myklebust2021-01-241-6/+16
| | | | | | | | | | | | If a layoutget ends up being reordered w.r.t. a layoutreturn, e.g. due to a layoutget-on-open not knowing a priori which file to lock, then we must assume the layout is no longer being considered valid state by the server. Incrementally improve our ability to reject such states by using the cached old stateid in conjunction with the plh_barrier to try to identify them. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturnTrond Myklebust2021-01-241-18/+21
| | | | | | | | | When we're scheduling a layoutreturn, we need to ignore any further incoming layouts with sequence ids that are going to be affected by the layout return. Fixes: 44ea8dfce021 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()Trond Myklebust2021-01-241-2/+7
| | | | | | | | | If the server returns a new stateid that does not match the one in our cache, then try to return the one we hold instead of just invalidating it on the client side. This ensures that both client and server will agree that the stateid is invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()Trond Myklebust2021-01-241-0/+1
| | | | | | | | | If the server returns a new stateid that does not match the one in our cache, then pnfs_layout_process() will leak the layout segments returned by pnfs_mark_layout_stateid_invalid(). Fixes: 9888d837f3cf ("pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: nfs_igrab_and_active must first reference the superblockTrond Myklebust2021-01-101-5/+7
| | | | | | | | | Before referencing the inode, we must ensure that the superblock can be referenced. Otherwise, we can end up with iput() calling superblock operations that are no longer valid or accessible. Fixes: ea7c38fef0b7 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: nfs_delegation_find_inode_server must first reference the superblockTrond Myklebust2021-01-101-5/+7
| | | | | | | | | Before referencing the inode, we must ensure that the superblock can be referenced. Otherwise, we can end up with iput() calling superblock operations that are no longer valid or accessible. Fixes: e39d8a186ed0 ("NFSv4: Fix an Oops during delegation callbacks") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counterTrond Myklebust2021-01-101-0/+1
| | | | | | | | If we exit _lgopen_prepare_attached() without setting a layout, we will currently leak the plh_outstanding counter. Fixes: 411ae722d10a ("pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()Trond Myklebust2021-01-101-3/+5
| | | | | | | | | | | | We must ensure that we pass a layout segment to nfs_retry_commit() when we're cleaning up after pnfs_bucket_alloc_ds_commits(). Otherwise, requests that should be committed to the DS will get committed to the MDS. Do so by ensuring that pnfs_bucket_get_committing() always tries to return a layout segment when it returns a non-empty page list. Fixes: c84bea59449a ("NFS/pNFS: Simplify bucket layout segment reference counting") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the requestTrond Myklebust2021-01-101-9/+5
| | | | | | | | | | In pnfs_generic_clear_request_commit(), we try calling pnfs_free_bucket_lseg() before we remove the request from the DS bucket. That will always fail, since the point is to test for whether or not that bucket is empty. Fixes: c84bea59449a ("NFS/pNFS: Simplify bucket layout segment reference counting") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Stricter ordering of layoutget and layoutreturnTrond Myklebust2021-01-101-22/+21
| | | | | | | | If a layout return is in progress, we should wait for it to complete, in case the layout segment we are picking up gets returned too. Fixes: 30cb3ee299cb ("pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Clean up pnfs_layoutreturn_free_lsegs()Trond Myklebust2021-01-101-5/+4
| | | | | | | Remove the check for whether or not the stateid is NULL, and fix up the callers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: We want return-on-close to complete when evicting the inodeTrond Myklebust2021-01-103-26/+16
| | | | | | | | If the inode is being evicted, it should be safe to run return-on-close, so we should do it to ensure we don't inadvertently leak layout segments. Fixes: 1c5bd76d17cc ("pNFS: Enable layoutreturn operation for return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Mark layout for return if return-on-close was not sentTrond Myklebust2021-01-101-0/+6
| | | | | | | | If the layout return-on-close failed because the layoutreturn was never sent, then we should mark the layout for return again. Fixes: 9c47b18cf722 ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Adjust fs_context error loggingScott Mayhew2021-01-102-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Several existing dprink()/dfprintk() calls were converted to use the new mount API logging macros by commit ce8866f0913f ("NFS: Attach supplementary error information to fs_context"). If the fs_context was not created using fsopen() then it will not have had a log buffer allocated for it, and the new mount API logging macros will wind up calling printk(). This can result in syslog messages being logged where previously there were none... most notably "NFS4: Couldn't follow remote path", which can happen if the client is auto-negotiating a protocol version with an NFS server that doesn't support the higher v4.x versions. Convert the nfs_errorf(), nfs_invalf(), and nfs_warnf() macros to check for the existence of the fs_context's log buffer and call dprintk() if it doesn't exist. Add nfs_ferrorf(), nfs_finvalf(), and nfs_warnf(), which do the same thing but take an NFS debug flag as an argument and call dfprintk(). Finally, modify the "NFS4: Couldn't follow remote path" message to use nfs_ferrorf(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207385 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Fixes: ce8866f0913f ("NFS: Attach supplementary error information to fs_context.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lockDave Wysochanski2021-01-061-1/+1
| | | | | | | | | It is only safe to call the tracepoint before rpc_put_task() because 'data' is freed inside nfs4_lock_release (rpc_release). Fixes: 48c9579a1afe ("Adding stateid information to tracepoints") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2020-12-1722-524/+922
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support - A series of patches to improve readdir performance, particularly with large directories - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers - Micro-optimisations for RDMA - RDMA tracing improvements Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni) - Various fixes for getxattr and listxattr, when used over non-TCP transports - Fixes for containerised NFS from Sargun Dhillon - switch nfsiod to be an UNBOUND workqueue (Neil Brown) - READDIR should not ask for security label information if there is no LSM policy (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay) - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes" * tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read NFSv4/pnfs: Add tracing for the deviceid cache fs/lockd: convert comma to semicolon NFSv4.2: fix error return on memory allocation failure NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer NFSv4.2: decode_read_plus_hole() needs to check the extent offset NFSv4.2: decode_read_plus_data() must skip padding after data segment NFSv4.2: Ensure we always reset the result->count in decode_read_plus() SUNRPC: When expanding the buffer, we may need grow the sparse pages SUNRPC: Cleanup - constify a number of xdr_buf helpers SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field SUNRPC: _copy_to/from_pages() now check for zero length SUNRPC: Cleanup xdr_shrink_bufhead() SUNRPC: Fix xdr_expand_hole() SUNRPC: Fixes for xdr_align_data() SUNRPC: _shift_data_left/right_pages should check the shift length ...
| * NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()Trond Myklebust2020-12-161-1/+1
| | | | | | | | | | | | | | Don't bump the index twice. Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_readTrond Myklebust2020-12-161-5/+1
| | | | | | | | | | | | | | | | The callers of ff_layout_choose_ds_for_read() should decide whether or not they want to return the layout on error. Sometimes, we may just want to retry from the beginning. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4/pnfs: Add tracing for the deviceid cacheTrond Myklebust2020-12-163-8/+92
| | | | | | | | | | | | Add tracepoints to allow debugging of the deviceid cache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: fix error return on memory allocation failureColin Ian King2020-12-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | Currently when an alloc_page fails the error return is not set in variable err and a garbage initialized value is returned. Fix this by setting err to -ENOMEM before taking the error return path. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yetTrond Myklebust2020-12-141-7/+8
| | | | | | | | | | | | | | | | We have no way of tracking server READ_PLUS support in pNFS for now, so just disable it. Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: Deal with potential READ_PLUS data extent buffer overflowTrond Myklebust2020-12-141-2/+7
| | | | | | | | | | | | | | If the server returns more data than we have buffer space for, then we need to truncate and exit early. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflowTrond Myklebust2020-12-141-19/+17
| | | | | | | | | | | | | | Expanding the READ_PLUS extents can cause the read buffer to overflow. If it does, then don't error, but just exit early. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: Handle hole lengths that exceed the READ_PLUS read bufferTrond Myklebust2020-12-141-0/+6
| | | | | | | | | | | | | | If a hole extends beyond the READ_PLUS read buffer, then we want to fill just the remaining buffer with zeros. Also ignore eof... Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: decode_read_plus_hole() needs to check the extent offsetTrond Myklebust2020-12-141-3/+21
| | | | | | | | | | | | | | | | The server is allowed to return a hole extent with an offset that starts before the offset supplied in the READ_PLUS argument. Ensure that we support that case too. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: decode_read_plus_data() must skip padding after data segmentTrond Myklebust2020-12-141-1/+3
| | | | | | | | | | | | | | All XDR opaque object sizes are 32-bit aligned, and a data segment is no exception. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: Ensure we always reset the result->count in decode_read_plus()Trond Myklebust2020-12-141-0/+1
| | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.1: use BITS_PER_LONG macro in nfs4session.hGeliang Tang2020-12-141-1/+1
| | | | | | | | | | | | | | Use the existing BITS_PER_LONG macro instead of calculating the value. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: improve page handling for GETXATTRFrank van der Linden2020-12-142-16/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | XDRBUF_SPARSE_PAGES can cause problems for the RDMA transport, and it's easy enough to allocate enough pages for the request up front, so do that. Also, since we've allocated the pages anyway, use the full page aligned length for the receive buffer. This will allow caching of valid replies that are too large for the caller, but that still fit in the allocated pages. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4.2: Fix up the get/listxattr calls to rpc_prepare_reply_pages()Trond Myklebust2020-12-101-5/+7
| | | | | | | | | | | | | | Ensure that both getxattr and listxattr page array are correctly aligned, and that getxattr correctly accounts for the page padding word. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFS: switch nfsiod to be an UNBOUND workqueue.NeilBrown2020-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfsiod is currently a concurrency-managed workqueue (CMWQ). This means that workitems scheduled to nfsiod on a given CPU are queued behind all other work items queued on any CMWQ on the same CPU. This can introduce unexpected latency. Occaionally nfsiod can even cause excessive latency. If the work item to complete a CLOSE request calls the final iput() on an inode, the address_space of that inode will be dismantled. This takes time proportional to the number of in-memory pages, which on a large host working on large files (e.g.. 5TB), can be a large number of pages resulting in a noticable number of seconds. We can avoid these latency problems by switching nfsiod to WQ_UNBOUND. This causes each concurrent work item to gets a dedicated thread which can be scheduled to an idle CPU. There is precedent for this as several other filesystems use WQ_UNBOUND workqueue for handling various async events. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: ada609ee2ac2 ("workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4: Refactor to use user namespaces for nfs4idmapSargun Dhillon2020-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In several patches work has been done to enable NFSv4 to use user namespaces: 58002399da65: NFSv4: Convert the NFS client idmapper to use the container user namespace 3b7eb5e35d0f: NFS: When mounting, don't share filesystems between different user namespaces Unfortunately, the userspace APIs were only such that the userspace facing side of the filesystem (superblock s_user_ns) could be set to a non init user namespace. This furthers the fs_context related refactoring, and piggybacks on top of that logic, so the superblock user namespace, and the NFS user namespace are the same. Users can still use rpc.idmapd if they choose to, but there are complexities with user namespaces and request-key that have yet to be addresssed. Eventually, we will need to at least: * Come up with an upcall mechanism that can be triggered inside of the container, or safely triggered outside, with the requisite context to do the right mapping. * Handle whatever refactoring needs to be done in net/sunrpc. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Tested-by: Alban Crequy <alban.crequy@gmail.com> Fixes: 62a55d088cd8 ("NFS: Additional refactoring for fs_context conversion") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFS: NFSv2/NFSv3: Use cred from fs_context during mountSargun Dhillon2020-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was refactoring done to use the fs_context for mounting done in: 62a55d088cd87: NFS: Additional refactoring for fs_context conversion This made it so that the net_ns is fetched from the fs_context (the netns that fsopen is called in). This change also makes it so that the credential fetched during fsopen is used as well as the net_ns. NFS has already had a number of changes to prepare it for user namespaces: 1a58e8a0e5c1: NFS: Store the credential of the mount process in the nfs_server 264d948ce7d0: NFS: Convert NFSv3 to use the container user namespace c207db2f5da5: NFS: Convert NFSv2 to use the container user namespace Previously, different credentials could be used for creation of the fs_context versus creation of the nfs_server, as FSCONFIG_CMD_CREATE did the actual credential check, and that's where current_creds() were fetched. This meant that the user namespace which fsopen was called in could be a non-init user namespace. This still requires that the user that calls FSCONFIG_CMD_CREATE has CAP_SYS_ADMIN in the init user ns. This roughly allows a privileged user to mount on behalf of an unprivileged usernamespace, by forking off and calling fsopen in the unprivileged user namespace. It can then pass back that fsfd to the privileged process which can configure the NFS mount, and then it can call FSCONFIG_CMD_CREATE before switching back into the mount namespace of the container, and finish up the mounting process and call fsmount and move_mount. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Tested-by: Alban Crequy <alban.crequy@gmail.com> Fixes: 62a55d088cd8 ("NFS: Additional refactoring for fs_context conversion") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4: Fix a pNFS layout related use-after-free race when freeing the inodeTrond Myklebust2020-12-023-3/+37
| | | | | | | | | | | | | | | | When returning the layout in nfs4_evict_inode(), we need to ensure that the layout is actually done being freed before we can proceed to free the inode itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4: Fix open coded xdr_stream_remaining()Trond Myklebust2020-12-021-3/+3
| | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()Trond Myklebust2020-12-023-39/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rpc_prepare_reply_pages() currently expects the 'hdrsize' argument to contain the length of the data that we expect to want placed in the head kvec plus a count of 1 word of padding that is placed after the page data. This is very confusing when trying to read the code, and sometimes leads to callers adding an arbitrary value of '1' just in order to satisfy the requirement (whether or not the page data actually needs such padding). This patch aims to clarify the code by changing the 'hdrsize' argument to remove that 1 word of padding. This means we need to subtract the padding from all the existing callers. Fixes: 02ef04e432ba ("NFS: Account for XDR pad of buf->pages") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4: Fix the alignment of page data in the getdeviceinfo replyTrond Myklebust2020-12-021-3/+7
| | | | | | | | | | | | | | We can fit the device_addr4 opaque data padding in the pages. Fixes: cf500bac8fd4 ("SUNRPC: Introduce rpc_prepare_reply_pages()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * pNFS: Clean up open coded xdr string decodingTrond Myklebust2020-12-021-36/+7
| | | | | | | | | | | | | | Use the existing xdr_stream_decode_string_dup() to safely decode into kmalloced strings. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * pNFS/flexfiles: Fix up layoutstats reporting for non-TCP transportsTrond Myklebust2020-12-021-7/+2
| | | | | | | | | | | | | | Ensure that we report the correct netid when using UDP or RDMA transports to the DSes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4/pNFS: Store the transport type in struct nfs4_pnfs_ds_addrTrond Myklebust2020-12-022-15/+21
| | | | | | | | | | | | | | | | | | We want to enable RDMA and UDP as valid transport methods if a GETDEVICEINFO call specifies it. Do so by adding a parser for the netid that translates it to an appropriate argument for the RPC transport layer. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * pNFS: Add helpers for allocation/free of struct nfs4_pnfs_ds_addrTrond Myklebust2020-12-021-5/+16
| | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFSv4/pNFS: Use connections to a DS that are all of the same protocol familyTrond Myklebust2020-12-021-1/+6
| | | | | | | | | | | | | | | | If the pNFS metadata server advertises multiple addresses for the same data server, we should try to connect to just one protocol family and transport type on the assumption that homogeneity will improve performance. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFS: Switch mount code to use xprt_find_transport_ident()Trond Myklebust2020-12-021-9/+12
| | | | | | | | | | | | | | Switch the mount code to use xprt_find_transport_ident() and to check the results before allowing the mount to proceed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFS: Do uncached readdir when we're seeking a cookie in an empty page cacheTrond Myklebust2020-12-021-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the directory is changing, causing the page cache to get invalidated while we are listing the contents, then the NFS client is currently forced to read in the entire directory contents from scratch, because it needs to perform a linear search for the readdir cookie. While this is not an issue for small directories, it does not scale to directories with millions of entries. In order to be able to deal with large directories that are changing, add a heuristic to ensure that if the page cache is empty, and we are searching for a cookie that is not the zero cookie, we just default to performing uncached readdir. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Reduce number of RPC calls when doing uncached readdirTrond Myklebust2020-12-021-36/+69
| | | | | | | | | | | | | | | | | | | | If we're doing uncached readdir, allocate multiple pages in order to try to avoid duplicate RPC calls for the same getdents() call. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Optimisations for monotonically increasing readdir cookiesTrond Myklebust2020-12-021-1/+22
| | | | | | | | | | | | | | | | | | | | | | If the server is handing out monotonically increasing readdir cookie values, then we can optimise away searches through pages that contain cookies that lie outside our search range. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Improve handling of directory verifiersTrond Myklebust2020-12-022-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | If the server insists on using the readdir verifiers in order to allow cookies to expire, then we should ensure that we cache the verifier with the cookie, so that we can return an error if the application tries to use the expired cookie. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Handle NFS4ERR_NOT_SAME and NFSERR_BADCOOKIE from readdir callsTrond Myklebust2020-12-022-8/+18
| | | | | | | | | | | | | | | | | | | | | | If the server returns NFS4ERR_NOT_SAME or tells us that the cookie is bad in response to a READDIR call, then we should empty the page cache so that we can fill it from scratch again. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Allow the NFS generic code to pass in a verifier to readdirTrond Myklebust2020-12-024-51/+61
| | | | | | | | | | | | | | | | | | | | | | If we're ever going to allow support for servers that use the readdir verifier, then that use needs to be managed by the middle layers as those need to be able to reject cookies from other verifiers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
| * NFS: Cleanup to remove nfs_readdir_descriptor_t typedefTrond Myklebust2020-12-021-17/+13
| | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>