summaryrefslogtreecommitdiffstats
path: root/fs/nfs/flexfilelayout/flexfilelayout.c
Commit message (Collapse)AuthorAgeFilesLines
...
* NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client()Trond Myklebust2019-03-011-3/+3
| | | | | | | Pass in a pointer to the mirror rather than forcing another array access. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh()Trond Myklebust2019-03-011-2/+2
| | | | | | | Pass in a pointer to the mirror rather than having to retrieve it from the array and then verify the resulting pointer. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Speed up read failover when DSes are downTrond Myklebust2019-03-011-12/+62
| | | | | | | If we notice that a DS may be down, we should attempt to read from the other mirrors first before we go back to retry the dead DS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Remove bogus checks for invalid deviceidsTrond Myklebust2019-03-011-20/+0
| | | | | | We already check the deviceids before we start the RPC call. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds()Trond Myklebust2019-03-011-6/+20
| | | | | | | | | While we may want to skip attempting to connect to a downed mirror when we're deciding which mirror to select for a read, we do not want to do so once we've committed to attempting the I/O in ff_layout_read/write_pagelist(), or ff_layout_initiate_commit() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Send LAYOUTERROR when failing over mirrored readsTrond Myklebust2019-03-011-5/+55
| | | | | | | | | | | | | When a read to the preferred mirror returns an error, the flexfiles driver records the error in the inode list and currently marks the layout for return before failing over the attempted read to the next mirror. What we actually want to do is fire off a LAYOUTERROR to notify the MDS that there is an issue with the preferred mirror, then we fail over. Only once we've failed to read from all mirrors should we return the layout. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/flexfiles: Abort I/O early if the layout segment was invalidatedTrond Myklebust2019-03-011-0/+17
| | | | | | | | | | | | If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Fix up sparse RCU annotationsTrond Myklebust2019-03-011-2/+2
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* SUNRPC: Add xdr_stream::rqst fieldChuck Lever2019-02-131-1/+1
| | | | | | | | | | | | | | Having access to the controlling rpc_rqst means a trace point in the XDR code can report: - the XID - the task ID and client ID - the p_name of RPC being processed Subsequent patches will introduce such trace points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.NeilBrown2018-12-191-22/+11
| | | | | | | | | | | | | | | | | | | | | | SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires. This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux. For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: remove uid and gid from struct auth_credNeilBrown2018-12-191-6/+8
| | | | | | | Use cred->fsuid and cred->fsgid instead. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: remove groupinfo from struct auth_cred.NeilBrown2018-12-191-13/+1
| | | | | | | We can use cred->groupinfo (from the 'struct cred') instead. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: add 'struct cred *' to auth_cred and rpc_credNeilBrown2018-12-191-0/+17
| | | | | | | | | | | | | | | | | | | | The SUNRPC credential framework was put together before Linux has 'struct cred'. Now that we have it, it makes sense to use it. This first step just includes a suitable 'struct cred *' pointer in every 'struct auth_cred' and almost every 'struct rpc_cred'. The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing else really makes sense. For rpc_cred, the pointer is reference counted. For auth_cred it isn't. struct auth_cred are either allocated on the stack, in which case the thread owns a reference to the auth, or are part of 'struct generic_cred' in which case gc_base owns the reference, and "acred" shares it. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* flexfiles: enforce per-mirror stateid only for v4 DSesTigran Mkrtchyan2018-12-021-2/+4
| | | | | | | | Since commit bb21ce0ad227 we always enforce per-mirror stateid. However, this makes sense only for v4+ servers. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* flexfiles: use per-mirror specified stateid for IOTigran Mkrtchyan2018-11-221-12/+9
| | | | | | | | | | | | | | | | | rfc8435 says: For tight coupling, ffds_stateid provides the stateid to be used by the client to access the file. However current implementation replaces per-mirror provided stateid with by open or lock stateid. Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and nfs4_ff_layout_prepare_ds. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Don't allocate more pages than we need to fit a layoutget responseTrond Myklebust2018-09-301-0/+1
| | | | | | | | For the 'files' and 'flexfiles' layout types, we do not expect the reply to be any larger than 4k. The block and scsi layout types are a little more greedy, so we keep allocating the maximum response size for now. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS/flexfiles: ff_layout_pg_init_read should exit on errorTrond Myklebust2018-08-211-17/+11
| | | | | | | | If we get an error while retrieving the layout, then we should report it rather than falling back to I/O through the MDS. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS/flexfiles: Ensure we always return a layout if it has layoutstatsTrond Myklebust2018-07-261-0/+3
| | | | | | | If a layout segment is carrying layoutstats or layout error information, then we always want to return it rather than using a forgetful model. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2018-06-221-5/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Hightlights include: - fix an rcu deadlock in nfs_delegation_find_inode() - fix NFSv4 deadlocks due to not freeing the session slot in layoutget - don't send layoutreturn if the layout is already invalid - prevent duplicate XID allocation - flexfiles: Don't tie up all the rpciod threads in resends" * tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Process writeback resends from nfsiod context as well pNFS/flexfiles: Don't tie up all the rpciod threads in resends sunrpc: Prevent duplicate XID allocation pNFS: Don't send layoutreturn if the layout is already invalid pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
| * pNFS/flexfiles: Process writeback resends from nfsiod context as wellTrond Myklebust2018-06-191-2/+8
| | | | | | | | | | | | | | | | Although the writeback resends are more robust than the reads, since they are not immediately rescheduled by the same thread, we are better off processing them in the same place as the reads. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * pNFS/flexfiles: Don't tie up all the rpciod threads in resendsTrond Myklebust2018-06-191-3/+8
| | | | | | | | | | | | | | | | | | | | We do not want to have rpciod threads perform recursive calls into the RPC layer since that can deadlock. In particular, having to wait for a layoutget can be nasty... We want rather to defer scheduling those retries until we're in the rpc_release() callback, since that is called from the nfsiod workqueue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | Merge tag 'overflow-v4.18-rc1-part2' of ↵Linus Torvalds2018-06-121-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...
| * treewide: kzalloc() -> kcalloc()Kees Cook2018-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* | pnfs: Add layout driver flag PNFS_LAYOUTGET_ON_OPENFred Isaman2018-05-311-0/+1
|/ | | | | | | Driver can set flag to allow LAYOUTGET to be sent with OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* fs, nfs: convert nfs_client.cl_count from atomic_t to refcount_tElena Reshetova2017-11-171-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs_client.cl_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* fs, nfs: convert nfs4_ff_layout_mirror.ref from atomic_t to refcount_tElena Reshetova2017-11-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_ff_layout_mirror.ref is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()Trond Myklebust2017-07-191-0/+4
| | | | | | | | If the layout has expired due to a fencing event, then we should not attempt to commit to the DS. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* PNFS for stateid errors retry against MDS firstOlga Kornievskaia2017-07-131-33/+0
| | | | | | | | | | | | | | Upon receiving a stateid error such as BAD_STATEID, the client should retry the operation against the MDS before deciding to do stateid recovery. Previously, the code would initiate state recovery and it could lead to a race in a state manager that could chose an incorrect recovery method which would lead to the EIO failure for the application. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()Dan Carpenter2017-05-241-0/+1
| | | | | | | | | | If xdr_inline_decode() fails then we end up returning ERR_PTR(0). The caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime, but obviously we intended to set an error code here. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabledTrond Myklebust2017-05-091-0/+11
| | | | | | | | | Layoutstats is always desirable when using the flexfiles driver, so we should enable it if that driver is being loaded. It is safe to do so, because even when the mount specifies NFSv4.1, we will turn it off if the server tells us it is unsupported. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust2017-04-291-3/+8
| | | | | | | | | | If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust2017-04-251-0/+2
| | | | | | | | If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-03-011-39/+21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state - xprtrdma: Fix Read chunk padding - xprtrdma: Per-connection pad optimization - xprtrdma: Disable pad optimization by default - xprtrdma: Reduce required number of send SGEs - nlm: Ensure callback code also checks that the files match - pNFS/flexfiles: If the layout is invalid, it must be updated before retrying - NFSv4: Fix reboot recovery in copy offload - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" - NFSv4: fix getacl head length estimation - NFSv4: fix getacl ERANGE for sum ACL buffer sizes Features: - Add and use dprintk_cont macros - Various cleanups to NFS v4.x to reduce code duplication and complexity - Remove unused cr_magic related code - Improvements to sunrpc "read from buffer" code - Clean up sunrpc timeout code and allow changing TCP timeout parameters - Remove duplicate mw_list management code in xprtrdma - Add generic functions for encoding and decoding xdr streams Bugfixes: - Clean up nfs_show_mountd_netid - Make layoutreturn_ops static and use NULL instead of 0 to fix sparse warnings - Properly handle -ERESTARTSYS in nfs_rename() - Check if register_shrinker() failed during rpcauth_init() - Properly clean up procfs/pipefs entries - Various NFS over RDMA related fixes - Silence unititialized variable warning in sunrpc" * tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits) NFSv4: fix getacl ERANGE for some ACL buffer sizes NFSv4: fix getacl head length estimation Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" NFSv4: Fix reboot recovery in copy offload pNFS/flexfiles: If the layout is invalid, it must be updated before retrying NFSv4: Clean up owner/group attribute decode SUNRPC: Add a helper function xdr_stream_decode_string_dup() NFSv4: Remove bogus "struct nfs_client" argument from decode_ace() NFSv4: Fix the underestimation of delegation XDR space reservation NFSv4: Replace callback string decode function with a generic NFSv4: Replace the open coded decode_opaque_inline() with the new generic NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics SUNRPC: Add generic helpers for xdr_stream encode/decode sunrpc: silence uninitialized variable warning nlm: Ensure callback code also checks that the files match sunrpc: Allow xprt->ops->timer method to sleep xprtrdma: Refactor management of mw_list field xprtrdma: Handle stale connection rejection xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs xprtrdma: Shrink send SGEs array ...
| * pNFS/flexfiles: If the layout is invalid, it must be updated before retryingTrond Myklebust2017-02-221-6/+7
| | | | | | | | | | | | | | | | | | | | | | If we see that our pNFS READ/WRITE/COMMIT operation failed, but we also see that our layout segment is no longer valid, then we need to get a new layout segment before retrying. Fixes: 90816d1ddacf ("NFSv4.1/flexfiles: Don't mark the entire deviceid...") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* genericsTrond Myklebust2017-02-211-4/+1
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * pNFS/flexfiles: Make local symbol layoutreturn_ops staticWei Yongjun2017-01-301-1/+1
| | | | | | | | | | | | | | | | | | | | Fixes the following sparse warning: fs/nfs/flexfilelayout/flexfilelayout.c:2114:34: warning: symbol 'layoutreturn_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Use nfs4_setup_sequence() everywhereAnna Schumaker2017-01-301-28/+12
| | | | | | | | | | | | | | This does the right thing depending on if we have a session, rather than needing to handle this manually in multiple places. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | lib/vsprintf.c: remove %Z supportAlexey Dobriyan2017-02-271-2/+2
|/ | | | | | | | | | | | | | | | | | Now that %z is standartised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple ATM drivers. In case anyone didn't notice lib/vsprintf.o is about half of SLUB which is in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ktime: Get rid of ktime_equal()Thomas Gleixner2016-12-251-1/+1
| | | | | | | | No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
* ktime: Get rid of the unionThomas Gleixner2016-12-251-2/+1
| | | | | | | | | | | | | | | ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
* pNFS/flexfiles: delete deviceid, don't mark inactiveWeston Andros Adamson2016-12-191-2/+4
| | | | | | | | | | Instead of marking a device inactive, remove it from the cache entirely. Flexfiles has a way to report errors back to the server, so we don't want to stop devices from being tried again for 120 seconds. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Ensure we have enough buffer for layoutreturnTrond Myklebust2016-12-091-6/+26
| | | | | | | | | | The flexfiles client can piggyback both layout errors and layoutstats as part of the layoutreturn. Both these payloads can get large, with 20 layout error entries taking up about 1.2K, and 4 layoutstats entries taking up another 1K. This patch allows a maximum payload of 4k by allocating a full page. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr()Trond Myklebust2016-12-091-6/+4
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Fix a deadlock on LAYOUTGETFred Isaman2016-12-081-34/+3
| | | | | | | | | | | | | | | We encountered a deadlock where the SEQUENCE that accompanied the LAYOUTGET triggered a session drain, while ff_layout_alloc_lseg triggered a GETDEVICEINFO. The GETDEVICEINFO hung waiting for the session drain, while the LAYOUTGET held the slot waiting for alloc_lseg to finish. Avoid this by moving the call to nfs4_find_get_deviceid out of ff_layout_alloc_lseg and into nfs4_ff_layout_prepare_ds. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> [dros@primarydata.com: pNFS/flexfiles: fix races in ff_layout_mirror_valid] Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Support sending layoutstats in layoutreturnTrond Myklebust2016-12-031-6/+76
| | | | | | | Add the ability to send an array of layoutstats entries as part of layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturnTrond Myklebust2016-12-031-25/+34
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix up read of mirror statsTrond Myklebust2016-12-031-0/+2
| | | | | | Need to lock while reading in order to ensure 64-bit reads are correct. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Clean up layoutstatsTrond Myklebust2016-12-031-20/+7
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Refactor encoding of the layoutreturn payloadTrond Myklebust2016-12-031-13/+50
| | | | | | | | Add the layout error payload to the flexfiles layoutreturn private data, and set up the encoding mechanisms. This is a refactoring in preparation for adding the layout iostats payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Only send layoutstats updates for mirrors that were updatedTrond Myklebust2016-12-021-0/+6
| | | | | | | If there have been no reads or writes to a given mirror since the last layoutstats update, then don't resend the same data. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>