summaryrefslogtreecommitdiffstats
path: root/fs/nfs/internal.h
Commit message (Collapse)AuthorAgeFilesLines
* NFS: Clean up cache validity checkingTrond Myklebust2016-12-191-0/+1
| | | | | | | Consolidate the open-coded checking of NFS_I(inode)->cache_validity into a couple of helper functions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Allow getattr to also report readdirplus cache hitsTrond Myklebust2016-12-021-0/+1
| | | | | | | | | | | If the use called stat() on an 'ls -l' workload, and the attribute cache was successfully revalidate by READDIRPLUS, then we want to report that back so that the readdir code continues to use readdirplus. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Remove unused authflavour parameter from nfs_get_client()Anna Schumaker2016-12-011-5/+3
| | | | | | | | This parameter hasn't been used since f8407299 (Linux 3.11-rc2), so let's remove it from this function and callers. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2016-10-131-4/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - sunrpc: fix writ espace race causing stalls - NFS: Fix inode corruption in nfs_prime_dcache() - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation() - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid - NFSv4: Open state recovery must account for file permission changes - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic Features: - Add support for tracking multiple layout types with an ordered list - Add support for using multiple backchannel threads on the client - Add support for pNFS file layout session trunking - Delay xprtrdma use of DMA API (for device driver removal) - Add support for xprtrdma remote invalidation - Add support for larger xprtrdma inline thresholds - Use a scatter/gather list for sending xprtrdma RPC calls - Add support for the CB_NOTIFY_LOCK callback - Improve hashing sunrpc auth_creds by using both uid and gid Bugfixes: - Fix xprtrdma use of DMA API - Validate filenames before adding to the dcache - Fix corruption of xdr->nwords in xdr_copy_to_scratch - Fix setting buffer length in xdr_set_next_buffer() - Don't deadlock the state manager on the SEQUENCE status flags - Various delegation and stateid related fixes - Retry operations if an interrupted slot receives EREMOTEIO - Make nfs boot time y2038 safe" * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits) NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic fs: nfs: Make nfs boot time y2038 safe sunrpc: replace generic auth_cred hash with auth-specific function sunrpc: add RPCSEC_GSS hash_cred() function sunrpc: add auth_unix hash_cred() function sunrpc: add generic_auth hash_cred() function sunrpc: add hash_cred() function to rpc_authops struct Retry operation on EREMOTEIO on an interrupted slot pNFS: Fix atime updates on pNFS clients sunrpc: queue work on system_power_efficient_wq NFSv4.1: Even if the stateid is OK, we may need to recover the open modes NFSv4: If recovery failed for a specific open stateid, then don't retry NFSv4: Fix retry issues with nfs41_test/free_stateid NFSv4: Open state recovery must account for file permission changes NFSv4: Mark the lock and open stateids as invalid after freeing them NFSv4: Don't test open_stateid unless it is set NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation NFSv4: Fix a race when updating an open_stateid NFSv4: Fix a race in nfs_inode_reclaim_delegation() ...
| * pNFS: Fix atime updates on pNFS clientsTrond Myklebust2016-09-271-1/+0
| | | | | | | | | | | | | | | | | | Fix the code so that we always mark the atime as invalid in nfs4_read_done(). Currently, the expectation appears to be that the pNFS drivers should always do this, with the result that most of them don't. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * nfs: cover ->migratepage with CONFIG_MIGRATIONChao Yu2016-09-201-3/+0
| | | | | | | | | | | | | | | | | | It will be more clean to use CONFIG_MIGRATION to cover nfs' private .migratepage in nfs_file_aops like we do in other part of nfs operations. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS pnfs data server multipath session trunkingAndy Adamson2016-09-191-0/+3
| | | | | | | | | | | | | | | | | | | | Try all multipath addresses for a data server. The first address that successfully connects and creates a session is the DS mount address. All subsequent addresses are tested for session trunking and added as aliases. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2016-10-101-1/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
| * | fs: make remaining filesystems use .rename2Miklos Szeredi2016-09-271-1/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is trivial to do: - add flags argument to foo_rename() - check if flags is zero - assign foo_rename() to .rename2 instead of .rename This doesn't mean it's impossible to support RENAME_NOREPLACE for these filesystems, but it is not trivial, like for local filesystems. RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible for a file to be created on one host while it is overwritten by rename on another host). Filesystems converted: 9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs. After this, we can get rid of the duplicate interfaces for rename. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Howells <dhowells@redhat.com> [AFS] Acked-by: Mike Marshall <hubcap@omnibond.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Mark Fasheh <mfasheh@suse.com>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-10-071-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - fsnotify updates - ocfs2 updates - all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) console: don't prefer first registered if DT specifies stdout-path cred: simpler, 1D supplementary groups CREDITS: update Pavel's information, add GPG key, remove snail mail address mailmap: add Johan Hovold .gitattributes: set git diff driver for C source code files uprobes: remove function declarations from arch/{mips,s390} spelling.txt: "modeled" is spelt correctly nmi_backtrace: generate one-line reports for idle cpus arch/tile: adopt the new nmi_backtrace framework nmi_backtrace: do a local dump_stack() instead of a self-NMI nmi_backtrace: add more trigger_*_cpu_backtrace() methods min/max: remove sparse warnings when they're nested Documentation/filesystems/proc.txt: add more description for maps/smaps mm, proc: fix region lost in /proc/self/smaps proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self proc: add LSM hook checks to /proc/<tid>/timerslack_ns proc: relax /proc/<tid>/timerslack_ns capability requirements meminfo: break apart a very long seq_printf with #ifdefs seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char proc: faster /proc/*/status ...
| * | mm: remove page_file_indexHuang Ying2016-10-071-3/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After using the offset of the swap entry as the key of the swap cache, the page_index() becomes exactly same as page_file_index(). So the page_file_index() is removed and the callers are changed to use page_index() instead. Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* / switch generic_file_splice_read() to use of ->read_iter()Al Viro2016-10-051-2/+0
|/ | | | | | ... and kill the ->splice_read() instances that can be switched to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* NFS: Allow the mount option retrans=0Trond Myklebust2016-08-161-1/+4
| | | | | | | | | | | We should allow retrans=0 as just meaning that every timeout is a major timeout, and that there is no increment in the timeout value. For instance, this means that we would allow TCP users to specify a flat timeout value of 60s, by specifying "timeo=600,retrans=0" in their mount option string. Siged-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2016-07-301-11/+51
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - nfs: don't create zero-length requests - several LAYOUTGET bugfixes Features: - several performance related features - more aggressive caching when we can rely on close-to-open cache consistency - remove serialisation of O_DIRECT reads and writes - optimise several code paths to not flush to disk unnecessarily. However allow for the idiosyncracies of pNFS for those layout types that need to issue a LAYOUTCOMMIT before the metadata can be updated on the server. - SUNRPC updates to the client data receive path - pNFS/SCSI support RH/Fedora dm-mpath device nodes - pNFS files/flexfiles can now use unprivileged ports when the generic NFS mount options allow it. Bugfixes: - Don't use RDMA direct data placement together with data integrity or privacy security flavours - Remove the RDMA ALLPHYSICAL memory registration mode as it has potential security holes. - Several layout recall fixes to improve NFSv4.1 protocol compliance. - Fix an Oops in the pNFS files and flexfiles connection setup to the DS - Allow retry of operations that used a returned delegation stateid - Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding - Fix writeback races in nfs4_copy_range() and nfs42_proc_deallocate()" * tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits) pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding NFSv4: Clean up lookup of SECINFO_NO_NAME NFSv4.2: Fix warning "variable ‘stateids’ set but not used" NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’" SUNRPC: Fix a compiler warning in fs/nfs/clnt.c pNFS: Remove redundant smp_mb() from pnfs_init_lseg() pNFS: Cleanup - do layout segment initialisation in one place pNFS: Remove redundant stateid invalidation pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() pNFS: Clear the layout metadata if the server changed the layout stateid pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id pNFS: Do not set plh_return_seq for non-callback related layoutreturns pNFS: Ensure layoutreturn acts as a completion for layout callbacks pNFS: Fix CB_LAYOUTRECALL stateid verification pNFS: Always update the layout barrier seqid on LAYOUTGET pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set pNFS: Clear the layout return tracking on layout reinitialisation pNFS: LAYOUTRETURN should only update the stateid if the layout is valid nfs: don't create zero-length requests ...
| * Merge branch 'writeback'Trond Myklebust2016-07-241-0/+40
| |\
| | * NFSv4.2: Fix writeback races in nfs4_copy_file_rangeTrond Myklebust2016-07-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to ensure that any writes to the destination file are serialised with the copy, meaning that the writeback has to occur under the inode lock. Also relax the writeback requirement on the source, and rely on the stateid checking to tell us if the source rebooted. Add the helper nfs_filemap_write_and_wait_range() to call pnfs_sync_inode() as is appropriate for pNFS servers that may need a layoutcommit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | * NFS: Do not aggressively cache file attributes in the case of O_DIRECTTrond Myklebust2016-07-051-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | A file that is open for O_DIRECT is by definition not obeying close-to-open cache consistency semantics, so let's not cache the attributes too aggressively either. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | * NFS: Do not serialise O_DIRECT reads and writesTrond Myklebust2016-07-051-0/+8
| | | | | | | | | | | | | | | | | | | | | Allow dio requests to be scheduled in parallel, but ensuring that they do not conflict with buffered I/O. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | * NFS: Ensure we reset the write verifier 'committed' value on resend.Trond Myklebust2016-07-051-0/+17
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | * NFS: Fix O_DIRECT verifier problemsTrond Myklebust2016-07-051-0/+7
| | | | | | | | | | | | | | | | | | | | | We should not be interested in looking at the value of the stable field, since that could take any value. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs4: flexfiles: respect noresvport when establishing connections to DSesTigran Mkrtchyan2016-07-191-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs4: clnt: respect noresvport when establishing connections to DSesTigran Mkrtchyan2016-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result: $ mount -o vers=4.1 dcache-lab007:/ /pnfs $ cp /etc/profile /pnfs tcp 0 0 131.169.185.68:1005 131.169.191.141:32049 ESTABLISHED tcp 0 0 131.169.185.68:751 131.169.191.144:2049 ESTABLISHED $ $ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs $ cp /etc/profile /pnfs tcp 0 0 131.169.185.68:34894 131.169.191.141:32049 ESTABLISHED tcp 0 0 131.169.185.68:35722 131.169.191.144:2049 ESTABLISHED $ Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flagsScott Mayhew2016-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A generic_cred can be used to look up a unx_cred or a gss_cred, so it's not really safe to use the the generic_cred->acred->ac_flags to store the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated with the auth_cred to be in a state where they're perpetually doing 4K NFS_FILE_SYNC writes. This can be reproduced as follows: 1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys. They do not need to be the same export, nor do they even need to be from the same NFS server. Also, v3 is fine. $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5 $ sudo mount -o v3,sec=sys server2:/export /mnt/sys 2. As the normal user, before accessing the kerberized mount, kinit with a short lifetime (but not so short that renewing the ticket would leave you within the 4-minute window again by the time the original ticket expires), e.g. $ kinit -l 10m -r 60m 3. Do some I/O to the kerberized mount and verify that the writes are wsize, UNSTABLE: $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1 4. Wait until you're within 4 minutes of key expiry, then do some more I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets set. Verify that the writes are 4K, FILE_SYNC: $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1 5. Now do some I/O to the sec=sys mount. This will cause RPC_CRED_NO_CRKEY_TIMEOUT to be set: $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1 6. Writes for that user will now be permanently 4K, FILE_SYNC for that user, regardless of which mount is being written to, until you reboot the client. Renewing the kerberos ticket (assuming it hasn't already expired) will have no effect. Grabbing a new kerberos ticket at this point will have no effect either. Move the flag to the auth->au_flags field (which is currently unused) and rename it slightly to reflect that it's no longer associated with the auth_cred->ac_flags. Add the rpc_auth to the arg list of rpcauth_cred_key_to_expire and check the au_flags there too. Finally, add the inode to the arg list of nfs_ctx_key_to_expire so we can determine the rpc_auth to pass to rpcauth_cred_key_to_expire. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DSTrond Myklebust2016-06-301-8/+8
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chris Worley reports: RIP: 0010:[<ffffffffa0245f80>] [<ffffffffa0245f80>] rpc_new_client+0x2a0/0x2e0 [sunrpc] RSP: 0018:ffff880158f6f548 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880234f8bc00 RCX: 000000000000ea60 RDX: 0000000000074cc0 RSI: 000000000000ea60 RDI: ffff880234f8bcf0 RBP: ffff880158f6f588 R08: 000000000001ac80 R09: ffff880237003300 R10: ffff880201171000 R11: ffffea0000d75200 R12: ffffffffa03afc60 R13: ffff880230c18800 R14: 0000000000000000 R15: ffff880158f6f680 FS: 00007f0e32673740(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 0000000234886000 CR4: 00000000001406e0 Stack: ffffffffa047a680 0000000000000000 ffff880158f6f598 ffff880158f6f680 ffff880158f6f680 ffff880234d11d00 ffff88023357f800 ffff880158f6f7d0 ffff880158f6f5b8 ffffffffa024660a ffff880158f6f5b8 ffffffffa02492ec Call Trace: [<ffffffffa024660a>] rpc_create_xprt+0x1a/0xb0 [sunrpc] [<ffffffffa02492ec>] ? xprt_create_transport+0x13c/0x240 [sunrpc] [<ffffffffa0246766>] rpc_create+0xc6/0x1a0 [sunrpc] [<ffffffffa038e695>] nfs_create_rpc_client+0xf5/0x140 [nfs] [<ffffffffa038f31a>] nfs_init_client+0x3a/0xd0 [nfs] [<ffffffffa038f22f>] nfs_get_client+0x25f/0x310 [nfs] [<ffffffffa025cef8>] ? rpc_ntop+0xe8/0x100 [sunrpc] [<ffffffffa047512c>] nfs3_set_ds_client+0xcc/0x100 [nfsv3] [<ffffffffa041fa10>] nfs4_pnfs_ds_connect+0x120/0x400 [nfsv4] [<ffffffffa03d41c7>] nfs4_ff_layout_prepare_ds+0xe7/0x330 [nfs_layout_flexfiles] [<ffffffffa03d1b1b>] ff_layout_pg_init_write+0xcb/0x280 [nfs_layout_flexfiles] [<ffffffffa03a14dc>] __nfs_pageio_add_request+0x12c/0x490 [nfs] [<ffffffffa03a1fa2>] nfs_pageio_add_request+0xc2/0x2a0 [nfs] [<ffffffffa03a0365>] ? nfs_pageio_init+0x75/0x120 [nfs] [<ffffffffa03a5b50>] nfs_do_writepage+0x120/0x270 [nfs] [<ffffffffa03a5d31>] nfs_writepage_locked+0x61/0xc0 [nfs] [<ffffffff813d4115>] ? __percpu_counter_add+0x55/0x70 [<ffffffffa03a6a9f>] nfs_wb_single_page+0xef/0x1c0 [nfs] [<ffffffff811ca4a3>] ? __dec_zone_page_state+0x33/0x40 [<ffffffffa0395b21>] nfs_launder_page+0x41/0x90 [nfs] [<ffffffff811baba0>] invalidate_inode_pages2_range+0x340/0x3a0 [<ffffffff811bac17>] invalidate_inode_pages2+0x17/0x20 [<ffffffffa039960e>] nfs_release+0x9e/0xb0 [nfs] [<ffffffffa0399570>] ? nfs_open+0x60/0x60 [nfs] [<ffffffffa0394dad>] nfs_file_release+0x3d/0x60 [nfs] [<ffffffff81226e6c>] __fput+0xdc/0x1e0 [<ffffffff81226fbe>] ____fput+0xe/0x10 [<ffffffff810bf2e4>] task_work_run+0xc4/0xe0 [<ffffffff810a4188>] do_exit+0x2e8/0xb30 [<ffffffff8102471c>] ? do_audit_syscall_entry+0x6c/0x70 [<ffffffff811464e6>] ? __audit_syscall_exit+0x1e6/0x280 [<ffffffff810a4a5f>] do_group_exit+0x3f/0xa0 [<ffffffff810a4ad4>] SyS_exit_group+0x14/0x20 [<ffffffff8179b76e>] system_call_fastpath+0x12/0x71 Which seems to be due to a call to utsname() when in a task exit context in order to determine the hostname to set in rpc_new_client(). In reality, what we want here is not the hostname of the current task, but the hostname that was used to set up the metadata server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* / mm: move most file-based accounting to the nodeMel Gorman2016-07-281-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFS: Add nfs_commit_file()Anna Schumaker2016-05-171-0/+1
| | | | | | | | | Copy will use this to set up a commit request for a generic range. I don't want to allocate a new pagecache entry for the file, so I needed to change parts of the commit path to handle requests with a null wb_page. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov2016-04-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nfs: remove nfs_inode_dio_waitChristoph Hellwig2016-03-161-4/+0
| | | | | | | Just call inode_dio_wait directly instead of through a pointless wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove nfs4_file_fsyncChristoph Hellwig2016-03-161-1/+1
| | | | | | | | | | The only difference to nfs_file_fsync is the call to pnfs_sync_inode. But pnfs_sync_inode is just an inline that calls a pNFS layout driver method if CONFIG_PNFS is designed, and thus can be called just fine from the core NFS module. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Simplify nfs_request_add_commit_list() argumentsAnna Schumaker2016-01-211-1/+1
| | | | | | | | | I noticed that all the callers of this function pass cinfo->mds->list as an argument in addition to the cinfo structure itself. Let's get rid of the extra argument, since it doesn't seem to be adding anything. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Use wait_on_atomic_t() for unlock after readaheadBenjamin Coddington2016-01-071-7/+2
| | | | | | | | | | | | The use of wait_on_atomic_t() for waiting on I/O to complete before unlocking allows us to git rid of the NFS_IO_INPROGRESS flag, and thus the nfs_iocounter's flags member, and finally the nfs_iocounter altogether. The count of I/O is moved to the lock context, and the counter increment/decrement functions become simple enough to open-code. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [Trond: Fix up conflict with existing function nfs_wait_atomic_killable()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge branch 'pnfs_generic'Trond Myklebust2016-01-041-1/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pnfs_generic: NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs NFSv4: List stateid information in the callback tracepoints NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 pNFS: If we have to delay the layout callback, mark the layout for return NFSv4.1/pNFS: Add a helper to mark the layout as returned pNFS: Ensure nfs4_layoutget_prepare returns the correct error
| * NFS: Relax requirements in nfs_flush_incompatibleTrond Myklebust2015-12-311-0/+6
| | | | | | | | | | | | | | | | If two processes share the same credentials and NFSv4 open stateid, then allow them both to dirty the same page, even if their nfs_open_context differs. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS/pNFS: Fix up pNFS write reschedule layering violations and bugsTrond Myklebust2015-12-311-1/+0
| | | | | | | | | | | | | | | | | | | | The flexfiles layout in particular, seems to want to poke around in the O_DIRECT flags when retransmitting. This patch sets up an interface to allow it to call back into O_DIRECT to handle retransmission correctly. It also fixes a potential bug whereby we could change the behaviour of O_DIRECT if an error is already pending. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | nfs: handle request add failure properlyPeng Tao2015-12-281-0/+14
|/ | | | | | | | | | | | | | | When we fail to queue a read page to IO descriptor, we need to clean it up otherwise it is hanging around preventing nfs module from being removed. When we fail to queue a write page to IO descriptor, we need to clean it up and also save the failure status to open context. Then at file close, we can try to write pages back again and drop the page if it fails to writeback in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Adding stateid information to tracepointsOlga Kornievskaia2015-12-281-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | Operations to which stateid information is added: close, delegreturn, open, read, setattr, layoutget, layoutcommit, test_stateid, write, lock, locku, lockt Format is "stateid=<seqid>:<crc32 hash stateid.other>", also "openstateid=", "layoutstateid=", and "lockstateid=" for open_file, layoutget, set_lock tracepoints. New function is added to internal.h, nfs_stateid_hash(), to compute the hash trace_nfs4_setattr() is moved from nfs4_do_setattr() to _nfs4_do_setattr() to get access to stateid. trace_nfs4_setattr and trace_nfs4_delegreturn are changed from INODE_EVENT to new event type, INODE_STATEID_EVENT which is same as INODE_EVENT but adds stateid information for locking tracepoints, moved trace_nfs4_set_lock() into _nfs4_do_setlk() to get access to stateid information, and removed trace_nfs4_lock_reclaim(), trace_nfs4_lock_expired() as they call into _nfs4_do_setlk() and both were previously same LOCK_EVENT type. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* sched/wait: Fix the signal handling fixPeter Zijlstra2015-12-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/ His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible, however my previous patch made this possible by unconditionally checking signal_pending(). We cannot use current->state as was done previously, because the instruction after the store to that variable it can be changed. We must instead pass the initial state along and use that. Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Chris Mason <clm@fb.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Chris Mason <clm@fb.com> Reviewed-by: Paul Turner <pjt@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: hpa@zytor.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFSv4: Respect the server imposed limit on how many changes we may cacheTrond Myklebust2015-09-071-1/+0
| | | | | | | | | | | The NFSv4 delegation spec allows the server to tell a client to limit how much data it cache after the file is closed. In return, the server guarantees enough free space to avoid ENOSPC situations, etc. Prior to this patch, we assumed we could always cache aggressively after close. Unfortunately, this causes problems with servers that set the limit to 0 and therefore do not offer any ENOSPC guarantees. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Use RPC functions for matching sockaddrsAnna Schumaker2015-08-171-4/+0
| | | | | | | | They already exist and do the exact same thing. Let's save ourselves several lines of code! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1/pnfs: Fix atomicity of commit list updatesTrond Myklebust2015-08-101-5/+10
| | | | | | | | | | | pnfs_layout_mark_request_commit() needs to ensure that it adds the request to the commit list atomically with all the other updates in order to prevent corruption to buckets[ds_commit_idx].wlseg due to races with pnfs_generic_clear_request_commit(). Fixes: 338d00cfef07d ("pnfs: Refactor the *_layout_mark_request_commit...") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2015-07-281-0/+21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes" * tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: Fix an oops caused by using other thread's stack space in ASYNC mode nfs: plug memory leak when ->prepare_layoutcommit fails SUNRPC: Report TCP errors to the caller sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. NFSv4.2: handle NFS-specific llseek errors NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() NFS: Fix a memory leak in nfs_do_recoalesce NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised NFS: Don't revalidate the mapping if both size and change attr are up to date NFSv4/pnfs: Ensure we don't miss a file extension NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked SUNRPC: xprt_complete_bc_request must also decrement the free slot count SUNRPC: Fix a backchannel deadlock pNFS: Don't throw out valid layout segments pNFS: pnfs_roc_drain() fix a race with open pNFS: Fix races between return-on-close and layoutreturn. pNFS: pnfs_roc_drain should return 'true' when sleeping pNFS: Layoutreturn must invalidate all existing layout segments. ...
| * nfs: Fix an oops caused by using other thread's stack space in ASYNC modeKinglong Mee2015-07-281-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An oops caused by using other thread's stack space in sunrpc ASYNC sending thread. [ 9839.007187] ------------[ cut here ]------------ [ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910! [ 9839.008069] invalid opcode: 0000 [#1] SMP [ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi [ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1 [ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000 [ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>] [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4] [ 9839.008069] RSP: 0018:ffff88003667ba58 EFLAGS: 00010246 [ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800 [ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0 [ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003 [ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000 [ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98 [ 9839.008069] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 9839.008069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0 [ 9839.008069] Stack: [ 9839.008069] ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003 [ 9839.008069] 0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9 [ 9839.008069] 0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3 [ 9839.008069] Call Trace: [ 9839.008069] [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4] [ 9839.008069] [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0 [ 9839.008069] [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30 [ 9839.008069] [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0 [ 9839.008069] [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70 [ 9839.008069] [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc] [ 9839.008069] [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc] [ 9839.008069] [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4] [ 9839.008069] [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4] [ 9839.008069] [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4] [ 9839.008069] [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc] [ 9839.008069] [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4] [ 9839.008069] [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4] [ 9839.008069] [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc] [ 9839.008069] [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc] [ 9839.008069] [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc] [ 9839.008069] [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc] [ 9839.008069] [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc] [ 9839.008069] [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc] [ 9839.008069] [<ffffffff810b452b>] process_one_work+0x1bb/0x410 [ 9839.008069] [<ffffffff810b47d3>] worker_thread+0x53/0x470 [ 9839.008069] [<ffffffff810b4780>] ? process_one_work+0x410/0x410 [ 9839.008069] [<ffffffff810b4780>] ? process_one_work+0x410/0x410 [ 9839.008069] [<ffffffff810ba7b8>] kthread+0xd8/0xf0 [ 9839.008069] [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180 [ 9839.008069] [<ffffffff81786418>] ret_from_fork+0x58/0x90 [ 9839.008069] [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180 [ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3 [ 9839.008069] RIP [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4] [ 9839.008069] RSP <ffff88003667ba58> [ 9839.071114] ---[ end trace cc14c03adb522e94 ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | writeback: move backing_dev_info->bdi_stat[] into bdi_writebackTejun Heo2015-06-021-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->bdi_stat[] into wb. * enum bdi_stat_item is renamed to wb_stat_item and the prefix of all enums is changed from BDI_ to WB_. * BDI_STAT_BATCH() -> WB_STAT_BATCH() * [__]{add|inc|dec|sum}_wb_stat(bdi, ...) -> [__]{add|inc}_wb_stat(wb, ...) * bdi_stat[_error]() -> wb_stat[_error]() * bdi_writeout_inc() -> wb_writeout_inc() * stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and frees stat. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[] introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* NFS: Add attribute update barriers to NFS writebacksTrond Myklebust2015-03-011-0/+1
| | | | | | | | Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
* nfs: Provide and use helper functions for marking a page as unstableTom Haynes2015-02-131-0/+13
| | | | | Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-blockLinus Torvalds2015-02-121-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull backing device changes from Jens Axboe: "This contains a cleanup of how the backing device is handled, in preparation for a rework of the life time rules. In this part, the most important change is to split the unrelated nommu mmap flags from it, but also removing a backing_dev_info pointer from the address_space (and inode), and a cleanup of other various minor bits. Christoph did all the work here, I just fixed an oops with pages that have a swap backing. Arnd fixed a missing export, and Oleg killed the lustre backing_dev_info from staging. Last patch was from Al, unexporting parts that are now no longer needed outside" * 'for-3.20/bdi' of git://git.kernel.dk/linux-block: Make super_blocks and sb_lock static mtd: export new mtd_mmap_capabilities fs: make inode_to_bdi() handle NULL inode staging/lustre/llite: get rid of backing_dev_info fs: remove default_backing_dev_info fs: don't reassign dirty inodes to default_backing_dev_info nfs: don't call bdi_unregister ceph: remove call to bdi_unregister fs: remove mapping->backing_dev_info fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info nilfs2: set up s_bdi like the generic mount_bdev code block_dev: get bdev inode bdi directly from the block device block_dev: only write bdev inode on close fs: introduce f_op->mmap_capabilities for nommu mmap support fs: kill BDI_CAP_SWAP_BACKED fs: deduplicate noop_backing_dev_info
| * nfs: don't call bdi_unregisterChristoph Hellwig2015-01-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Addintionally remove the call during mount failure, as deactivate_super_locked will already call ->kill_sb and clean up the bdi for us. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* | NFSv4: Ensure we reference the inode for return-on-close in delegreturnTrond Myklebust2015-02-051-1/+21
| | | | | | | | | | | | | | | | | | | | If we have to do a return-on-close in the delegreturn code, then we must ensure that the inode and super block remain referenced. Cc: Peng Tao <tao.peng@primarydata.com> Cc: stable@vger.kernel.org # 3.17.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Peng Tao <tao.peng@primarydata.com>
* | Merge branch 'flexfiles'Trond Myklebust2015-02-031-5/+26
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * flexfiles: (53 commits) pnfs: lookup new lseg at lseg boundary nfs41: .init_read and .init_write can be called with valid pg_lseg pnfs: Update documentation on the Layout Drivers pnfs/flexfiles: Add the FlexFile Layout Driver nfs: count DIO good bytes correctly with mirroring nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags nfs/flexfiles: send layoutreturn before freeing lseg nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE nfs41: allow async version layoutreturn nfs41: add range to layoutreturn args pnfs: allow LD to ask to resend read through pnfs nfs: add nfs_pgio_current_mirror helper nfs: only reset desc->pg_mirror_idx when mirroring is supported nfs41: add a debug warning if we destroy an unempty layout pnfs: fail comparison when bucket verifier not set nfs: mirroring support for direct io nfs: add mirroring support to pgio layer pnfs: pass ds_commit_idx through the commit path ... Conflicts: fs/nfs/pnfs.c fs/nfs/pnfs.h
| * | nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writesPeng Tao2015-02-031-0/+1
| | | | | | | | | | | | | | | | | | To allow pnfs LD to ask direct writes to be resend. Signed-off-by: Peng Tao <tao.peng@primarydata.com>