summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | nfs: fix DIO good bytes calculationPeng Tao2015-04-231-12/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For direct read that has IO size larger than rsize, we'll split it into several READ requests and nfs_direct_good_bytes() would count completed bytes incorrectly by eating last zero count reply. Fix it by handling mirror and non-mirror cases differently such that we only count mirrored writes differently. This fixes 5fadeb47("nfs: count DIO good bytes correctly with mirroring"). Reported-by: Jean Spector <jean@primarydata.com> Cc: <stable@vger.kernel.org> # v3.19+ Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | nfs: Fetch MOUNTED_ON_FILEID when updating an inodeAnna Schumaker2015-04-232-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good start to fixing a circular directory structure warning for NFS v4 "junctioned" mountpoints. Unfortunately, further testing continued to generate this error. My server is configured like this: anna@nfsd ~ % df Filesystem Size Used Avail Use% Mounted on /dev/vda1 9.1G 2.0G 6.5G 24% / /dev/vdc1 1014M 33M 982M 4% /exports /dev/vdc2 1014M 33M 982M 4% /exports/vol1 /dev/vdc3 1014M 33M 982M 4% /exports/vol1/vol2 anna@nfsd ~ % cat /etc/exports /exports/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash) I've been running chown across the entire mountpoint twice in a row to hit this problem. The first run succeeds, but the second one fails with the circular directory warning along with: anna@client ~ % dmesg [Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed fsid 0:39: expected fileid 0x100080, got 0x80 WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on fileid. This patch fixes the issue by requesting an updated mounted-on fileid from the server during nfs_update_inode(), and then checking that the fileid stored in the nfs_inode matches either the fileid or mounted-on fileid returned by the server. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | nfs: fix high load average due to callback thread sleepingJeff Layton2015-04-231-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chuck pointed out a problem that crept in with commit 6ffa30d3f734 (nfs: don't call blocking operations while !TASK_RUNNING). Linux counts tasks in uninterruptible sleep against the load average, so this caused the system's load average to be pinned at at least 1 when there was a NFSv4.1+ mount active. Not a huge problem, but it's probably worth fixing before we get too many complaints about it. This patch converts the code back to use TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop iteration. In practice no one should really be signalling this thread at all, so I think this is reasonably safe. With this change, there's also no need to game the hung task watchdog so we can also convert the schedule_timeout call back to a normal schedule. Cc: <stable@vger.kernel.org> Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: commit 6ffa30d3f734 (“nfs: don't call blocking . . .”) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFS: Reduce time spent holding the i_mutex during fallocate()Anna Schumaker2015-04-232-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the very least, we should not be taking the i_mutex until after checking if the server even supports ALLOCATE or DEALLOCATE, allowing v4.0 or v4.1 to exit without potentially waiting on a lock. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFS: Don't zap caches on fallocate()Anna Schumaker2015-04-234-10/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE operations so we can set the updated inode size and change attribute directly. DEALLOCATE will still need to release pagecache pages, so nfs42_proc_deallocate() now calls truncate_pagecache_range() before contacting the server. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFS: Block new writes while syncing data in nfs_getattr()Trond Myklebust2015-03-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1/pnfs: Separate out metadata and data consistency for pNFSTrond Myklebust2015-03-279-8/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LAYOUTCOMMIT operation means different things to different layout types. For blocks and objects, it is both a data and metadata consistency operation. For files and flexfiles, it is only a metadata consistency operation. This patch separates out the 2 cases, allowing the files/flexfiles layout drivers to optimise away the data consistency calls to layoutcommit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1/pnfs: Ensure we send layoutcommit before return-on-closeTrond Myklebust2015-03-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must not send a close or delegreturn that would result in a return-on-close of the layout without ensuring that we've also sent the necessary layoutcommit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1/pnfs: Ensure that writes respect the O_SYNC flag when doing O_DIRECTTrond Myklebust2015-03-273-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the caller does not specify the O_SYNC flag, then it is legitimate to return from O_DIRECT without doing a pNFS layoutcommit operation. However if the file is opened O_DIRECT|O_SYNC then we'd better get it right. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4: Truncating file opens should also sync O_DIRECT writesTrond Myklebust2015-03-272-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't just want to sync out buffered writes, but also O_DIRECT ones. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFS: File unlock needs to be a metadata synchronisation pointTrond Myklebust2015-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | File unlock needs to update both data and metadata on the NFS server in order to act as a synchronisation point for other clients. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFS: Add a helper to sync both O_DIRECT and buffered writesTrond Myklebust2015-03-271-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Then apply it to nfs_setattr() and nfs_getattr(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1/pnfs: Refactor pnfs_set_layoutcommit()Trond Myklebust2015-03-274-42/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pnfs_set_layoutcommit() and pnfs_commit_set_layoutcommit() are 100% identical except for the function arguments. Refactor to eliminate the difference. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1/pnfs: Fix setting of layoutcommit last write byteTrond Myklebust2015-03-271-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the NFS_INO_LAYOUTCOMMIT flag was unset, then we _must_ ensure that we also reset the last write byte (lwb) for that layout. The current code depends on us clearing the lwb when we clear NFS_INO_LAYOUTCOMMIT, which is not the case when we call pnfs_clear_layoutcommit(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4: Return the delegation before returning the layout in evict_inode()Trond Myklebust2015-03-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor optimisation for the case where the layout has return-on-close enabled. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4: Allow tracing of NFSv4 fsync callsTrond Myklebust2015-03-272-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I appear to have missed this when adding the ftrace probes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFS: Fix free_deveiceid -> free_deviceidTrond Myklebust2015-03-272-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make it easier to grep for these functions by name. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1: Don't cache deviceids that have no notificationsTrond Myklebust2015-03-273-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spec says that once all layouts that reference a given deviceid have been returned, then we are only allowed to continue to cache the deviceid if the metadata server supports notifications. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1: Allow getdeviceinfo to return notification info back to callerTrond Myklebust2015-03-272-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are only allowed to cache deviceinfo if the server supports notifications and actually promises to call us back when changes occur. Right now, we request those notifications, but then we don't check the server's reply. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1: Cleanup - don't opencode nfs4_put_deviceid_node()Trond Myklebust2015-03-271-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There really is no reason to do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4.1: Convert pNFS deviceid to use kfree_rcu()Trond Myklebust2015-03-276-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use of synchronize_rcu() when unmounting and potentially freeing a lot of deviceids is problematic. There really is no reason why we can't just use kfree_rcu() here. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | nfs: clean up nfs_direct_IOPeng Tao2015-03-131-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This follows up "nfs: fix dio deadlock when O_DIRECT flag is flipped" and removes the unnecessary CONFIG_NFS_SWAP switch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | NFSv4: Append delegations to the per-client list instead of prependingTrond Myklebust2015-03-121-1/+1
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do so on the assumption that for most use cases, that list will turn into a more or less LRU-ordered list, and so the list traversals in nfs_client_return_marked_delegations() are likely to be shorter before hitting a candidate to return. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2015-04-26285-1520/+1512
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
| * | | | | RCU pathwalk breakage when running into a symlink overmounting somethingAl Viro2015-04-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling unlazy_walk() in walk_component() and do_last() when we find a symlink that needs to be followed doesn't acquire a reference to vfsmount. That's fine when the symlink is on the same vfsmount as the parent directory (which is almost always the case), but it's not always true - one _can_ manage to bind a symlink on top of something. And in such cases we end up with excessive mntput(). Cc: stable@vger.kernel.org # since 2.6.39 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | direct-io: only inc/dec inode->i_dio_count for file systemsJens Axboe2015-04-248-32/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_blockdev_direct_IO() increments and decrements the inode ->i_dio_count for each IO operation. It does this to protect against truncate of a file. Block devices don't need this sort of protection. For a capable multiqueue setup, this atomic int is the only shared state between applications accessing the device for O_DIRECT, and it presents a scaling wall for that. In my testing, as much as 30% of system time is spent incrementing and decrementing this value. A mixed read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with better latencies too. Before: clat percentiles (usec): | 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 34], 20.00th=[ 34], | 30.00th=[ 34], 40.00th=[ 34], 50.00th=[ 35], 60.00th=[ 35], | 70.00th=[ 35], 80.00th=[ 35], 90.00th=[ 37], 95.00th=[ 80], | 99.00th=[ 98], 99.50th=[ 151], 99.90th=[ 155], 99.95th=[ 155], | 99.99th=[ 165] After: clat percentiles (usec): | 1.00th=[ 95], 5.00th=[ 108], 10.00th=[ 129], 20.00th=[ 149], | 30.00th=[ 155], 40.00th=[ 161], 50.00th=[ 167], 60.00th=[ 171], | 70.00th=[ 177], 80.00th=[ 185], 90.00th=[ 201], 95.00th=[ 270], | 99.00th=[ 390], 99.50th=[ 398], 99.90th=[ 418], 99.95th=[ 422], | 99.99th=[ 438] In other setups, Robert Elliott reported seeing good performance improvements: https://lkml.org/lkml/2015/4/3/557 The more applications accessing the device, the worse it gets. Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells do_blockdev_direct_IO() that it need not worry about incrementing or decrementing the inode i_dio_count for this caller. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Elliott, Robert (Server Storage) <elliott@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | fs/9p: fix readdir()Johannes Berg2015-04-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Al Viro's IOV changes broke 9p readdir() because the new code didn't abort the read when it returned nothing. The original code checked if the combined error/length was <= 0 but in the new code that accidentally got changed to just an error check. Add back the return from the function when nothing is read. Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: e1200fe68f20 ("9p: switch p9_client_read() to passing struct iov_iter *") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: assorted d_backing_inode() annotationsDavid Howells2015-04-153-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: fs/inode.c helpers: d_inode() annotationsDavid Howells2015-04-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | these should be used on objects already in top layer Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: fs/cachefiles: d_backing_inode() annotationsDavid Howells2015-04-156-62/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: fs library helpers: d_inode() annotationsDavid Howells2015-04-152-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | library helpers called by filesystem drivers on their own inodes Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: assorted weird filesystems: d_inode() annotationsDavid Howells2015-04-153-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells2015-04-15266-1351/+1350
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: Cachefiles should perform fs modifications on the top layer onlyDavid Howells2015-04-152-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cachefiles should perform fs modifications (eg. vfs_unlink()) on the top layer only and should not attempt to alter the lower layer. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge branch 'for-4.1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2015-04-2410-77/+35
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "A quiet cycle this time; this is basically entirely bugfixes. The few that aren't cc'd to stable are cleanup or seemed unlikely to affect anyone much" * 'for-4.1' of git://linux-nfs.org/~bfields/linux: uapi: Remove kernel internal declaration nfsd: fix nsfd startup race triggering BUG_ON nfsd: eliminate NFSD_DEBUG nfsd4: fix READ permission checking nfsd4: disallow SEEK with special stateids nfsd4: disallow ALLOCATE with special stateids nfsd: add NFSEXP_PNFS to the exflags array nfsd: Remove duplicate macro define for max sec label length nfsd: allow setting acls with unenforceable DENYs nfsd: NFSD_FAULT_INJECTION depends on DEBUG_FS nfsd: remove unused status arg to nfsd4_cleanup_open_state nfsd: remove bogus setting of status in nfsd4_process_open2 NFSD: Use correct reply size calculating function NFSD: Using path_equal() for checking two paths
| * | | | | | nfsd: fix nsfd startup race triggering BUG_ONGiuseppe Cantavenera2015-04-211-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...) in fs/nfsd/nfs4recover.c was called before assigning ntfsd_net_id. The following was observed on a MIPS 32-core processor: kernel: Call Trace: kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd] kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8 kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70 kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0 kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0 kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8 kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0 kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28 kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8 kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68 kernel: kernel: Code: 0040f809 00000000 2e020001 <00020336> 3c12c00d 3c02801a de100000 6442eb98 0040f809 kernel: ---[ end trace 7471374335809536 ]--- Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before registering rpc_pipefs_event(...) with the notifier chain. Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com> Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com> Reviewed-by: Kinlong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: eliminate NFSD_DEBUGMark Salter2015-04-213-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f895b252d4edf ("sunrpc: eliminate RPC_DEBUG") introduced use of IS_ENABLED() in a uapi header which leads to a build failure for userspace apps trying to use <linux/nfsd/debug.h>: linux/nfsd/debug.h:18:15: error: missing binary operator before token "(" #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) ^ Since this was only used to define NFSD_DEBUG if CONFIG_SUNRPC_DEBUG is enabled, replace instances of NFSD_DEBUG with CONFIG_SUNRPC_DEBUG. Cc: stable@vger.kernel.org Fixes: f895b252d4edf "sunrpc: eliminate RPC_DEBUG" Signed-off-by: Mark Salter <msalter@redhat.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: fix READ permission checkingJ. Bruce Fields2015-04-211-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case we already have a struct file (derived from a stateid), we still need to do permission-checking; otherwise an unauthorized user could gain access to a file by sniffing or guessing somebody else's stateid. Cc: stable@vger.kernel.org Fixes: dc97618ddda9 "nfsd4: separate splice and readv cases" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: disallow SEEK with special stateidsJ. Bruce Fields2015-04-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the client uses a special stateid then we'll pass a NULL file to vfs_llseek. Fixes: 24bab491220f " NFSD: Implement SEEK" Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: stable@vger.kernel.org Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: disallow ALLOCATE with special stateidsJ. Bruce Fields2015-04-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vfs_fallocate will hit a NULL dereference if the client tries an ALLOCATE or DEALLOCATE with a special stateid. Fix that. (We also depend on the open to have broken any conflicting leases or delegations for us.) (If it turns out we need to allow special stateid's then we could do a temporary open here in the special-stateid case, as we do for read and write. For now I'm assuming it's not necessary.) Fixes: 95d871f03cae "nfsd: Add ALLOCATE support" Cc: stable@vger.kernel.org Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: add NFSEXP_PNFS to the exflags arrayChristoph Hellwig2015-04-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: Remove duplicate macro define for max sec label lengthKinglong Mee2015-03-313-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS4_MAXLABELLEN has defined for sec label max length, use it directly. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: allow setting acls with unenforceable DENYsJ. Bruce Fields2015-03-311-49/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been refusing ACLs that DENY permissions that we can't effectively deny. (For example, we can't deny permission to read attributes.) Andreas points out that any DENY of Window's "read", "write", or "modify" permissions would trigger this. That would be annoying. So maybe we should be a little less paranoid, and ignore entirely the permissions that are meaningless to us. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: NFSD_FAULT_INJECTION depends on DEBUG_FSChengyu Song2015-03-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSD_FAULT_INJECTION depends on DEBUG_FS, otherwise the debugfs_create_* interface may return unexpected error -ENODEV, and cause system crash. Signed-off-by: Chengyu Song <csong84@gatech.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: remove unused status arg to nfsd4_cleanup_open_stateJeff Layton2015-03-313-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: remove bogus setting of status in nfsd4_process_open2Jeff Layton2015-03-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | status is always reset after this (and it doesn't make much sense there anyway). Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | NFSD: Use correct reply size calculating functionKinglong Mee2015-03-311-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ALLOCATE/DEALLOCATE only reply one status value to client, so, using nfsd4_only_status_rsize for reply size calculating. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | NFSD: Using path_equal() for checking two pathsKinglong Mee2015-03-312-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | | | Merge branch 'for-linus-4.1' of ↵Linus Torvalds2015-04-2446-897/+2113
|\ \ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "I've been running these through a longer set of load tests because my commits change the free space cache writeout. It fixes commit stalls on large filesystems (~20T space used and up) that we have been triggering here. We were seeing new writers blocked for 10 seconds or more during commits, which is far from good. Josef and I fixed up ENOSPC aborts when deleting huge files (3T or more), that are triggered because our metadata reservations were not properly accounting for crcs and were not replenishing during the truncate. Also in this series, a number of qgroup fixes from Fujitsu and Dave Sterba collected most of the pending cleanups from the list" * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (93 commits) btrfs: quota: Update quota tree after qgroup relationship change. btrfs: quota: Automatically update related qgroups or mark INCONSISTENT flags when assigning/deleting a qgroup relations. btrfs: qgroup: clear STATUS_FLAG_ON in disabling quota. btrfs: Update btrfs qgroup status item when rescan is done. btrfs: qgroup: Fix dead judgement on qgroup_rescan_leaf() return value. btrfs: Don't allow subvolid >= (1 << BTRFS_QGROUP_LEVEL_SHIFT) to be created btrfs: Check qgroup level in kernel qgroup assign. btrfs: qgroup: allow to remove qgroup which has parent but no child. btrfs: qgroup: return EINVAL if level of parent is not higher than child's. btrfs: qgroup: do a reservation in a higher level. Btrfs: qgroup, Account data space in more proper timings. Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use. Btrfs: qgroup: free reserved in exceeding quota. Btrfs: qgroup: cleanup, remove an unsued parameter in btrfs_create_qgroup(). btrfs: qgroup: fix limit args override whole limit struct btrfs: qgroup: update limit info in function btrfs_run_qgroups(). btrfs: qgroup: consolidate the parameter of fucntion update_qgroup_limit_item(). btrfs: qgroup: update qgroup in memory at the same time when we update it in btree. btrfs: qgroup: inherit limit info from srcgroup in creating snapshot. btrfs: Support busy loop of write and delete ...
| * | | | | | btrfs: quota: Update quota tree after qgroup relationship change.Qu Wenruo2015-04-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous patch modified the in memory struct but it's not written in quota tree until next commit. So user will still get old data using "btrfs qgroup show" after assign/remove. This patch will call btrfs_run_qgroups in assign ioctl so it will be updated to in memory quota trees and user will get up-to-date results. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>