summaryrefslogtreecommitdiffstats
path: root/fs/locks.c
Commit message (Collapse)AuthorAgeFilesLines
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-241-1/+1
| | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* locking, fs/locks: Add missing file_sem locksPeter Zijlstra2016-10-181-0/+6
| | | | | | | | | | | | | | | | | | I overlooked a few code-paths that can lead to locks_delete_global_locks(). Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bruce Fields <bfields@fieldses.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Cc: syzkaller <syzkaller@googlegroups.com> Link: http://lkml.kernel.org/r/20161008081228.GF3142@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2016-10-101-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
| * fs: Replace current_fs_time() with current_time()Deepa Dinamani2016-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'work.misc' of ↵Linus Torvalds2016-10-101-23/+30
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...
| * | vfs: do get_write_access() on upper layer of overlayfsMiklos Szeredi2016-09-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with writecount is: we want consistent handling of it for underlying filesystems as well as overlayfs. Making sure i_writecount is correct on all layers is difficult. Instead this patch makes sure that when write access is acquired, it's always done on the underlying writable layer (called the upper layer). We must also make sure to look at the writecount on this layer when checking for conflicting leases. Open for write already updates the upper layer's writecount. Leaving only truncate. For truncate copy up must happen before get_write_access() so that the writecount is updated on the upper layer. Problem with this is if something fails after that, then copy-up was done needlessly. E.g. if break_lease() was interrupted. Probably not a big deal in practice. Another interesting case is if there's a denywrite on a lower file that is then opened for write or truncated. With this patch these will succeed, which is somewhat counterintuitive. But I think it's still acceptable, considering that the copy-up does actually create a different file, so the old, denywrite mapping won't be touched. On non-overlayfs d_real() is an identity function and d_real_inode() is equivalent to d_inode() so this patch doesn't change behavior in that case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Cc: "J. Bruce Fields" <bfields@fieldses.org>
| * | locks: fix file locking on overlayfsMiklos Szeredi2016-09-161-22/+28
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows flock, posix locks, ofd locks and leases to work correctly on overlayfs. Instead of using the underlying inode for storing lock context use the overlay inode. This allows locks to be persistent across copy-up. This is done by introducing locks_inode() helper and using it instead of file_inode() to get the inode in locking code. For non-overlayfs the two are equivalent, except for an extra pointer dereference in locks_inode(). Since lock operations are in "struct file_operations" we must also make sure not to call underlying filesystem's lock operations. Introcude a super block flag MS_NOREMOTELOCK to this effect. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Cc: "J. Bruce Fields" <bfields@fieldses.org>
* | Merge tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linuxLinus Torvalds2016-10-041-3/+18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Pull file locking updates from Jeff Layton: "Only a single patch from Nikolay this cycle, with a small change to better handle /proc/locks in a containerized host" * tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux: locks: Filter /proc/locks output on proc pid ns
| * | locks: Filter /proc/locks output on proc pid nsNikolay Borisov2016-08-181-3/+18
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On busy container servers reading /proc/locks shows all the locks created by all clients. This can cause large latency spikes. In my case I observed lsof taking up to 5-10 seconds while processing around 50k locks. Fix this by limiting the locks shown only to those created in the same pidns as the one the proc fs was mounted in. When reading /proc/locks from the init_pid_ns proc instance then perform no filtering [ jlayton: reformat comments for 80 columns ] Signed-off-by: Nikolay Borisov <kernel@kyup.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
* | fs/locks: Use percpu_down_read_preempt_disable()Peter Zijlstra2016-09-221-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid spurious preemption. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: tj@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | fs/locks: Replace lg_local with a per-cpu spinlockPeter Zijlstra2016-09-221-18/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Oleg suggested, replace file_lock_list with a structure containing the hlist head and a spinlock. This completely removes the lglock from fs/locks. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: tj@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | fs/locks: Replace lg_global with a percpu-rwsemPeter Zijlstra2016-09-221-0/+21
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the global part of the lglock with a percpu-rwsem. Since fcl_lock is a spinlock and itself nests under i_lock, which too is a spinlock we cannot acquire sleeping locks at locks_{insert,remove}_global_locks(). We can however wrap all fcl_lock acquisitions with percpu_down_read such that all invocations of locks_{insert,remove}_global_locks() have that read lock held. This allows us to replace the lg_global part of the lglock with the write side of the rwsem. In the absense of writers, percpu_{down,up}_read() are free of atomic instructions. This further avoids the very long preempt-disable regions caused by lglock on larger machines. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: tj@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locks: use file_inode()Miklos Szeredi2016-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | (Another one for the f_path debacle.) ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask. The reason is that generic_add_lease() used filp->f_path.dentry->inode while all the others use file_inode(). This makes a difference for files opened on overlayfs since the former will point to the overlay inode the latter to the underlying inode. So generic_add_lease() added the lease to the overlay inode and generic_delete_lease() removed it from the underlying inode. When the file was released the lease remained on the overlay inode's lock list, resulting in use after free. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* wrappers for ->i_mutex accessAl Viro2016-01-221-3/+3
| | | | | | | | | | | parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'work.copy_file_range' of ↵Linus Torvalds2016-01-121-13/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs copy_file_range updates from Al Viro: "Several series around copy_file_range/CLONE" * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: use new dedupe data function pointer vfs: hoist the btrfs deduplication ioctl to the vfs vfs: wire up compat ioctl for CLONE/CLONE_RANGE cifs: avoid unused variable and label nfsd: implement the NFSv4.2 CLONE operation nfsd: Pass filehandle to nfs4_preprocess_stateid_op() vfs: pull btrfs clone API to vfs layer locks: new locks_mandatory_area calling convention vfs: Add vfs_copy_file_range() support for pagecache copies btrfs: add .copy_file_range file operation x86: add sys_copy_file_range to syscall tables vfs: add copy_file_range syscall and vfs helper
| * locks: new locks_mandatory_area calling conventionChristoph Hellwig2015-12-071-13/+9
| | | | | | | | | | | | | | | | | | | | Pass a loff_t end for the last byte instead of the 32-bit count parameter to allow full file clones even on 32-bit architectures. While we're at it also simplify the read/write selection. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | locks: rename __posix_lock_file to posix_lock_inodeJeff Layton2016-01-081-5/+6
| | | | | | | | | | | | | | ...a more descriptive name and we can drop the double underscore prefix. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
* | locks: prink more detail when there are leaked locksJeff Layton2016-01-081-4/+29
| | | | | | | | | | | | | | | | | | Right now, we just get WARN_ON_ONCE, which is not particularly helpful. Have it dump some info about the locks and the inode to make it easier to track down leaked locks in the future. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
* | locks: pass inode pointer to locks_free_lock_contextJeff Layton2016-01-081-1/+3
| | | | | | | | | | | | | | ...so we can print information about it if there are leaked locks. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
* | locks: sprinkle some tracepoints around the file locking codeJeff Layton2016-01-081-3/+9
| | | | | | | | | | | | | | | | | | Add some tracepoints around the POSIX locking code. These were useful when tracking down problems when handling the race between setlk and close. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
* | locks: don't check for race with close when setting OFD lockJeff Layton2016-01-081-6/+10
| | | | | | | | | | | | | | | | | | We don't clean out OFD locks on close(), so there's no need to check for a race with them here. They'll get cleaned out at the same time that flock locks are. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
* | locks: fix unlock when fcntl_setlk races with a closeJeff Layton2016-01-071-21/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry reported that he was able to reproduce the WARN_ON_ONCE that fires in locks_free_lock_context when the flc_posix list isn't empty. The problem turns out to be that we're basically rebuilding the file_lock from scratch in fcntl_setlk when we discover that the setlk has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END, then we may end up with fl_start and fl_end values that differ from when the lock was initially set, if the file position or length of the file has changed in the interim. Fix this by just reusing the same lock request structure, and simply override fl_type value with F_UNLCK as appropriate. That ensures that we really are unlocking the lock that was initially set. While we're there, make sure that we do pop a WARN_ON_ONCE if the removal ever fails. Also return -EBADF in this event, since that's what we would have returned if the close had happened earlier. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Fixes: c293621bbf67 (stale POSIX lock handling) Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
* | fs: make locks.c explicitly non-modularPaul Gortmaker2015-12-181-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Kconfig currently controlling compilation of this code is: config FILE_LOCKING bool "Enable POSIX file locking API" if EXPERT ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modularity so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering gets bumped to one level earlier when we use the more appropriate fs_initcall here. However we've made similar changes before without any fallout and none is expected here either. Cc: Jeff Layton <jlayton@poochiereds.net> Acked-by: Jeff Layton <jlayton@poochiereds.net> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* | locks: use list_first_entry_or_null()Geliang Tang2015-11-181-6/+4
| | | | | | | | | | | | | | Simplify the code with list_first_entry_or_null(). Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* | locks: Allow disabling mandatory locking at compile timeJeff Layton2015-11-161-0/+2
|/ | | | | | | | | | | | | | | | | Mandatory locking appears to be almost unused and buggy and there appears no real interest in doing anything with it. Since effectively no one uses the code and since the code is buggy let's allow it to be disabled at compile time. I would just suggest removing the code but undoubtedly that will break some piece of userspace code somewhere. For the distributions that don't care about this piece of code this gives a nice starting point to make mandatory locking go away. Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jeff Layton <jeff.layton@primarydata.com> Cc: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: cleanup posix_lock_inode_wait and flock_lock_inode_waitBenjamin Coddington2015-10-221-6/+3
| | | | | | | All callers use locks_lock_inode_wait() instead. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* Move locks API users to locks_lock_inode_wait()Benjamin Coddington2015-10-221-1/+1
| | | | | | | | | Instead of having users check for FL_POSIX or FL_FLOCK to call the correct locks API function, use the check within locks_lock_inode_wait(). This allows for some later cleanup. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: introduce locks_lock_inode_wait()Benjamin Coddington2015-10-221-0/+24
| | | | | | | | | | Users of the locks API commonly call either posix_lock_file_wait() or flock_lock_file_wait() depending upon the lock type. Add a new function locks_lock_inode_wait() which will check and call the correct function for the type of lock passed in. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: Use more file_inode and fix a commentBenjamin Coddington2015-10-151-5/+3
| | | | | Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* fs: fix data races on inode->i_flctxDmitry Vyukov2015-09-211-27/+36
| | | | | | | | | | | | | | | | | | | | | | locks_get_lock_context() uses cmpxchg() to install i_flctx. cmpxchg() is a release operation which is correct. But it uses a plain load to load i_flctx. This is incorrect. Subsequent loads from i_flctx can hoist above the load of i_flctx pointer itself and observe uninitialized garbage there. This in turn can lead to corruption of ctx->flc_lock and other members. Documentation/memory-barriers.txt explicitly requires to use a barrier in such context: "A load-load control dependency requires a full read memory barrier". Use smp_load_acquire() in locks_get_lock_context() and in bunch of other functions that can proceed concurrently with locks_get_lock_context(). The data race was found with KernelThreadSanitizer (KTSAN). Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* fs: fix fs/locks.c kernel-doc warningRandy Dunlap2015-08-311-0/+1
| | | | | | | | | Fix kernel-doc warnings in fs/locks.c: Warning(..//fs/locks.c:1577): No description found for parameter 'flags' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: inline posix_lock_file_wait and flock_lock_file_waitJeff Layton2015-07-131-28/+0
| | | | | | | They just call file_inode and then the corresponding *_inode_file_wait function. Just make them static inlines instead. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: new helpers - flock_lock_inode_wait and posix_lock_inode_waitJeff Layton2015-07-131-12/+38
| | | | | | | | Allow callers to pass in an inode instead of a filp. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
* locks: have flock_lock_file take an inode pointer instead of a filpJeff Layton2015-07-131-6/+6
| | | | | | | | | | | | | | | | | | ...and rename it to better describe how it works. In order to fix a use-after-free in NFS, we need to be able to remove locks from an inode after the filp associated with them may have already been freed. flock_lock_file already only dereferences the filp to get to the inode, so just change it so the callers do that. All of the callers already pass in a lock request that has the fl_file set properly, so we don't need to pass it in individually. With that change it now only dereferences the filp to get to the inode, so just push that out to the callers. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
* proc: show locks in /proc/pid/fdinfo/XAndrey Vagin2015-04-171-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's show locks which are associated with a file descriptor in its fdinfo file. Currently we don't have a reliable way to determine who holds a lock. We can find some information in /proc/locks, but PID which is reported there can be wrong. For example, a process takes a lock, then forks a child and dies. In this case /proc/locks contains the parent pid, which can be reused by another process. $ cat /proc/locks ... 6: FLOCK ADVISORY WRITE 324 00:13:13431 0 EOF ... $ ps -C rpcbind PID TTY TIME CMD 332 ? 00:00:00 rpcbind $ cat /proc/332/fdinfo/4 pos: 0 flags: 0100000 mnt_id: 22 lock: 1: FLOCK ADVISORY WRITE 324 00:13:13431 0 EOF $ ls -l /proc/332/fd/4 lr-x------ 1 root root 64 Mar 5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock $ ls -l /proc/324/fd/ total 0 lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0 lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0 lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0 You can see that the process with the 324 pid doesn't hold the lock. This information is required for proper dumping and restoring file locks. Signed-off-by: Andrey Vagin <avagin@openvz.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Jeff Layton <jlayton@poochiereds.net> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* locks: use cmpxchg to assign i_flctx pointerJeff Layton2015-04-031-8/+1
| | | | | | | | | | | | | | | | | | | | | | | During the v3.20/v4.0 cycle, I had originally had the code manage the inode->i_flctx pointer using a compare-and-swap operation instead of the i_lock. Sasha Levin though hit a problem while testing with trinity that made me believe that that wasn't safe. At the time, changing the code to protect the i_flctx pointer seemed to fix the issue, but I now think that was just coincidence. The issue was likely the same race that Kirill Shutemov hit while testing the pre-rc1 v4.0 kernel and that Linus spotted. Due to the way that the spinlock was dropped in the middle of flock_lock_file, you could end up with multiple flock locks for the same struct file on the inode. Reinstate the use of a CAS operation to assign this pointer since it's likely to be more efficient and gets the i_lock completely out of the file locking business. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: get rid of WE_CAN_BREAK_LSLK_NOW dead codeJeff Layton2015-04-031-6/+1
| | | | | | | | | | | | | As Bruce points out, there's no compelling reason to change /proc/locks output at this point. If we did want to do this, then we'd almost certainly want to introduce a new file to display this info (maybe via debugfs?). Let's remove the dead WE_CAN_BREAK_LSLK_NOW ifdef here and just plan to stay with the legacy format. Reported-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: change lm_get_owner and lm_put_owner prototypesJeff Layton2015-04-031-3/+5
| | | | | | | | | | | | The current prototypes for these operations are somewhat awkward as they deal with fl_owners but take struct file_lock arguments. In the future, we'll want to be able to take references without necessarily dealing with a struct file_lock. Change them to take fl_owner_t arguments instead and have the callers deal with assigning the values to the file_lock structs. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
* locks: don't allocate a lock context for an F_UNLCK requestJeff Layton2015-04-031-8/+12
| | | | | | | | | | | | In the event that we get an F_UNLCK request on an inode that has no lock context, there is no reason to allocate one. Change locks_get_lock_context to take a "type" pointer and avoid allocating a new context if it's F_UNLCK. Then, fix the callers to return appropriately if that function returns NULL. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
* locks: Add lockdep assertion for blocked_lock_lockDaniel Wagner2015-04-031-0/+6
| | | | | | | | Annonate insert, remove and iterate function that we need blocked_lock_lock held. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: remove extraneous IS_POSIX and IS_FLOCK testsJeff Layton2015-04-031-2/+2
| | | | | | | We know that the locks being passed into this function are of the correct type, now that they live on their own lists. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: Remove unnecessary IS_POSIX testDaniel Wagner2015-04-031-2/+0
| | | | | | | | | | | | | | | | | | | Since following change commit bd61e0a9c852de2d705b6f1bb2cc54c5774db570 Author: Jeff Layton <jlayton@primarydata.com> Date: Fri Jan 16 15:05:55 2015 -0500 locks: convert posix locks to file_lock_context all Posix locks are kept on their a separate list, so the test is redudant. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Jeff Layton <jlayton@primarydata.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: fix file_lock deletion inside loopYan, Zheng2015-03-271-3/+2
| | | | | | | | | locks_delete_lock_ctx() is called inside the loop, so we should use list_for_each_entry_safe. Fixes: 8634b51f6ca2 (locks: convert lease handling to file_lock_context) Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: fix generic_delete_lease tracepoint to use victim pointerJeff Layton2015-03-141-1/+1
| | | | | | | It's possible that "fl" won't point at a valid lock at this point, so use "victim" instead which is either a valid lock or NULL. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: fix fasync_struct memory leak in lease upgrade/downgrade handlingJeff Layton2015-03-041-1/+2
| | | | | | | | | | | | | | | | | | | Commit 8634b51f6ca2 (locks: convert lease handling to file_lock_context) introduced a regression in the handling of lease upgrade/downgrades. In the event that we already have a lease on a file and are going to either upgrade or downgrade it, we skip doing any list insertion or deletion and simply re-call lm_setup on the existing lease. As of commit 8634b51f6ca2 however, we end up calling lm_setup on the lease that was passed in, instead of on the existing lease. This causes us to leak the fasync_struct that was allocated in the event that there was not already an existing one (as it always appeared that there wasn't one). Fixes: 8634b51f6ca2 (locks: convert lease handling to file_lock_context) Reported-and-Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: fix list insertion when lock is split in twoJeff Layton2015-02-171-0/+1
| | | | | | | | | | | | | In the case where we're splitting a lock in two, the current code the new "left" lock in the incorrect spot. It's inserted just before "right" when it should instead be inserted just before the new lock. When we add a new lock, set "fl" to that value so that we can add "left" before it. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: remove conditional lock release in middle of flock_lock_fileJeff Layton2015-02-171-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Linus pointed out: Say we have an existing flock, and now do a new one that conflicts. I see what looks like three separate bugs. - We go through the first loop, find a lock of another type, and delete it in preparation for replacing it - we *drop* the lock context spinlock. - BUG #1? So now there is no lock at all, and somebody can come in and see that unlocked state. Is that really valid? - another thread comes in while the first thread dropped the lock context lock, and wants to add its own lock. It doesn't see the deleted or pending locks, so it just adds it - the first thread gets the context spinlock again, and adds the lock that replaced the original - BUG #2? So now there are *two* locks on the thing, and the next time you do an unlock (or when you close the file), it will only remove/replace the first one. ...remove the "drop the spinlock" code in the middle of this function as it has always been suspicious. This should eliminate the potential race that can leave two locks for the same struct file on the list. He also pointed out another thing as a bug -- namely that you flock_lock_file removes the lock from the list unconditionally when doing a lock upgrade, without knowing whether it'll be able to set the new lock. Bruce pointed out that this is expected behavior and may help prevent certain deadlock situations. We may want to revisit that at some point, but it's probably best that we do so in the context of a different patchset. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* locks: only remove leases associated with the file being closedJeff Layton2015-02-171-1/+2
| | | | | | We don't want to remove all leases just because one filp was closed. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* Revert "locks: keep a count of locks on the flctx lists"Jeff Layton2015-02-161-29/+16
| | | | | | | | | | | This reverts commit 9bd0f45b7037fcfa8b575c7e27d0431d6e6dc3bb. Linus rightly pointed out that I failed to initialize the counters when adding them, so they don't work as expected. Just revert this patch for now. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* fs: add FL_LAYOUT lease typeChristoph Hellwig2015-02-021-4/+10
| | | | | | | | | | | | | | | | This (ab-)uses the file locking code to allow filesystems to recall outstanding pNFS layouts on a file. This new lease type is similar but not quite the same as FL_DELEG. A FL_LAYOUT lease can always be granted, an a per-filesystem lock (XFS iolock for the initial implementation) ensures not FL_LAYOUT leases granted when we would need to recall them. Also included are changes that allow multiple outstanding read leases of different types on the same file as long as they have a differnt owner. This wasn't a problem until now as nfsd never set FL_LEASE leases, and no one else used FL_DELEG leases, but given that nfsd will also issues FL_LAYOUT leases we will have to handle it now. Signed-off-by: Christoph Hellwig <hch@lst.de>