summaryrefslogtreecommitdiffstats
path: root/fs/open.c
Commit message (Collapse)AuthorAgeFilesLines
* fs: Protect write paths by sb_start_write - sb_end_writeJan Kara2012-07-311-1/+6
| | | | | | | | | | | | | | | | | | | | | | There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: Add freezing handling to mnt_want_write() / mnt_drop_write()Jan Kara2012-07-311-1/+1
| | | | | | | | | | | | | | | | Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* take grabbing f->f_path to do_dentry_open()Al Viro2012-07-291-4/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch dentry_open() to struct path, make it grab references itselfAl Viro2012-07-231-12/+5
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Make chown() and lchown() call fchownat()David Howells2012-07-141-34/+7
| | | | | | | | Make the chown() and lchown() syscalls jump to the fchownat() syscall with the appropriate extra arguments. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do_dentry_open(): close the race with mark_files_ro() in failure exitAl Viro2012-07-141-1/+1
| | | | | | | we want to take it out of mark_files_ro() reach *before* we start checking if we ought to drop write access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do_dentry_open(): take initialization of file->f_path to callerAl Viro2012-07-141-14/+12
| | | | | | | ... and get rid of a couple of arguments and a pointless reassignment in finish_open() case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold __dentry_open() into its sole callerAl Viro2012-07-141-21/+12
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch do_dentry_open() to returning intAl Viro2012-07-141-20/+20
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make finish_no_open() return intAl Viro2012-07-141-1/+2
| | | | | | | namely, 1 ;-) That's what we want to return from ->atomic_open() instances after finish_no_open(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill struct opendataAl Viro2012-07-141-9/+11
| | | | | | | | | Just pass struct file *. Methods are happier that way... There's no need to return struct file * from finish_open() now, so let it return int. Next: saner prototypes for parts in namei.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill opendata->{mnt,dentry}Al Viro2012-07-141-3/+3
| | | | | | ->filp->f_path is there for purpose... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't modify od->filp at allAl Viro2012-07-141-3/+2
| | | | | | make put_filp() conditional on flag set by finish_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ->atomic_open() prototype change - pass int * instead of bool *Al Viro2012-07-141-2/+5
| | | | | | | | | ... and let finish_open() report having opened the file via that sucker. Next step: don't modify od->filp at all. [AV: FILE_CREATE was already used by cifs; Miklos' fix folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: remove open intents from nameidataMiklos Szeredi2012-07-141-85/+2
| | | | | | | | | All users of open intents have been converted to use ->atomic_{open,create}. This patch gets rid of nd->intent.open and related infrastructure. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: add i_op->atomic_open()Miklos Szeredi2012-07-141-0/+42
| | | | | | | | | | | | | | | | | | | | | | | Add a new inode operation which is called on the last component of an open. Using this the filesystem can look up, possibly create and open the file in one atomic operation. If it cannot perform this (e.g. the file type turned out to be wrong) it may signal this by returning NULL instead of an open struct file pointer. i_op->atomic_open() is only called if the last component is negative or needs lookup. Handling cached positive dentries here doesn't add much value: these can be opened using f_op->open(). If the cached file turns out to be invalid, the open can be retried, this time using ->atomic_open() with a fresh dentry. For now leave the old way of using open intents in lookup and revalidate in place. This will be removed once all the users are converted. David Howells noticed that if ->atomic_open() opens the file but does not create it, handle_truncate() will be called on it even if it is not a regular file. Fix this by checking the file type in this case too. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: make O_PATH file descriptors usable for 'fchdir()'Linus Torvalds2012-07-071-3/+3
| | | | | | | | | | | | | | | We already use them for openat() and friends, but fchdir() also wants to be able to use O_PATH file descriptors. This should make it comparable to the O_SEARCH of Solaris. In particular, O_PATH allows you to access (not-quite-open) a directory you don't have read persmission to, only execute permission. Noticed during development of multithread support for ksh93. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: nameidata_to_filp(): don't throw away file on errorMiklos Szeredi2012-06-011-3/+5
| | | | | | | | If open fails, don't put the file. This allows it to be reused if open needs to be retried. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: nameidata_to_filp(): inline __dentry_open()Miklos Szeredi2012-06-011-2/+18
| | | | | | | Copy __dentry_open() into nameidata_to_filp(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: do_dentry_open(): don't put filpMiklos Szeredi2012-06-011-1/+2
| | | | | | | Move put_filp() out to __dentry_open(), the only caller now. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: split __dentry_open()Miklos Szeredi2012-06-011-14/+33
| | | | | | | | | | | | | Split __dentry_open() into two functions: do_dentry_open() - does most of the actual work, doesn't put file on failure open_check_o_direct() - after a successful open, checks direct_IO method This will allow i_op->atomic_open to do just the file initialization and leave the direct_IO checking to the VFS. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-05-231-3/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures the if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these value specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...
| * userns: Convert user specfied uids and gids in chown into kuids and kgidEric W. Biederman2012-05-031-2/+11
| | | | | | | | | | Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * userns: Convert capabilities related permsion checksEric W. Biederman2012-05-031-1/+2
| | | | | | | | | | | | | | | | | | - Use uid_eq when comparing kuids Use gid_eq when comparing kgids - Use make_kuid(user_ns, 0) to talk about the user_namespace root uid Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | SELinux: rename dentry_open to file_openEric Paris2012-04-091-1/+1
|/ | | | | | dentry_open takes a file, rename it to file_open Signed-off-by: Eric Paris <eparis@redhat.com>
* Wrap accesses to the fd_sets in struct fdtableDavid Howells2012-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wrap accesses to the fd_sets in struct fdtable (for recording open files and close-on-exec flags) so that we can move away from using fd_sets since we abuse the fd_set structs by not allocating the full-sized structure under normal circumstances and by non-core code looking at the internals of the fd_sets. The first abuse means that use of FD_ZERO() on these fd_sets is not permitted, since that cannot be told about their abnormal lengths. This introduces six wrapper functions for setting, clearing and testing close-on-exec flags and fd-is-open flags: void __set_close_on_exec(int fd, struct fdtable *fdt); void __clear_close_on_exec(int fd, struct fdtable *fdt); bool close_on_exec(int fd, const struct fdtable *fdt); void __set_open_fd(int fd, struct fdtable *fdt); void __clear_open_fd(int fd, struct fdtable *fdt); bool fd_is_open(int fd, const struct fdtable *fdt); Note that I've prepended '__' to the names of the set/clear functions because they require the caller to hold a lock to use them. Note also that I haven't added wrappers for looking behind the scenes at the the array. Possibly that should exist too. Signed-off-by: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
* switch security_path_chmod() to struct path *Al Viro2012-01-061-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch open and mkdir syscalls to umode_tAl Viro2012-01-031-6/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch sys_chmod()/sys_fchmod()/sys_fchmodat() to umode_tAl Viro2012-01-031-3/+3
| | | | | | SYSCALLx magic should take care of things, according to Linus... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: mnt_drop_write_file()Al Viro2012-01-031-1/+1
| | | | | | | new helper (wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* leases: fix write-open/read-lease raceJ. Bruce Fields2011-10-281-0/+4
| | | | | | | | | | | | | | | | In setlease, we use i_writecount to decide whether we can give out a read lease. In open, we break leases before incrementing i_writecount. There is therefore a window between the break lease and the i_writecount increment when setlease could add a new read lease. This would leave us with a simultaneous write open and read lease, which shouldn't happen. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* merge fchmod() and fchmodat() guts, kill ancient broken kludgeAl Viro2011-07-261-50/+28
| | | | | | | | | | | | | The kludge in question is undocumented and doesn't work for 32bit binaries on amd64, sparc64 and s390. Passing (mode_t)-1 as mode had (since 0.99.14v and contrary to behaviour of any other Unix, prescriptions of POSIX, SuS and our own manpages) was kinda-sorta no-op. Note that any software relying on that (and looking for examples shows none) would be visibly broken on sparc64, where practically all userland is built 32bit. No such complaints noticed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filpKonstantin Khlebnikov2011-07-221-1/+1
| | | | | | | Replace unclear (struct dentry *) to (struct file *) typecast with ERR_CAST() macro. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: Use BUG_ON(!mnt) at dentry_open().Tetsuo Handa2011-03-211-11/+2
| | | | | | | dentry_open() requires callers to pass a valid vfsmount. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-03-161-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...
| * Merge branch 'next' into for-linusJames Morris2011-03-161-1/+2
| |\
| | * Merge branch 'master'; commit 'v2.6.38-rc7' into nextJames Morris2011-03-081-3/+10
| | |\
| | * | IMA: maintain i_readcount in the VFS layerMimi Zohar2011-02-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ima_counts_get() updated the readcount and invalidated the PCR, as necessary. Only update the i_readcount in the VFS layer. Move the PCR invalidation checks to ima_file_check(), where it belongs. Maintaining the i_readcount in the VFS layer, will allow other subsystems to use i_readcount. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com>
* | | | readlinkat(), fchownat() and fstatat() with empty relative pathnamesAl Viro2011-03-151-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For readlinkat() we simply allow empty pathname; it will fail unless we have dfd equal to O_PATH-opened symlink, so we are outside of POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH; let the caller explicitly ask for such behaviour. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | New kind of open files - "location only".Al Viro2011-03-151-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does *NOT* affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | open-style analog of vfs_path_lookup()Al Viro2011-03-141-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | new function: file_open_root(dentry, mnt, name, flags) opens the file vfs_path_lookup would arrive to. Note that name can be empty; in that case the usual requirement that dentry should be a directory is lifted. open-coded equivalents switched to it, may_open() got down exactly one caller and became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | switch do_filp_open() to struct open_flagsAl Viro2011-03-141-1/+72
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | take calculation of open_flags by open(2) arguments into new helper in fs/open.c, move filp_open() over there, have it and do_sys_open() use that helper, switch exec.c callers of do_filp_open() to explicit (and constant) struct open_flags. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | / Check for immutable/append flag in fallocate pathMarco Stornelli2011-03-101-0/+8
| |/ |/| | | | | | | | | | | | | | | | | | | | | In the fallocate path the kernel doesn't check for the immutable/append flag. It's possible to have a race condition in this scenario: an application open a file in read/write and it does something, meanwhile root set the immutable flag on the file, the application at that point can call fallocate with success. In addition, we don't allow to do any unreserve operation on an append only file but only the reserve one. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Fix possible filp_cachep memory corruptionLinus Torvalds2011-02-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 31e6b01f4183 ("fs: rcu-walk for path lookup") we started doing path lookup using RCU, which then falls back to a careful non-RCU lookup in case of problems (LOOKUP_REVAL). So do_filp_open() has this "re-do the lookup carefully" looping case. However, that means that we must not release the open-intent file data if we are going to loop around and use it once more! Fix this by moving the release of the open-intent data to the function that allocates it (do_filp_open() itself) rather than the helper functions that can get called multiple times (finish_open() and do_last()). This makes the logic for the lifetime of that field much more obvious, and avoids the possible double free. Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fallocate should be a file operationChristoph Hellwig2011-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always commiting the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC. This also includes moving the code around for a few of the filesystems, and remove the already unnedded S_ISDIR checks given that we only wire up fallocate for regular files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: add hole punching to fallocateJosef Bacik2011-01-121-1/+6
|/ | | | | | | | | | | | Hole punching has already been implemented by XFS and OCFS2, and has the potential to be implemented on both BTRFS and EXT4 so we need a generic way to get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE to fallocate() since it already looks like the normal fallocate() operation. I've tested this patch with XFS and BTRFS to make sure XFS did what it's supposed to do and that BTRFS failed like it was supposed to. Thank you, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fix open/umount raceAl Viro2010-10-291-3/+3
| | | | | | | | | | | | | | nameidata_to_filp() drops nd->path or transfers it to opened file. In the former case it's a Bad Idea(tm) to do mnt_drop_write() on nd->path.mnt, since we might race with umount and vfsmount in question might be gone already. Fix: don't drop it, then... IOW, have nameidata_to_filp() grab nd->path in case it transfers it to file and do path_drop() in callers. After they are through with accessing nd->path... Reported-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: cleanup files_lock lockingNick Piggin2010-08-181-2/+2
| | | | | | | | | | | | | | | fs: cleanup files_lock locking Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: clarify that nonseekable_open() will never failDmitry Torokhov2010-08-111-1/+3
| | | | | | | | | Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Kacur <jkacur@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2010-08-101-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits) fanotify: use both marks when possible fsnotify: pass both the vfsmount mark and inode mark fsnotify: walk the inode and vfsmount lists simultaneously fsnotify: rework ignored mark flushing fsnotify: remove global fsnotify groups lists fsnotify: remove group->mask fsnotify: remove the global masks fsnotify: cleanup should_send_event fanotify: use the mark in handler functions audit: use the mark in handler functions dnotify: use the mark in handler functions inotify: use the mark in handler functions fsnotify: send fsnotify_mark to groups in event handling functions fsnotify: Exchange list heads instead of moving elements fsnotify: srcu to protect read side of inode and vfsmount locks fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called fsnotify: use _rcu functions for mark list traversal fsnotify: place marks on object in order of group memory address vfs/fsnotify: fsnotify_close can delay the final work in fput fsnotify: store struct file not struct path ... Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.