path: root/fs/ocfs2/file.c
Commit message  Author  Age  Files  Lines
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  Linus Torvalds  2013-02-26  1  -7/+7
|\
    Pull vfs pile (part one) from Al Viro:

    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
      saner proc_get_inode() calling conventions
      proc: avoid extra pde_put() in proc_fill_super()
      fs: change return values from -EACCES to -EPERM
      fs/exec.c: make bprm_mm_init() static
      ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
      ocfs2: fix possible use-after-free with AIO
      ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
      get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
      target: writev() on single-element vector is pointless
      export kernel_write(), convert open-coded instances
      fs: encode_fh: return FILEID_INVALID if invalid fid_type
      kill f_vfsmnt
      vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
      nfsd: handle vfs_getattr errors in acl protocol
      switch vfs_getattr() to struct path
      default SET_PERSONALITY() in linux/elf.h
      ceph: prepopulate inodes only when request is aborted
      d_hash_and_lookup(): export, switch open-coded instances
      9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
      9p: split dropping the acls from v9fs_set_create_acl()
      ...
| * kill f_vfsmnt  Al Viro  2013-02-26  1  -2/+2
    very few users left...

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * new helper: file_inode(file)  Al Viro  2013-02-22  1  -5/+5
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ocfs2: Compare kuids and kgids using uid_eq and gid_eq  Eric W. Biederman  2013-02-13  1  -4/+4
    Cc: Mark Fasheh <mfasheh@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | ocfs2: For tracing report the uid and gid values in the initial user namespace  Eric W. Biederman  2013-02-13  1  -1/+2
|/
    Cc: Mark Fasheh <mfasheh@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* ocfs2: drop vmtruncate  Marco Stornelli  2012-12-20  1  -18/+0
    Removed vmtruncate

    Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lseek: the "whence" argument is called "whence"  Andrew Morton  2012-12-17  1  -3/+3
    But the kernel decided to call it "origin" instead. Fix most of the sites.

    Acked-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr()  Namjae Jeon  2012-12-11  1  -4/+1
    There is no reason to pass the nr_pages_dirtied argument, because nr_pages_dirtied value from the caller is unused in balance_dirty_pages_ratelimited_nr().

    Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
    Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* userns: Modify dqget to take struct kqid  Eric W. Biederman  2012-09-18  1  -4/+2
    Modify dqget to take struct kqid instead of a type and an identifier pair.

    Modify the callers of dqget in ocfs2 and dquot to generate a struct kqid so they can continue to call dqget. The conversion to create struct kqid should all be the final conversions that are needed in those code paths.

    Cc: Jan Kara <jack@suse.cz>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* ocfs2: Convert to new freezing mechanism  Jan Kara  2012-07-31  1  -2/+9
    Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze protection. We also protect several ioctl entry points which were missing the protection. Finally, we add freeze protection to the journaling mechanism so that iput() of unlinked inode cannot modify a frozen filesystem.

    CC: Mark Fasheh <mfasheh@suse.com>
    CC: Joel Becker <jlbec@evilplan.org>
    CC: ocfs2-devel@oss.oracle.com
    Acked-by: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()  Luis Henriques  2012-07-11  1  -1/+1
    As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL as the first parameter (file), it may trigger a NULL pointer dereference due to a missing check.

    Addresses http://bugs.launchpad.net/bugs/1006012

    Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
    Reported-by: Bret Towe <magnade@gmail.com>
    Tested-by: Bret Towe <magnade@gmail.com>
    Cc: Sunil Mushran <sunil.mushran@oracle.com>
    Acked-by: Joel Becker <jlbec@evilplan.org>
    Acked-by: Mark Fasheh <mfasheh@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: clear unaligned io flag when dio fails  Junxiao Bi  2012-07-03  1  -1/+3
    The unaligned io flag is set in the kiocb when an unaligned dio is issued. It should be cleared even when the dio fails, or it may affect subsequent io that uses the same kiocb.

    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Joel Becker <jlbec@evilplan.org>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial  Linus Torvalds  2012-01-08  1  -1/+1
|\
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
      Kconfig: acpi: Fix typo in comment.
      misc latin1 to utf8 conversions
      devres: Fix a typo in devm_kfree comment
      btrfs: free-space-cache.c: remove extra semicolon.
      fat: Spelling s/obsolate/obsolete/g
      SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
      tools/power turbostat: update fields in manpage
      mac80211: drop spelling fix
      types.h: fix comment spelling for 'architectures'
      typo fixes: aera -> area, exntension -> extension
      devices.txt: Fix typo of 'VMware'.
      sis900: Fix enum typo 'sis900_rx_bufer_status'
      decompress_bunzip2: remove invalid vi modeline
      treewide: Fix comment and string typo 'bufer'
      hyper-v: Update MAINTAINERS
      treewide: Fix typos in various parts of the kernel, and fix some comments.
      clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
      gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
      leds: Kconfig: Fix typo 'D2NET_V2'
      sound: Kconfig: drop unknown symbol ARCH_CLPS7500
      ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
| * treewide: Fix typos in various parts of the kernel, and fix some comments.  Justin P. Mattock  2011-12-02  1  -1/+1
    The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent.

    Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
    Acked-by: Randy Dunlap <rdunlap@xenotime.net>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2  Linus Torvalds  2011-12-01  1  -2/+94
|\ \
| |/
|/|
    * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
      ocfs2: avoid unaligned access to dqc_bitmap
      ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
      ocfs2: honor O_(D)SYNC flag in fallocate
      ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
      ocfs2: send correct UUID to cleancache initialization
      ocfs2: Commit transactions in error cases -v2
      ocfs2: make direntry invalid when deleting it
      fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
      ocfs2: Avoid livelock in ocfs2_readpage()
      ocfs2: serialize unaligned aio
      ocfs2: Implement llseek()
      ocfs2: Fix ocfs2_page_mkwrite()
      ocfs2: Add comment about orphan scanning
      ocfs2: Clean up messages in the fs
      ocfs2/cluster: Cluster up now includes network connections too
      ocfs2/cluster: Add new function o2net_fill_node_map()
      ocfs2/cluster: Fix output in file elapsed_time_in_ms
      ocfs2/dlm: dlmlock_remote() needs to account for remastery
      ocfs2/dlm: Take inflight reference count for remotely mastered resources too
      ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
      ...
| * ocfs2: honor O_(D)SYNC flag in fallocate  Mark Fasheh  2011-11-17  1  -0/+3
    We need to sync the transaction which updates i_size if the file is marked as needing sync semantics.

    Signed-off-by: Mark Fasheh <mfasheh@suse.de>
    Signed-off-by: Joel Becker <jlbec@evilplan.org>
| * Merge branch 'mw-3.1-jul25' of git://oss.oracle.com/git/smushran/linux-2.6 into ocfs2-fixes  Joel Becker  2011-08-21  1  -18/+78
| |\
| | * ocfs2: Implement llseek()  Sunil Mushran  2011-07-25  1  -2/+53
    ocfs2 implements its own llseek() to provide the SEEK_HOLE/SEEK_DATA functionality.

    SEEK_HOLE sets the file pointer to the start of either a hole or an unwritten (preallocated) extent, that is greater than or equal to the supplied offset.

    SEEK_DATA sets the file pointer to the start of an allocated extent (not unwritten) that is greater than or equal to the supplied offset.

    If the supplied offset is on a desired region, then the file pointer is set to it. Offsets greater than or equal to the file size return -ENXIO.

    Unwritten (preallocated) extents are considered holes because the file system treats reads to such regions in the same way as it does to holes.

    Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
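The SEEK_HOLE/SEEK_DATA rules in the commit message above can be illustrated with a small userspace model. This is only a sketch and not the ocfs2 implementation: the `struct extent` layout and the `model_seek_data()`/`model_seek_hole()` names are hypothetical, and two behaviours the message does not spell out are assumed here (SEEK_DATA returns -ENXIO when no data follows the offset, and SEEK_HOLE treats end-of-file as an implicit hole).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical extent kinds for this userspace model (not ocfs2 types). */
enum ext_kind { EXT_DATA, EXT_UNWRITTEN };

struct extent {
    long long start;    /* byte offset where the extent begins */
    long long len;      /* length in bytes */
    enum ext_kind kind; /* unwritten extents count as holes */
};

/* Index of the written extent containing off, or n if there is none. */
static size_t find_data(const struct extent *ext, size_t n, long long off)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ext[i].kind == EXT_DATA &&
            off >= ext[i].start && off < ext[i].start + ext[i].len)
            break;
    return i;
}

/* SEEK_DATA model: first offset >= off that lies in a written extent. */
long long model_seek_data(const struct extent *ext, size_t n,
                          long long size, long long off)
{
    long long best = -ENXIO; /* assumption: no data after off */
    size_t i;

    if (off < 0 || off >= size)
        return -ENXIO;
    if (find_data(ext, n, off) < n)
        return off; /* already on a desired region */
    for (i = 0; i < n; i++)
        if (ext[i].kind == EXT_DATA && ext[i].start > off &&
            ext[i].start < size && (best < 0 || ext[i].start < best))
            best = ext[i].start;
    return best;
}

/* SEEK_HOLE model: first offset >= off in a hole or unwritten extent. */
long long model_seek_hole(const struct extent *ext, size_t n,
                          long long size, long long off)
{
    long long pos = off;

    if (off < 0 || off >= size)
        return -ENXIO;
    for (;;) {
        size_t i = find_data(ext, n, pos);

        if (i == n)
            return pos; /* hole or unwritten extent */
        pos = ext[i].start + ext[i].len; /* skip past this data extent */
        if (pos >= size)
            return size; /* assumption: implicit hole at end of file */
    }
}
```

With a 16 KiB file whose first and last 4 KiB blocks are written and whose second block is preallocated but unwritten, the model reports the unwritten block as the first hole and the last block as the next data region.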
| * | ocfs2: serialize unaligned aio  Mark Fasheh  2011-07-28  1  -0/+38
    Fix a corruption that can happen when we have (two or more) outstanding aio's to an overlapping unaligned region. Ext4 (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix similar issues.

    In our case what happens is that we can have an outstanding aio on a region and if a write comes in with some bytes overlapping the original aio we may decide to read that region into a page before continuing (typically because of buffered-io fallback). Since we have no ordering guarantees with the aio, we can read stale or bad data into the page and then write it back out.

    If the i/o is page and block aligned, then we avoid this issue as there won't be any need to read data from disk.

    I took the same approach as Eric in the ext4 patch and introduced some serialization of unaligned async direct i/o. I don't expect this to have an effect on the most common cases of AIO. Unaligned aio will be slower though, but that's far more acceptable than data corruption.

    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    Signed-off-by: Joel Becker <jlbec@evilplan.org>
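The distinction the message above draws between aligned and unaligned i/o comes down to a mask test. The following is a minimal sketch of such a test, assuming a power-of-two block size; `io_is_unaligned()` is a hypothetical helper written for illustration and is not claimed to match the check the patch adds.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true when the start offset or the length is not a
 * multiple of block_size. Assumes block_size is a power of two, so that
 * block_size - 1 is a mask of the low-order bits.
 */
bool io_is_unaligned(long long pos, size_t count, unsigned int block_size)
{
    unsigned long long mask = (unsigned long long)block_size - 1;

    /* Any low bit set in either value means a partial block is touched. */
    return (((unsigned long long)pos | (unsigned long long)count) & mask) != 0;
}
```

Only requests for which such a test is true would need the serialization the commit describes; fully aligned requests never require reading a partial block from disk first.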
* | | fs: take the ACL checks to common code  Christoph Hellwig  2011-07-25  1  -2/+2
| |/
|/|
    Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers  Josef Bacik  2011-07-20  1  -1/+13
    Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness' sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks,

    Acked-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Josef Bacik <josef@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: always maintain i_dio_count  Christoph Hellwig  2011-07-20  1  -9/+3
    Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING. This allows these filesystems to also protect truncate against direct I/O requests by using common code. Right now the only non-DIO_LOCKING filesystem that appears to do so is XFS, which uses an opencoded variant of the i_dio_count scheme.

    Behaviour doesn't change for filesystems never calling inode_dio_wait. For ext4 behaviour changes when using the dioread_nonlock option, which previously was missing any protection between truncate and direct I/O reads. For ocfs2 the handcrafted i_dio_count manipulations are replaced with the common code now enabled.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: move inode_dio_wait calls into ->setattr  Christoph Hellwig  2011-07-20  1  -0/+2
    Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio references from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: kill i_alloc_sem  Christoph Hellwig  2011-07-20  1  -8/+7
    i_alloc_sem is a rather special rw_semaphore. It's the last one that may be released by a non-owner, and its write side is always mirrored by real exclusion. Its intended use is to wait for all pending direct I/O requests to finish before starting a truncate.

    Replace it with a hand-grown construct:

     - exclusion for truncates is already guaranteed by i_mutex, so it can simply fall away
     - the reader side is replaced by an i_dio_count member in struct inode that counts the number of pending direct I/O requests. Truncate can't proceed as long as it's non-zero
     - when i_dio_count reaches zero we wake up a pending truncate using wake_up_bit on a new bit in i_flags
     - new references to i_dio_count can't appear while we are waiting for it to read zero because the direct I/O count always needs i_mutex (or an equivalent like XFS's i_iolock) for starting a new operation.

    This scheme is much simpler, and saves the space of a spinlock_t and a struct list_head in struct inode (typically 160 bits on a non-debug 64-bit system).

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
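The counting scheme listed above can be followed step by step in a single-threaded bookkeeping model. This is a sketch under loud assumptions: `struct model_inode` and the `model_*` functions are hypothetical names, a plain `int` stands in for the kernel's atomic counter, and a `wakeups` tally stands in for the wake_up_bit call. It shows only the ordering of events, not real locking or sleeping.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct inode. */
struct model_inode {
    int dio_count;         /* pending direct I/O requests */
    bool truncate_waiting; /* a truncate is blocked on dio_count */
    int wakeups;           /* how often the waiting truncate was woken */
};

/* A direct I/O request starts: take a reference. */
void model_dio_begin(struct model_inode *in)
{
    in->dio_count++;
}

/* A direct I/O request completes: drop the reference, wake at zero. */
void model_dio_done(struct model_inode *in)
{
    if (--in->dio_count == 0 && in->truncate_waiting) {
        in->truncate_waiting = false;
        in->wakeups++; /* models waking the pending truncate */
    }
}

/* Truncate may proceed only when no direct I/O is pending. */
bool model_truncate_try(struct model_inode *in)
{
    if (in->dio_count == 0)
        return true;
    in->truncate_waiting = true; /* must wait for the count to drain */
    return false;
}
```

In the model, a truncate attempted while two requests are in flight is refused, and it is woken exactly once, when the second request completes.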
* | ->permission() sanitizing: don't pass flags to ->permission()  Al Viro  2011-07-20  1  -2/+2
    not used by the instances anymore.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ->permission() sanitizing: don't pass flags to generic_permission()  Al Viro  2011-07-20  1  -1/+1
    redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | kill check_acl callback of generic_permission()  Al Viro  2011-07-20  1  -1/+3
|/
    its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly.  Tristan Ye  2011-05-25  1  -0/+1
    Oops, 'ocfs2_fops_no_plocks', which is used for local mounts, was missing support for unwritten_extents/punching-hole because no function pointer was assigned to its '.fallocate' field.

    Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
* ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes.  Tristan Ye  2011-05-13  1  -0/+12
    In the case of removing a partial extent record which covers a hole, the current punching-hole logic will try to remove more than the length of the whole extent record, which triggers the following assert (fs/ocfs2/alloc.c):

    5507 BUG_ON(cpos < le32_to_cpu(rec->e_cpos) || trunc_range > rec_range);

    This patch skips the existing hole at the last attempt of removing a partial extent record. It also adds some necessary comments for better understanding of the punching-hole code.

    Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
    Signed-off-by: Joel Becker <jlbec@evilplan.org>
* ocfs2: Remove mlog(0) from fs/ocfs2/file.c  Tao Ma  2011-02-22  1  -97/+93
    This is the 2nd step to remove the debug info of INODE.

    Signed-off-by: Tao Ma <boyu.mt@taobao.com>
* ocfs2: Remove EXIT from masklog.  Tao Ma  2011-03-07  1  -19/+2
    mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test.

    This patch just tries to remove it or change it. So:
    1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno.
    2. if it is used to print some return value, it is replaced with mlog(0,...).

    mlog_exit_ptr is changed to mlog(0.

    All those mlog(0,...) will be replaced with trace events later.

    Signed-off-by: Tao Ma <boyu.mt@taobao.com>
* ocfs2: Remove ENTRY from masklog.  Tao Ma  2011-02-21  1  -41/+32
    ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test.

    So for mlog_entry_void, we just remove it. For mlog_entry(...), we replace it with mlog(0,...), and they will be replaced by trace events later.

    Signed-off-by: Tao Ma <boyu.mt@taobao.com>
* fallocate should be a file operation  Christoph Hellwig  2011-01-17  1  -5/+3
    Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always committing the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC.

    This also includes moving the code around for a few of the filesystems, and removing the already unneeded S_ISDIR checks given that we only wire up fallocate for regular files.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make the feature checks in ->fallocate future proof  Christoph Hellwig  2011-01-17  1  -0/+2
    Instead of various home grown checks that might need updates for new flags just check for any bit outside the mask of the features supported by the filesystem. This makes the check future proof for any newly added flag.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
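The "any bit outside the supported mask" check described above follows a common pattern, sketched here in userspace. The `MODEL_FL_*` constants and `model_check_fallocate_mode()` are hypothetical names chosen for illustration; which flags a given filesystem actually supports, and the error code it returns, are assumptions of this sketch.

```c
#include <errno.h>

/* Hypothetical flag values for this model. */
#define MODEL_FL_KEEP_SIZE  0x01
#define MODEL_FL_PUNCH_HOLE 0x02

/* Everything this model filesystem claims to support. */
#define MODEL_SUPPORTED (MODEL_FL_KEEP_SIZE | MODEL_FL_PUNCH_HOLE)

/* Reject any mode bit outside the supported mask, known or not. */
int model_check_fallocate_mode(int mode)
{
    if (mode & ~MODEL_SUPPORTED)
        return -EOPNOTSUPP;
    return 0;
}
```

Because the test is written against the complement of the supported set, a flag introduced later is rejected without touching this function, which is the future-proofing the commit message refers to.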
* Ocfs2: handle hole punching via fallocate properly  Josef Bacik  2011-01-12  1  -2/+6
    This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch flag in fallocate. I didn't test it, but it seems simple enough. Thanks,

    Acked-by: Jan Kara <jack@suse.cz>
    Acked-by: Joel Becker <joel.becker@oracle.com>
    Signed-off-by: Josef Bacik <josef@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: provide rcu-walk aware permission i_ops  Nick Piggin  2011-01-07  1  -2/+5
    Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.  Tristan Ye  2010-12-09  1  -2/+13
    With the newly introduced 'coherency=full', O_DIRECT writes also take the EX rw_lock like buffered writes do (rw_level == 1). This turned out to mess up the usage of 'level' in ocfs2_dio_end_io(), so that i_alloc_sem failed to get up_read'd correctly.

    This patch teaches ocfs2_dio_end_io() to handle all the locking correctly by explicitly introducing a new bit for i_alloc_sem in the iocb's private data, just like what we did for rw_lock.

    Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
    Signed-off-by: Joel Becker <joel.becker@oracle.com>
* fs: kill block_prepare_write  Christoph Hellwig  2010-10-25  1  -5/+4
    __block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ocfs2: drop the BLKDEV_IFL_WAIT flag  Linus Torvalds  2010-10-22  1  -2/+1
    Commit dd3932eddf42 ("block: remove BLKDEV_IFL_WAIT") had removed the flag argument to blkdev_issue_flush(), but the ocfs2 merge brought in a new one. It didn't cause a merge conflict, so the merges silently worked out fine, but the result didn't actually compile.

    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: Add a mount option "coherency=*" to handle cluster coherency for O_DIRECT writes.  Tristan Ye  2010-10-11  1  -2/+27
    Currently, the default behavior of O_DIRECT writes is to allow concurrent writing among nodes to the same file, with no cluster coherency guaranteed (no EX lock held). This can leave stale data in the cache for buffered reads on other nodes.

    The new mount option offers a choice between two behaviors for O_DIRECT writes:

    * coherency=full, the default value, disallows concurrent O_DIRECT writes by taking EX locks.

    * coherency=buffered allows concurrent O_DIRECT writes without EX lock among nodes, which gains high performance at the risk of getting stale data on other nodes.

    Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
    Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2: Silence unused warning.  Joel Becker  2010-09-15  1  -3/+3
    When CONFIG_OCFS2_DEBUG_MASKLOG is undefined, we don't use the dentry variable in ocfs2_sync_file(). Let's just move all access to the dentry inside the logging call.

    Signed-off-by: Joel Becker <joel.becker@oracle.com>
* Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into merge-2  Joel Becker  2010-09-10  1  -7/+10
|\
| * ocfs2: Add struct file to ocfs2_refcount_cow.  Tao Ma  2010-08-12  1  -4/+4
    Add a new parameter 'struct file *' to ocfs2_refcount_cow so that we can add readahead support later.

    Signed-off-by: Tao Ma <tao.ma@oracle.com>
| * ocfs2: pass struct file* to ocfs2_prepare_inode_for_write.  Tao Ma  2010-08-12  1  -3/+6
    struct file * has file_ra_state to store the readahead state and data. So pass this to ocfs2_prepare_inode_for_write so that it can be used in ocfs2_refcount_cow.

    Signed-off-by: Tao Ma <tao.ma@oracle.com>
* | ocfs2: Remove ocfs2_sync_inode()  Jan Kara  2010-09-10  1  -10/+0
    ocfs2_sync_inode() is used only from ocfs2_sync_file(). But all data has already been written before calling ocfs2_sync_file() and ocfs2 doesn't use inode's private_list for tracking metadata buffers thus sync_mapping_buffers() is superfluous as well.

    Signed-off-by: Jan Kara <jack@suse.cz>
    Acked-by: Mark Fasheh <mfasheh@suse.com>
    Signed-off-by: Joel Becker <joel.becker@oracle.com>
* | ocfs2: Remove obscure error handling in direct_write.  Tao Ma  2010-09-10  1  -11/+0
    In ocfs2, we don't actually allow any direct write past i_size; see the function ocfs2_prepare_inode_for_write. So we don't need the bogus simple_setsize.

    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Joel Becker <joel.becker@oracle.com>
* | Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f1a2f3118082c455eef25f9b1ac7b45).  Tristan Ye  2010-09-08  1  -1/+1
    The patch is to fix the regression bug brought from commit 6b933c8...( 'ocfs2: Avoid direct write if we fall back to buffered I/O'):

    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1285

    The commit 6b933c8e6f1a2f3118082c455eef25f9b1ac7b45 changed __generic_file_aio_write to generic_file_buffered_write, which didn't call filemap_{write,wait}_range to flush the pagecaches when we were falling O_DIRECT writes back to buffered ones. This hurt the O_DIRECT semantics somewhat in extended odirect writes.

    This patch tries to guarantee that O_DIRECT writes which fall back to buffered I/O are correctly flushed.

    Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
* | ocfs2: Fix deadlock when allocating page  Jan Kara  2010-09-08  1  -1/+1
    We cannot call grab_cache_page() when holding filesystem locks or with a transaction started as grab_cache_page() calls page allocation with GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem causing deadlocks or various assertion failures. We have to use find_or_create_page() instead and pass it GFP_NOFS as we do with other allocations.

    Acked-by: Mark Fasheh <mfasheh@suse.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
* | ocfs2: Flush drive's caches on fdatasync  Jan Kara  2010-09-08  1  -1/+10
|/
    When 'barrier' mount option is specified, we have to issue a cache flush during fdatasync(2). We have to do this even if inode doesn't have I_DIRTY_DATASYNC set because we still have to get written *data* to disk so that they are not lost in case of crash.

    Acked-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Singed-off-by: Tao Ma <tao.ma@oracle.com>
* check ATTR_SIZE contraints in inode_change_ok  Christoph Hellwig  2010-08-09  1  -3/+3
    Make sure we check the truncate constraints early on in ->setattr by adding those checks to inode_change_ok. Also clean up and document inode_change_ok to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and simplify it down to a truncate_setsize which doesn't return an error. This simplifies a lot of setattr implementations and means we use truncate_setsize almost everywhere. Get rid of fat_setsize now that it's trivial and mark ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>