path: root/fs/gfs2
Commit message  [Author, Age, Files, Lines]
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  [Linus Torvalds, 2014-06-12, 2 files, -22/+19]
|\
    Pull vfs updates from Al Viro:

    "This is the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c).

    In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
      lock_parent: don't step on stale ->d_parent of all-but-freed one
      kill generic_file_splice_write()
      ceph: switch to iter_file_splice_write()
      shmem: switch to iter_file_splice_write()
      nfs: switch to iter_splice_write_file()
      fs/splice.c: remove unneeded exports
      ocfs2: switch to iter_file_splice_write()
      ->splice_write() via ->write_iter()
      bio_vec-backed iov_iter
      optimize copy_page_{to,from}_iter()
      bury generic_file_aio_{read,write}
      lustre: get rid of messing with iovecs
      ceph: switch to ->write_iter()
      ceph_sync_direct_write: stop poking into iov_iter guts
      ceph_sync_read: stop poking into iov_iter guts
      new helper: copy_page_from_iter()
      fuse: switch to ->write_iter()
      btrfs: switch to ->write_iter()
      ocfs2: switch to ->write_iter()
      xfs: switch to ->write_iter()
      ...
| * ->splice_write() via ->write_iter()  [Al Viro, 2014-06-12, 1 file, -2/+2]
    iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases converted to that...

    [AV: fixed the braino spotted by Cyrill]

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * gfs2: switch to ->write_iter()  [Al Viro, 2014-05-06, 1 file, -10/+8]
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * switch simple generic_file_aio_read() users to ->read_iter()  [Al Viro, 2014-05-06, 1 file, -4/+4]
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * switch {__,}blockdev_direct_IO() to iov_iter  [Al Viro, 2014-05-06, 1 file, -1/+1]
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * get rid of pointless iov_length() in ->direct_IO()  [Al Viro, 2014-05-06, 1 file, -1/+1]
    all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter)

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
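The invariant quoted in this commit message can be illustrated in userspace. The following is only a sketch: `struct toy_iter` and the `toy_*` helpers are hypothetical stand-ins, not the kernel's `struct iov_iter` API. It shows why re-summing the segment lengths is redundant once the iterator already carries a byte count that was set from those same segments.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical iterator: caches the byte count next to the segment array. */
struct toy_iter {
    const struct iovec *iov;
    unsigned long nr_segs;
    size_t count;
};

/* Walk every segment and add up the lengths (the "recompute" path). */
static size_t toy_iov_length(const struct iovec *iov, unsigned long nr_segs)
{
    size_t total = 0;

    assert(iov != NULL || nr_segs == 0);
    for (unsigned long i = 0; i < nr_segs; i++)
        total += iov[i].iov_len;
    return total;
}

/* Initialise the iterator so that count equals the summed segment lengths. */
static struct toy_iter toy_iter_init(const struct iovec *iov,
                                     unsigned long nr_segs)
{
    struct toy_iter it = { iov, nr_segs, toy_iov_length(iov, nr_segs) };
    return it;
}

/* The cached value: no walk over the segments is needed. */
static size_t toy_iter_count(const struct toy_iter *it)
{
    return it->count;
}
```

As long as every caller builds the iterator this way, a consumer can read the cached count instead of walking the segments again.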
| * pass iov_iter to ->direct_IO()  [Al Viro, 2014-05-06, 1 file, -6/+5]
    unmodified, for now

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | mm: non-atomically mark page accessed during page cache allocation where possible  [Mel Gorman, 2014-06-04, 2 files, -3/+2]
    aops->write_begin may allocate a new page and make it visible only to have mark_page_accessed called almost immediately after. Once the page is visible the atomic operations are necessary which is noticeable overhead when writing to an in-memory filesystem like tmpfs but should also be noticeable with fast storage. The objective of the patch is to initialise the accessed information with non-atomic operations before the page is visible.

    The bulk of filesystems directly or indirectly use grab_cache_page_write_begin or find_or_create_page for the initial allocation of a page cache page. This patch adds an init_page_accessed() helper which behaves like the first call to mark_page_accessed() but may be called before the page is visible and can be done non-atomically.

    The primary APIs of concern in this case are the following and are used by most filesystems.

        find_get_page
        find_lock_page
        find_or_create_page
        grab_cache_page_nowait
        grab_cache_page_write_begin

    All of them are very similar in detail so the patch creates a core helper pagecache_get_page() which takes a flags parameter that affects its behavior such as whether the page should be marked accessed or not. The old API is preserved but is basically a thin wrapper around this core function.

    Each of the filesystems is then updated to avoid calling mark_page_accessed when it is known that the VM interfaces have already done the job.
    There is a slight snag in that the timing of the mark_page_accessed() has now changed so in rare cases it's possible a page gets to the end of the LRU as PageReferenced whereas previously it might have been repromoted. This is expected to be rare but it's worth the filesystem people thinking about it in case they see a problem with the timing change. It is also the case that some filesystems may be marking pages accessed that previously did not but it makes sense that filesystems have consistent behaviour in this regard.

    The test case used to evaluate this is a simple dd of a large file done multiple times with the file deleted on each iteration. The size of the file is 1/10th physical memory to avoid dirty page balancing. In the async case it will be possible that the workload completes without even hitting the disk and will have variable results but highlight the impact of mark_page_accessed for async IO. The sync results are expected to be more stable. The exception is tmpfs where the normal case is for the "IO" to not hit the disk.

    The test machine was single socket and UMA to avoid any scheduling or NUMA artifacts. Throughput and wall times are presented for sync IO, only wall times are shown for async as the granularity reported by dd and the variability is unsuitable for comparison. As async results were variable due to writeback timings, I'm only reporting the maximum figures. The sync results were stable enough to make the mean and stddev uninteresting. The performance results are reported based on a run with no profiling. Profile data is based on a separate run with oprofile running.
    async dd
                                 3.15.0-rc3            3.15.0-rc3
                                    vanilla           accessed-v2
    ext3    Max elapsed    13.9900 (  0.00%)    11.5900 ( 17.16%)
    tmpfs   Max elapsed     0.5100 (  0.00%)     0.4900 (  3.92%)
    btrfs   Max elapsed    12.8100 (  0.00%)    12.7800 (  0.23%)
    ext4    Max elapsed    18.6000 (  0.00%)    13.3400 ( 28.28%)
    xfs     Max elapsed    12.5600 (  0.00%)     2.0900 ( 83.36%)

    The XFS figure is a bit strange as it managed to avoid a worst case by sheer luck but the average figures looked reasonable.

            samples  percentage
    ext3      86107      0.9783  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    ext3      23833      0.2710  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    ext3       5036      0.0573  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    ext4      64566      0.8961  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    ext4       5322      0.0713  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    ext4       2869      0.0384  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    xfs       62126      1.7675  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    xfs        1904      0.0554  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    xfs         103      0.0030  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    btrfs     10655      0.1338  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    btrfs      2020      0.0273  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    btrfs       587      0.0079  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    tmpfs     59562      3.2628  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    tmpfs      1210      0.0696  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    tmpfs        94      0.0054  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed

    [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
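The "core helper plus thin wrappers" shape described in the commit above can be sketched in a few lines of userspace C. Everything here is hypothetical (`toy_cache`, `toy_get_page`, the `TOY_*` flags); it is not the kernel's `pagecache_get_page()` signature. The point is only that a freshly created entry can have its accessed bit set with a plain store before it is published, while an entry that is already visible takes the ordinary marking path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ACCESSED 0x1u   /* hypothetical: mark the page referenced */
#define TOY_CREAT    0x2u   /* hypothetical: allocate the page if missing */
#define TOY_NPAGES   8

struct toy_page  { bool present; bool referenced; };
struct toy_cache { struct toy_page pages[TOY_NPAGES]; };

/* Core helper: one lookup routine whose behaviour is selected by flags. */
static struct toy_page *toy_get_page(struct toy_cache *c, size_t index,
                                     unsigned flags)
{
    assert(c != NULL && index < TOY_NPAGES);
    struct toy_page *p = &c->pages[index];

    if (!p->present) {
        if (!(flags & TOY_CREAT))
            return NULL;
        /* Not yet visible to anyone else: a plain store is enough here. */
        p->referenced = (flags & TOY_ACCESSED) != 0;
        p->present = true;              /* publish last */
        return p;
    }
    if (flags & TOY_ACCESSED)
        p->referenced = true;           /* visible page: the costlier path */
    return p;
}

/* Thin wrappers keep the old-style entry points. */
static struct toy_page *toy_find_page(struct toy_cache *c, size_t index)
{
    return toy_get_page(c, index, 0);
}

static struct toy_page *toy_find_or_create_page(struct toy_cache *c,
                                                size_t index)
{
    return toy_get_page(c, index, TOY_ACCESSED | TOY_CREAT);
}
```

A caller that obtained its page through `toy_find_or_create_page()` has no need to mark it accessed again, which mirrors the per-filesystem cleanup the message describes.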
* | Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw into next  [Linus Torvalds, 2014-06-04, 17 files, -162/+245]
|\ \
    Pull gfs2 updates from Steven Whitehouse:

    "This must be about the smallest merge window patch set ever for GFS2. It is probably also the first one without a single patch from me. That is down to a combination of factors, and I have some things in the works that are not quite ready yet, that I hope to put in next time around.

    Returning to what is here this time... we have 3 patches which fix various warnings. Two are bug fixes (for quotas and also a rare recovery race condition).

    The final patch, from Ben Marzinski, is an important change in the freeze code which has been in progress for some time. This removes the need to take and drop the transaction lock for every single transaction, when the only time it was used, was at file system freeze time. Ben's patch integrates the freeze operation into the journal flush code as an alternative with lower overheads and also lands up resolving some difficult to fix races at the same time"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
      GFS2: Prevent recovery before the local journal is set
      GFS2: fs/gfs2/file.c: kernel-doc warning fixes
      GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes
      GFS2: remove transaction glock
      GFS2: lops.c: replace 0 by NULL for pointers
      GFS2: quotas not being refreshed in gfs2_adjust_quota
| * | GFS2: Prevent recovery before the local journal is set  [Bob Peterson, 2014-06-02, 3 files, -0/+8]
    This patch uses a completion to prevent dlm's recovery process from referencing and trying to recover a journal before a journal has been opened.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: fs/gfs2/file.c: kernel-doc warning fixes  [Fabian Frederick, 2014-05-16, 1 file, -5/+5]
    Related function is not gfs2_set_flags but do_gfs2_set_flags

    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes  [Fabian Frederick, 2014-05-16, 1 file, -4/+4]
    Fix 2 typos and move one definition which was between function comments and function definition (yet another kernel-doc warning)

    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: remove transaction glock  [Benjamin Marzinski, 2014-05-14, 15 files, -152/+226]
    GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing.

    This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In its place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery.

    When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesystem simply involves dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time.

    However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem.

    In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead.

    The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it.

    Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: lops.c: replace 0 by NULL for pointers  [Fabian Frederick, 2014-04-28, 1 file, -1/+1]
    Sparse warning: fs/gfs2/lops.c:78:29: "warning: Using plain integer as NULL pointer"

    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: quotas not being refreshed in gfs2_adjust_quota  [Abhi Das, 2014-04-17, 1 file, -0/+1]
| |/
    Old values of user quota limits were being used and could allow users to exceed their allotted quotas. This patch refreshes the limits to the latest values so that quotas are enforced correctly.

    Resolves: rhbz#1077463
    Signed-off-by: Abhi Das <adas@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* / arch: Mass conversion of smp_mb__*()  [Peter Zijlstra, 2014-04-18, 5 files, -10/+10]
|/
    Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
* mm: implement ->map_pages for page cache  [Kirill A. Shutemov, 2014-04-07, 1 file, -0/+1]
    filemap_map_pages() is a generic implementation of ->map_pages() for filesystems that use the page cache.

    It should be safe to use filemap_map_pages() for ->map_pages() if the filesystem uses filemap_fault() for ->fault().

    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Ning Qu <quning@gmail.com>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4  [Linus Torvalds, 2014-04-04, 1 file, -0/+2]
|\
    Pull ext4 updates from Ted Ts'o:

    "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
      ext4: fix premature freeing of partial clusters split across leaf blocks
      ext4: remove unneeded test of ret variable
      ext4: fix comment typo
      ext4: make ext4_block_zero_page_range static
      ext4: atomically set inode->i_flags in ext4_set_inode_flags()
      ext4: optimize Hurd tests when reading/writing inodes
      ext4: kill i_version support for Hurd-castrated file systems
      ext4: each filesystem creates and uses its own mb_cache
      fs/mbcache.c: doucple the locking of local from global data
      fs/mbcache.c: change block and index hash chain to hlist_bl_node
      ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
      ext4: refactor ext4_fallocate code
      ext4: Update inode i_size after the preallocation
      ext4: fix partial cluster handling for bigalloc file systems
      ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
      ext4: only call sync_filesystm() when remounting read-only
      fs: push sync_filesystem() down to the file system's remount_fs()
      jbd2: improve error messages for inconsistent journal heads
      jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
      jbd2: minimize region locked by j_list_lock in journal_get_create_access()
      ...
| * fs: push sync_filesystem() down to the file system's remount_fs()  [Theodore Ts'o, 2014-03-13, 1 file, -0/+2]
    Previously, the no-op "mount -o remount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly not documented or guaranteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only.

    However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs).

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Artem Bityutskiy <dedekind1@gmail.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Evgeniy Dushistov <dushistov@mail.ru>
    Cc: Jan Kara <jack@suse.cz>
    Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
    Cc: Anders Larsen <al@alarsen.net>
    Cc: Phillip Lougher <phillip@squashfs.org.uk>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
    Cc: Petr Vandrovec <petr@vandrovec.name>
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org
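The narrower policy that the message above calls "probably fine" for most file systems, syncing only on a read-write to read-only transition, can be sketched as follows. This is a userspace toy under stated assumptions: `struct toy_sb`, `TOY_RDONLY`, `toy_sync()` and `toy_remount()` are hypothetical names, and nothing here reproduces a real `remount_fs()` implementation or says which file systems adopted the policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RDONLY 0x1u   /* hypothetical stand-in for a read-only mount flag */

struct toy_sb {
    unsigned flags;
    int sync_calls;       /* counts how often the expensive sync ran */
};

static void toy_sync(struct toy_sb *sb)
{
    sb->sync_calls++;
}

static bool toy_is_rdonly(unsigned flags)
{
    return (flags & TOY_RDONLY) != 0;
}

/* Remount handler: sync only when going from read-write to read-only. */
static int toy_remount(struct toy_sb *sb, unsigned new_flags)
{
    assert(sb != NULL);
    if (!toy_is_rdonly(sb->flags) && toy_is_rdonly(new_flags))
        toy_sync(sb);
    sb->flags = new_flags;
    return 0;
}
```

With the decision living in the per-filesystem handler, a no-op remount of a read-write file system costs nothing, while a file system that really does depend on the old behaviour could still sync unconditionally in its own handler.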
* | Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw  [Linus Torvalds, 2014-04-04, 28 files, -437/+604]
|\ \
    Pull GFS2 updates from Steven Whitehouse:

    "One of the main highlights this time, is not the patches themselves but instead the widening contributor base. It is good to see that interest is increasing in GFS2, and I'd like to thank all the contributors to this patch set.

    In addition to the usual set of bug fixes and clean ups, there are patches to improve inode creation performance when xattrs are required and some improvements to the transaction code which is intended to help improve scalability after further changes in due course.

    Journal extent mapping is also updated to make it more efficient and again, this is a foundation for future work in this area.

    The maximum number of ACLs has been increased to 300 (for a 4k block size) which means that even with a few additional xattrs from selinux, everything should fit within a single fs block.

    There is also a patch to bring GFS2's own copy of the writepages code up to the same level as the core VFS. Eventually we may be able to merge some of this code, since it is fairly similar.

    The other major change this time, is bringing consistency to the printing of messages via fs_<level>, pr_<level> macros"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (29 commits)
      GFS2: Fix address space from page function
      GFS2: Fix uninitialized VFS inode in gfs2_create_inode
      GFS2: Fix return value in slot_get()
      GFS2: inline function gfs2_set_mode
      GFS2: Remove extraneous function gfs2_security_init
      GFS2: Increase the max number of ACLs
      GFS2: Re-add a call to log_flush_wait when flushing the journal
      GFS2: Ensure workqueue is scheduled after noexp request
      GFS2: check NULL return value in gfs2_ok_to_move
      GFS2: Convert gfs2_lm_withdraw to use fs_err
      GFS2: Use fs_<level> more often
      GFS2: Use pr_<level> more consistently
      GFS2: Move recovery variables to journal structure in memory
      GFS2: global conversion to pr_foo()
      GFS2: return -E2BIG if hit the maximum limits of ACLs
      GFS2: Clean up journal extent mapping
      GFS2: replace kmalloc - __vmalloc / memset 0
      GFS2: Remove extra "if" in gfs2_log_flush()
      fs: NULL dereference in posix_acl_to_xattr()
      GFS2: Move log buffer accounting to transaction
      ...
| * | GFS2: Fix address space from page function  [Steven Whitehouse, 2014-03-31, 3 files, -1/+9]
    Now that rgrps use the address space which is part of the super block, we need to update gfs2_mapping2sbd() to take account of that. The only way to do that easily is to use a different set of address_space_operations for rgrps.

    Reported-by: Abhi Das <adas@redhat.com>
    Tested-by: Abhi Das <adas@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Fix uninitialized VFS inode in gfs2_create_inode  [Abhi Das, 2014-03-31, 3 files, -6/+13]
    When gfs2_create_inode() fails due to quota violation, the VFS inode is not completely uninitialized. This can cause a list corruption error.

    This patch correctly uninitializes the VFS inode when a quota violation occurs in the gfs2_create_inode codepath.

    Resolves: rhbz#1059808
    Signed-off-by: Abhi Das <adas@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Fix return value in slot_get()  [Abhi Das, 2014-03-31, 1 file, -0/+1]
    ENOSPC was being returned in slot_get in spite of successful execution of the function. This patch fixes this return code.

    Signed-off-by: Abhi Das <adas@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
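The class of bug described above, where an error code initialised to "no space" is never cleared on the success path, is easy to reproduce in miniature. The sketch below is hypothetical (`toy_slot_get()` and `TOY_SLOTS` are invented for illustration and are not the GFS2 `slot_get()` code); it only shows where the single clearing assignment belongs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SLOTS 4   /* hypothetical table size */

/* Claim the first free slot; return 0 on success or -ENOSPC if all are used. */
static int toy_slot_get(unsigned char used[TOY_SLOTS], size_t *slot)
{
    int error = -ENOSPC;            /* assume failure until a slot is found */

    assert(used != NULL && slot != NULL);
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!used[i]) {
            used[i] = 1;
            *slot = i;
            error = 0;              /* without this line, success reports -ENOSPC */
            break;
        }
    }
    return error;
}
```

Omitting the `error = 0` assignment leaves the function claiming a slot while still telling the caller that none was available, which is the symptom the commit message reports.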
| * | GFS2: inline function gfs2_set_mode  [Bob Peterson, 2014-03-19, 1 file, -15/+4]
    Here is a revised patch based on Steve's feedback:

    This patch eliminates function gfs2_set_mode which was only called in one place, and always returned 0.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Remove extraneous function gfs2_security_init  [Bob Peterson, 2014-03-19, 1 file, -8/+2]
    This patch eliminates function gfs2_security_init in favor of just calling security_inode_init_security directly.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Increase the max number of ACLs  [Bob Peterson, 2014-03-19, 2 files, -2/+2]
    This patch increases the maximum number of ACLs from 25 to 300 for a 4K block size. The value is adjusted accordingly if the block size is smaller. Note that this is an arbitrary limit with a performance tradeoff, and that the physical limit is slightly over 500.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
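The message above gives the limit for a 4K block size and says it is "adjusted accordingly" for smaller blocks, without stating the formula. One plausible reading is linear scaling with block size; the sketch below assumes exactly that, and `toy_acl_max_entries()` is a hypothetical helper rather than the macro the patch actually adds.

```c
#include <assert.h>

/*
 * Assumed rule (not confirmed by the message): 300 entries at a 4096-byte
 * block, scaled linearly with the block size. bsize_shift is log2 of the
 * block size, so 12 means 4K and 10 means 1K.
 */
static unsigned toy_acl_max_entries(unsigned bsize_shift)
{
    assert(bsize_shift >= 9 && bsize_shift <= 16);
    return (300u << bsize_shift) >> 12;
}
```

Under this assumption a 1K block would allow a quarter of the 4K limit, and the division rounds down for block sizes that do not divide evenly.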
| * | GFS2: Re-add a call to log_flush_wait when flushing the journal  [Bob Peterson, 2014-03-12, 1 file, -0/+1]
    Upstream commit 34cc178 changed a line of code from calling function log_flush_commit to calling log_write_header. This had the effect of eliminating a call to function log_flush_wait. That causes the journal to skip over log headers, which results in multiple wrap points, which itself leads to infinite loops in journal replay, both in the kernel code and fsck.gfs2 code. This patch re-adds that call.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Ensure workqueue is scheduled after noexp request  [Bob Peterson, 2014-03-12, 1 file, -2/+6]
    This patch closes a small timing window whereby a request to hold the transaction glock can get stuck. The problem is that after the DLM has granted the lock, it can get into a state whereby it doesn't transition the glock to a held state, due to not having requeued the glock state machine to finish the transition.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: check NULL return value in gfs2_ok_to_move  [Abhi Das, 2014-03-12, 1 file, -0/+4]
    gfs2_lookupi() can return NULL if the path to the root is broken by another rename/rmdir. In this case gfs2_ok_to_move() must check for this NULL pointer and return error.

    Resolves: rhbz#1060246
    Signed-off-by: Abhi Das <adas@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Convert gfs2_lm_withdraw to use fs_err  [Joe Perches, 2014-03-07, 3 files, -50/+46]
    vprintk use is not prefixed by a KERN_<LEVEL>, so emit these messages at KERN_ERR level.

    Using %pV can save some code and allow fs_err to be used, so do it.

    Signed-off-by: Joe Perches <joe@perches.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Use fs_<level> more often  [Joe Perches, 2014-03-07, 3 files, -5/+7]
    Convert a couple of uses of pr_<level> to fs_<level>

    Add and use fs_emerg.

    Signed-off-by: Joe Perches <joe@perches.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Use pr_<level> more consistently  [Joe Perches, 2014-03-07, 12 files, -71/+88]
    Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message.

    Other miscellanea around these changes:

    o Add missing newlines
    o Coalesce formats
    o Realign arguments

    Signed-off-by: Joe Perches <joe@perches.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Move recovery variables to journal structure in memory  [Bob Peterson, 2014-03-07, 5 files, -43/+39]
    If multiple nodes fail and their recovery work runs simultaneously, they would use the same unprotected variables in the superblock. For example, they would stomp on each other's revoked blocks lists, which resulted in file system metadata corruption. This patch moves the necessary variables so that each journal has its own separate area for tracking its journal replay.

    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: global conversion to pr_foo()  [Fabian Frederick, 2014-03-06, 10 files, -52/+48]
    - All printk(KERN_foo converted to pr_foo().
    - Messages updated to fit in 80 columns.
    - fs_macros converted as well.
    - fs_printk removed.

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: return -E2BIG if hit the maximum limits of ACLs  [Jie Liu, 2014-03-06, 1 file, -1/+1]
    Return -E2BIG rather than -EINVAL if the maximum size limit of ACLs is hit, as the former errno is consistent with the VFS xattr syscalls.

    This was pointed out by Dave Chinner in a previous discussion thread: http://www.spinics.net/lists/linux-fsdevel/msg71125.html

    Signed-off-by: Jie Liu <jeff.liu@oracle.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Clean up journal extent mapping  [Steven Whitehouse, 2014-03-03, 6 files, -75/+124]
    This patch fixes a long standing issue in mapping the journal extents. Most journals will consist of only a single extent, and although the cache took account of that by merging extents, it did not actually map large extents, but instead was doing a block by block mapping. Since the journal was only being mapped on mount, this was not normally noticeable.

    With the updated code, it is now possible to use the same extent mapping system during journal recovery (which will be added in a later patch). This will allow checking of the integrity of the journal before any replay of the journal content is attempted. For this reason the code is moving to bmap.c, since it will be used more widely in due course.

    An exercise left for the reader is to compare the new function gfs2_map_journal_extents() with gfs2_write_alloc_required()

    Additionally, should there be a failure, the error reporting is also updated to show more detail about what went wrong.

    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: replace kmalloc - __vmalloc / memset 0  [Fabian Frederick, 2014-02-27, 1 file, -4/+3]
    Use kzalloc and __vmalloc __GFP_ZERO for clean sd_quota_bitmap allocation.

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
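The same cleanup has a familiar userspace analogue: asking the allocator for zeroed memory instead of allocating and then clearing it by hand. The sketch below uses `malloc()`, `memset()` and `calloc()` from the C standard library as stand-ins; the `toy_bitmap_alloc_*` names are hypothetical and the kernel allocators named in the commit are not used here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a separate step. */
static unsigned char *toy_bitmap_alloc_old(size_t nbytes)
{
    assert(nbytes > 0);
    unsigned char *p = malloc(nbytes);

    if (p)
        memset(p, 0, nbytes);
    return p;
}

/* After: request zeroed memory directly, so the clearing cannot be forgotten. */
static unsigned char *toy_bitmap_alloc_new(size_t nbytes)
{
    assert(nbytes > 0);
    return calloc(1, nbytes);
}
```

Both variants hand back an all-zero buffer on success; the second simply removes a step that every caller would otherwise have to remember.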
| * | GFS2: Remove extra "if" in gfs2_log_flush()  [Steven Whitehouse, 2014-02-25, 1 file, -5/+3]
    By reordering some of the assignments in gfs2_log_flush() it is possible to remove one of the "if" statements as it can be merged with one higher up the function.

    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Move log buffer accounting to transaction  [Steven Whitehouse, 2014-02-24, 5 files, -65/+33]
    Now we have a master transaction into which other transactions are merged, the accounting can be done using this master transaction. We no longer require the superblock fields which were being used for this function.

    In addition, this allows for a clean up in calc_reserved() making it rather easier to understand. Also, by reducing the number of variables used to track the buffers being added and removed from the journal, a number of error checks are now no longer required.

    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Move log buffer lists into transactionSteven Whitehouse2014-02-247-29/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block, and into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction then is passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
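The idea of each transaction building its own buffer list and merging it into the accumulating transaction at commit time can be sketched in userspace C. Everything here is an assumption made for illustration: struct buf, struct buf_list, struct trans and the trans_*() helpers are hypothetical, and the kernel code uses its own list primitives and locking, none of which is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct buf {
    int blkno;
    struct buf *next;
};

/* Singly linked list with a tail pointer so that merging is O(1). */
struct buf_list {
    struct buf *head;
    struct buf **tail;  /* points at the NULL link that ends the list */
};

struct trans {
    struct buf_list bufs;  /* buffers owned by this transaction */
};

void trans_init(struct trans *tr)
{
    tr->bufs.head = NULL;
    tr->bufs.tail = &tr->bufs.head;
}

/* Append one buffer to the transaction's private list. */
void trans_add_buf(struct trans *tr, struct buf *b)
{
    assert(b != NULL);
    b->next = NULL;
    *tr->bufs.tail = b;
    tr->bufs.tail = &b->next;
}

/* Commit: move every buffer from tr onto the end of master, leaving tr empty. */
void trans_merge(struct trans *master, struct trans *tr)
{
    if (tr->bufs.head == NULL)
        return;
    *master->bufs.tail = tr->bufs.head;
    master->bufs.tail = tr->bufs.tail;
    trans_init(tr);
}

size_t trans_count(const struct trans *tr)
{
    size_t n = 0;

    for (const struct buf *b = tr->bufs.head; b != NULL; b = b->next)
        n++;
    return n;
}
```

Because only the merge step touches the shared master list, a design along these lines leaves room for building the per-transaction lists without contention, which is the concurrency motivation the message gives.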
| * | GFS2: Reduce struct gfs2_trans in sizeSteven Whitehouse2014-02-212-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | A couple of "int" fields were being used as boolean values so we can make them bitfields of one bit, and put them in what might otherwise be a hole in the structure with 64 bit alignment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
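The space saving from turning boolean "int" fields into one-bit bitfields that sit in an alignment hole can be shown with two made-up structures. The field names and layouts below are hypothetical and are not the real struct gfs2_trans; the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Before: two full ints are used only as booleans. */
struct trans_before {
    unsigned long ip;
    unsigned int blocks;
    int touched;
    int alloced;
};

/* After: both flags share one unsigned int storage unit next to 'blocks'. */
struct trans_after {
    unsigned long ip;
    unsigned int blocks;
    unsigned int touched:1;
    unsigned int alloced:1;
};

/* Number of bytes saved per structure on the current ABI. */
size_t trans_saving(void)
{
    assert(sizeof(struct trans_after) <= sizeof(struct trans_before));
    return sizeof(struct trans_before) - sizeof(struct trans_after);
}
```

On a typical LP64 ABI the first layout occupies 24 bytes (including 4 bytes of tail padding) and the second 16, since the two flags fit into the 4 bytes that would otherwise follow 'blocks' as padding.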
| * | GFS2: add missing newlineDavid Teigland2014-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Log message is missing newline. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Mark functions as static in gfs2/rgrp.cRashika Kheria2014-02-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark functions as static in gfs2/rgrp.c because they are not used outside this file. This eliminates the following warning in gfs2/rgrp.c: fs/gfs2/rgrp.c:1092:5: warning: no previous prototype for ‘gfs2_rgrp_bh_get’ [-Wmissing-prototypes] fs/gfs2/rgrp.c:1157:5: warning: no previous prototype for ‘update_rgrp_lvb’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Add meta readahead field in directory entriesSteven Whitehouse2014-02-071-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent of this new field in the directory entry is to allow a subsequent lookup to know how many blocks, which are contiguous with the inode, contain metadata which relates to the inode. This will then allow the issuing of a single read to read these blocks, rather than reading the inode first, and then issuing a second read for the metadata. This only works under some fairly strict conditions, since we do not have back pointers from inodes to directory entries we must ensure that the blocks referenced in this way will always belong to the inode. This rules out being able to use this system for indirect blocks, as these can change as a result of truncate/rewrite. So the idea here is to restrict this to xattr blocks only for the time being. For most inodes, that means only a single block. Also, when using ACLs and/or SELinux or other LSMs, these will be added at inode creation time so that they will be contiguous with the inode on disk and also will almost always be needed when we read the inode in for permissions checks. Once an xattr block for an inode is allocated, it will never change until the inode is deallocated. This patch adds the new field, a further patch will add the readahead in due course. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Lock i_mutex and use a local gfs2_holder for fallocateBob Peterson2014-02-061-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch causes GFS2 to lock the i_mutex during fallocate. It also switches from using a dinode's inode glock to using a local holder like the other GFS2 i_operations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: journal data writepages updateSteven Whitehouse2014-02-061-36/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GFS2 has carried what is more or less a copy of the write_cache_pages() for some time. It seems that this copy has slipped behind the core code over time. This patch brings it back up to date, and in addition adds the tracepoint which would otherwise be missing. We could go further, and eliminate some or all of the code duplication here. The issue is that if we do that, then the function we need to split out from the existing write_cache_pages(), which will look a lot like gfs2_jdata_write_pagevec(), would land up putting quite a lot of extra variables on the stack. I know that has been a problem in the past in the writeback code path, which is why I've hesitated to do it here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Allocate block for xattr at inode alloc time, if requiredSteven Whitehouse2014-02-042-8/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is another step towards improving the allocation of xattr blocks at inode allocation time. Here we take advantage of Christoph's recent work on ACLs to allocate a block for the xattrs early if we know that we will be adding ACLs to the inode later on. The advantage of that is that it is much more likely that we'll get a contiguous run of two blocks where the first is the inode and the second is the xattr block. We still have to fall back to the original system in case we don't get the requested two contiguous blocks, or in case the ACLs are too large to fit into the block. Future patches will move more of the ACL setting code further up the gfs2_inode_create() function. Also, I'd like to be able to do the same thing with the xattrs from LSMs in due course, too. That way we should be able to slowly reduce the number of independent transactions, at least in the most common cases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Plug on AIL flushSteven Whitehouse2014-02-031-0/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we do a flush of the AIL list, we are writing out what is likely to be a lot of small I/Os, which are possibly in an order which is not ideal performance-wise. Since this is done by calling filemap_fdatawrite for each individual inode's address space there is no overall plugging going on. In addition to that, we do not always wait for AIL i/o when we flush it, so that it is possible for things to get left behind on the queue. By adding explicit plugging here, we reduce the chances of this being an issue. A quick test using the AIL flush tracepoint shows a small, but measurable improvement. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* / mm + fs: store shadow entries in page cacheJohannes Weiner2014-04-031-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds2014-01-302-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_vec series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...