summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
*---. Merge branches 'sched-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds2012-01-191-0/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/accounting, proc: Fix /proc/stat interrupts sum * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore x86/kprobes: Fix typo transferred from Intel manual * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits x86, tsc: Fix SMI induced variation in quick_pit_calibrate() x86, opcode: ANDN and Group 17 in x86-opcode-map.txt x86/kconfig: Move the ZONE_DMA entry under a menu x86/UV2: Add accounting for BAU strong nacks x86/UV2: Ack BAU interrupt earlier x86/UV2: Remove stale no-resources test for UV2 BAU x86/UV2: Work around BAU bug x86/UV2: Fix BAU destination timeout initialization x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode x86: Get rid of dubious one-bit signed bitfield
| * | | sched/accounting, proc: Fix /proc/stat interrupts sumRussell King2012-01-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3292beb340c7688 ("sched/accounting: Change cpustat fields to an array") deleted the code which provides us with the sum of all interrupts in the system, causing vmstat to report zero interrupts occuring in the system. Fix this by restoring the code. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> # [on ARM] Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Glauber Costa <glommer@parallels.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Tuner <pjt@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-01-193-40/+27
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qnx4: don't leak ->BitMap on late failure exits qnx4: reduce the insane nesting in qnx4_checkroot() qnx4: di_fname is an array, for crying out loud... vfs: remove printk from set_nlink() wake up s_wait_unfrozen when ->freeze_fs fails
| * | | qnx4: don't leak ->BitMap on late failure exitsAl Viro2012-01-191-1/+3
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | qnx4: reduce the insane nesting in qnx4_checkroot()Al Viro2012-01-191-34/+22
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | qnx4: di_fname is an array, for crying out loud...Al Viro2012-01-191-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | (struct qnx4_inode_entry *)(bh->b_data + some_offset)->di_fname is not going to be NULL, TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: remove printk from set_nlink()Miklos Szeredi2012-01-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't log a message for set_nlink(0). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | wake up s_wait_unfrozen when ->freeze_fs failsKazuya Mio2012-01-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dd slept infinitely when fsfeeze failed because of EIO. To fix this problem, if ->freeze_fs fails, freeze_super() wakes up the tasks waiting for the filesystem to become unfrozen. When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(), the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen. However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then freeze_super() returns an error number. In this case, FITHAW ioctl returns EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-01-172-19/+14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits) audit: no leading space in audit_log_d_path prefix audit: treat s_id as an untrusted string audit: fix signedness bug in audit_log_execve_info() audit: comparison on interprocess fields audit: implement all object interfield comparisons audit: allow interfield comparison between gid and ogid audit: complex interfield comparison helper audit: allow interfield comparison in audit rules Kernel: Audit Support For The ARM Platform audit: do not call audit_getname on error audit: only allow tasks to set their loginuid if it is -1 audit: remove task argument to audit_set_loginuid audit: allow audit matching on inode gid audit: allow matching on obj_uid audit: remove audit_finish_fork as it can't be called audit: reject entry,always rules audit: inline audit_free to simplify the look of generic code audit: drop audit_set_macxattr as it doesn't do anything audit: inline checks for not needing to collect aux records audit: drop some potentially inadvisable likely notations ... Use evil merge to fix up grammar mistakes in Kconfig file. Bad speling and horrible grammar (and copious swearing) is to be expected, but let's keep it to commit messages and comments, rather than expose it to users in config help texts or printouts.
| * | | | audit: do not call audit_getname on errorEric Paris2012-01-171-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just a code cleanup really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | audit: only allow tasks to set their loginuid if it is -1Eric Paris2012-01-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment we allow tasks to set their loginuid if they have CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they log in and it be impossible to ever reset. We had to make it mutable even after it was once set (with the CAP) because on update and admin might have to restart sshd. Now sshd would get his loginuid and the next user which logged in using ssh would not be able to set his loginuid. Systemd has changed how userspace works and allowed us to make the kernel work the way it should. With systemd users (even admins) are not supposed to restart services directly. The system will restart the service for them. Thus since systemd is going to loginuid==-1, sshd would get -1, and sshd would be allowed to set a new loginuid without special permissions. If an admin in this system were to manually start an sshd he is inserting himself into the system chain of trust and thus, logically, it's his loginuid that should be used! Since we have old systems I make this a Kconfig option. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | audit: remove task argument to audit_set_loginuidEric Paris2012-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function always deals with current. Don't expose an option pretending one can use it for something. You can't. Signed-off-by: Eric Paris <eparis@redhat.com>
* | | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2012-01-1718-542/+374
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: cleanup xfs_file_aio_write xfs: always return with the iolock held from xfs_file_aio_write_checks xfs: remove the i_new_size field in struct xfs_inode xfs: remove the i_size field in struct xfs_inode xfs: replace i_pin_wait with a bit waitqueue xfs: replace i_flock with a sleeping bitlock xfs: make i_flags an unsigned long xfs: remove the if_ext_max field in struct xfs_ifork xfs: remove the unused dm_attrs structure xfs: cleanup xfs_iomap_eof_align_last_fsb xfs: remove xfs_itruncate_data
| * | | | | xfs: cleanup xfs_file_aio_writeChristoph Hellwig2012-01-171-45/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With all the size field updates out of the way xfs_file_aio_write can be further simplified by pushing all iolock handling into xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using the generic generic_write_sync helper for synchronous writes. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: always return with the iolock held from xfs_file_aio_write_checksChristoph Hellwig2012-01-171-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While xfs_iunlock is fine with 0 lockflags the calling conventions are much cleaner if xfs_file_aio_write_checks never returns without the iolock held. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: remove the i_new_size field in struct xfs_inodeChristoph Hellwig2012-01-175-92/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we use the VFS i_size field throughout XFS there is no need for the i_new_size field any more given that the VFS i_size field gets updated in ->write_end before unlocking the page, and thus is always uptodate when writeback could see a page. Removing i_new_size also has the advantage that we will never have to trim back di_size during a failed buffered write, given that it never gets updated past i_size. Note that currently the generic direct I/O code only updates i_size after calling our end_io handler, which requires a small workaround to make sure di_size actually makes it to disk. I hope to fix this properly in the generic code. A downside is that we lose the support for parallel non-overlapping O_DIRECT appending writes that recently was added. I don't think keeping the complex and fragile i_new_size infrastructure for this is a good tradeoff - if we really care about parallel appending writers we should investigate turning the iolock into a range lock, which would also allow for parallel non-overlapping buffered writers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: remove the i_size field in struct xfs_inodeChristoph Hellwig2012-01-1712-82/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no fundamental need to keep an in-memory inode size copy in the XFS inode. We already have the on-disk value in the dinode, and the separate in-memory copy that we need for regular files only in the XFS inode. Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the VFS inode i_size field for regular files. Switch code that was directly accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where we are limited to regular files direct access of the VFS inode i_size field. This also allows dropping some fairly complicated code in the write path which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size that is getting updated inside ->write_end. Note that we do not bother resetting the VFS i_size when truncating a file that gets freed to zero as there is no point in doing so because the VFS inode is no longer in use at this point. Just relax the assert in xfs_ifree to only check the on-disk size instead. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: replace i_pin_wait with a bit waitqueueChristoph Hellwig2012-01-174-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace i_pin_wait, which is only used during synchronous inode flushing with a bit waitqueue. This trades off a much smaller inode against slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit) bytes in the XFS inode. Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: replace i_flock with a sleeping bitlockChristoph Hellwig2012-01-176-46/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We almost never block on i_flock, the exception is synchronous inode flushing. Instead of bloating the inode with a 16/24-byte completion that we abuse as a semaphore just implement it as a bitlock that uses a bit waitqueue for the rare sleeping path. This primarily is a tradeoff between a much smaller inode and a faster non-blocking path vs faster wakeups, and we are much better off with the former. A small downside is that we will lose lockdep checking for i_flock, but given that it's always taken inside the ilock that should be acceptable. Note that for example the inode writeback locking is implemented in a very similar way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: make i_flags an unsigned longChristoph Hellwig2012-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To be used for bit wakeup i_flags needs to be an unsigned long or we'll run into trouble on big endian systems. Because of the 1-byte i_update field right after it this actually causes a fairly large size increase on its own (4 or 8 bytes), but that increase will be more than offset by the next two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: remove the if_ext_max field in struct xfs_iforkChristoph Hellwig2012-01-178-115/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We spent a lot of effort to maintain this field, but it always equals to the fork size divided by the constant size of an extent. The prime use of it is to assert that the two stay in sync. Just divide the fork size by the extent size in the few places that we actually use it and remove the overhead of maintaining it. Also introduce a few helpers to consolidate the places where we actually care about the value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: remove the unused dm_attrs structureChristoph Hellwig2012-01-131-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. and the just as dead bhv_desc forward declaration while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: cleanup xfs_iomap_eof_align_last_fsbChristoph Hellwig2012-01-131-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the nasty if, else if, elseif condition with more natural C flow that expressed the logic we want here better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | xfs: remove xfs_itruncate_dataChristoph Hellwig2012-01-137-142/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This wrapper isn't overly useful, not to say rather confusing. Around the call to xfs_itruncate_extents it does: - add tracing - add a few asserts in debug builds - conditionally update the inode size in two places - log the inode Both the tracing and the inode logging can be moved to xfs_itruncate_extents as they are useful for the attribute fork as well - in fact the attr code already does an equivalent xfs_trans_log_inode call just after calling xfs_itruncate_extents. The conditional size updates are a mess, and there was no reason to do them in two places anyway, as the first one was conditional on the inode having extents - but without extents we xfs_itruncate_extents would be a no-op and the placement wouldn't matter anyway. Instead move the size assignments and the asserts that make sense to the callers that want it. As a side effect of this clean up xfs_setattr_size by introducing variables for the old and new inode size, and moving the size updates into a common place. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* | | | | | Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2012-01-177-125/+110
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: take allocation of ->tree_root into open_ctree() btrfs: let ->s_fs_info point to fs_info, not root... btrfs: consolidate failure exits in btrfs_mount() a bit btrfs: make free_fs_info() call ->kill_sb() unconditional btrfs: merge free_fs_info() calls on fill_super failures btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super() btrfs: make open_ctree() return int btrfs: sanitizing ->fs_info, part 5 btrfs: sanitizing ->fs_info, part 4 btrfs: sanitizing ->fs_info, part 3 btrfs: sanitizing ->fs_info, part 2 btrfs: sanitizing ->fs_info, part 1 btrfs: fix a deadlock in btrfs_scan_one_device() btrfs: fix mount/umount race btrfs: get ->kill_sb() of its own btrfs: preparation to fixing mount/umount race
| * | | | | | btrfs: take allocation of ->tree_root into open_ctree()Al Viro2012-01-083-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | now that we don't need it for sget() anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: let ->s_fs_info point to fs_info, not root...Al Viro2012-01-085-45/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the latter can be obtained from the former (by looking as ->tree_root) just as cheaply as we currently are doing the other way round. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: consolidate failure exits in btrfs_mount() a bitAl Viro2012-01-081-16/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: make free_fs_info() call ->kill_sb() unconditionalAl Viro2012-01-081-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and don't bother with it after btrfs_fill_super() failure - ->kill_sb() (unlike ->put_super()) will be called even if we have not got non-NULL ->s_root. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: merge free_fs_info() calls on fill_super failuresAl Viro2012-01-082-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... all the way up into btrfs_mount(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()Al Viro2012-01-081-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not (fortunately) modify ->s_fs_info of superblock on the fly in btrfs_fill_super(); apparent assignment is a no-op. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: make open_ctree() return intAl Viro2012-01-083-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It returns either ERR_PTR(-ve) or sb->s_fs_info. The latter can be found by caller just as well, TYVM, no need to return it. Just return -ve or 0... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: sanitizing ->fs_info, part 5Al Viro2012-01-081-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | close_ctree() uses a weird mix of accesses to root->fs_info and its value at the beginning of function stored in local variable. Since ->fs_info *never* changes, let's just use the local variable to avoid confusion. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: sanitizing ->fs_info, part 4Al Viro2012-01-083-20/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new helper: btrfs_alloc_root(fs_info); allocates btrfs_root and sets ->fs_info. All places allocating the suckers converted to it. At that point we *never* reassign ->fs_info of btrfs_root; it's set before anyone sees the address of newly allocated struct btrfs_root and never assigned anywhere else. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: sanitizing ->fs_info, part 3Al Viro2012-01-081-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | move assignments to ->fs_info in open_ctree() up, to the place just after the original allocations. Assignment for tree_root becomes a no-op - we'd obtained fs_info from tree_root->fs_info in the first place. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: sanitizing ->fs_info, part 2Al Viro2012-01-081-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lift assignment to callers of find_and_setup_root() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: sanitizing ->fs_info, part 1Al Viro2012-01-081-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | take assignment of ->fs_info to callers of __setup_root() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: fix a deadlock in btrfs_scan_one_device()Al Viro2012-01-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pathname resolution under a global mutex, taken on some paths in ->mount() is a Bad Idea(tm) - think what happens if said pathname resolution triggers automount of some btrfs instance and walks into attempt to grab the same mutex. Deadlock - we are waiting for daemon to finish walking the path, daemon is waiting for us to release the mutex... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: fix mount/umount raceAl Viro2012-01-081-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do *NOT* skip doomed superblocks in btrfs_test_super(); we want sget() to wait for their shutdown to complete. Since we don't mutilate ->s_fs_info in ->put_super() anymore (or free what it used to point to until the superblock is past being findable by sget()), we can just DTRT there and report a match. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: get ->kill_sb() of its ownAl Viro2012-01-081-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and move free_fs_info() to that, out of ->put_super(). Do NOT set ->s_fs_info to NULL in the latter; we need it for sget() to be able to see and wait for fs in the middle of umount if we get a mount/umount race. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | btrfs: preparation to fixing mount/umount raceAl Viro2012-01-082-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need fs_info and root to live until the moment when the victim superblock leaves the list, so we need to postpone free_fs_info() until after ->put_super(). The call is buried in close_ctree(), though, so we need to lift it into the callers (including btrfs_put_super()) first. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-01-1733-921/+6746
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: use larger system chunks Btrfs: add a delalloc mutex to inodes for delalloc reservations Btrfs: space leak tracepoints Btrfs: protect orphan block rsv with spin_lock Btrfs: add allocator tracepoints Btrfs: don't call btrfs_throttle in file write Btrfs: release space on error in page_mkwrite Btrfs: fix btrfsck error 400 when truncating a compressed Btrfs: do not use btrfs_end_transaction_throttle everywhere Btrfs: add balance progress reporting Btrfs: allow for resuming restriper after it was paused Btrfs: allow for canceling restriper Btrfs: allow for pausing restriper Btrfs: add skip_balance mount option Btrfs: recover balance on mount Btrfs: save balance parameters to disk Btrfs: soft profile changing mode (aka soft convert) Btrfs: implement online profile changing Btrfs: do not reduce profile in do_chunk_alloc() Btrfs: virtual address space subset filter ... Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new mnt_drop_write_file() helper.
| * | | | | | | Btrfs: use larger system chunksChris Mason2012-01-162-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | system chunks by default are very small. This makes them slightly larger and also fixes the conditional checks to make sure we don't allocate a billion of them at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | | | | | Btrfs: add a delalloc mutex to inodes for delalloc reservationsJosef Bacik2012-01-165-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I was using i_mutex for this, but we're getting bogus lockdep warnings by doing that and theres no real way to get rid of those, so just stop using i_mutex to protect delalloc metadata reservations and use a delalloc mutex instead. This shouldn't be contended often at all, only if you are writing and mmap writing to the file at the same time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | | | | | | Btrfs: space leak tracepointsJosef Bacik2012-01-164-20/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This in addition to a script in my btrfs-tracing tree will help track down space leaks when we're getting space left over in block groups on umount. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | | | | | | Btrfs: protect orphan block rsv with spin_lockJosef Bacik2012-01-161-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been seeing warnings coming out of the orphan commit stuff forever from ceph. Turns out it's because we're racing with checking if the orphan block reserve is set, because we clear it outside of the spin_lock. So leave the normal fastpath checks where they are, but take the spin_lock and _recheck_ to make sure we haven't had an orphan block rsv added in the meantime. Then clear the root's orphan block rsv and release the lock. With this patch a user said the warnings went away and they usually showed up pretty soon after he started ceph. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | | | | | | Btrfs: add allocator tracepointsJosef Bacik2012-01-162-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I used these tracepoints when figuring out what the cluster stuff was doing, so add them to mainline in case we need to profile this stuff again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | | | | | | Btrfs: don't call btrfs_throttle in file writeJosef Bacik2012-01-161-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs_throttle will make us wait if there is a currently committing transaction until we can open new transactions, which is ridiculous since we don't actually start any transactions within the file write path anyway, so all this does is introduce big latencies if we have a sync/fsync heavy workload going on while somebody else is trying to do work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | | | | | Btrfs: release space on error in page_mkwriteJosef Bacik2012-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If updating the inode gave us an ENOSPC we were just returning in page_mkwrite, which is a problem since we make our reservation right before trying to update the inode, so fix the out label so that we actually free our reservation. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | | | | | Btrfs: fix btrfsck error 400 when truncating a compressedMiao Xie2012-01-161-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reproduce steps: # mkfs.btrfs /dev/sdb5 # mount /dev/sdb5 -o compress=lzo /mnt # dd if=/dev/zero of=/mnt/tmpfile bs=128K count=1 # sync # truncate -s 64K /mnt/tmpfile root 5 inode 257 errors 400 This is because of the wrong if condition, which is used to check if we should subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not. When we truncate a compressed extent, btrfs substracts the bytes of the whole extent, it's wrong. We should substract the real size that we truncate, no matter it is a compressed extent or not. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>