| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
The slow path in __fget_light() can use __fget() to avoid the
code duplication. Saves 232 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
| |
Apart from FMODE_PATH check fget_light() and fget_raw_light() are
identical, shift the code into the new helper, __fget_light(fd, mask).
Saves 208 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
| |
Apart from FMODE_PATH check fget() and fget_raw() are identical,
shift the code into the new simple helper, __fget(fd, mask). Saves
160 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
put_files_struct() and close_files() do rcu_read_lock() to make
rcu_dereference_check_fdtable() happy.
This looks a bit ugly, files_fdtable() just reads the pointer,
we can simply use rcu_dereference_raw() to avoid the warning.
The patch also changes close_files() to return fdt, this avoids
another rcu_read_lock()/files_fdtable() in put_files_struct().
I think close_files() needs more cleanups:
- we do not need xchg() exactly because we are the last
user of this files_struct
- "if (file)" should be turned into WARN_ON(!file)
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rcu_my_thread_group_empty()
rcu_dereference_check_fdtable() looks very wrong,
1. rcu_my_thread_group_empty() was added by 844b9a8707f1 "vfs: fix
RCU-lockdep false positive due to /proc" but it doesn't really
fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
hit the same race with get_files_struct().
And otoh rcu_my_thread_group_empty() can suppress the correct
warning if the caller is the CLONE_FILES (without CLONE_THREAD)
task.
2. files->count == 1 check is not really right too. Even if this
files_struct is not shared it is not safe to access it lockless
unless the caller is the owner.
Otoh, this check is sub-optimal. files->count == 0 always means
it is safe to use it lockless even if files != current->files,
but put_files_struct() has to take rcu_read_lock(). See the next
patch.
This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files() which uses rcu_dereference_raw(), the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.
fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
| |
it's never called with NULL argument...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
| |
kill pointless method instances and don't bother with ->owner - it's
ignored for procfs files anyway, make use of remove_proc_subtree() for
removal, get rid of cell->proc_dir.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
| |
pass owner explicitly to __register_nls(), make register_nls() a macro passing
THIS_MODULE as the owner argument to __register_nls().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
| |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
| |
* don't assume that ->dest_count won't change between copy_from_user()
and memdup_user()
* use fdget instead of fget
* don't bother comparing superblocks when we'd already compared vfsmounts
* get rid of excessive goto
* use file_inode() instead of open-coding the sucker
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
| |
* pass on-disk superblock to qnx4_chkroot() explicitly
* don't leave stale (and unused) pointers in qnx4_super_block
* free stuff in ->kill_sb(); ->put_super() becomes empty and dies
* simplify failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
| |
simplifies failure exits in ->mount()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
| |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
| |
failure exits are simpler that way
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
| |
... and return saner errors from ->mount(), while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
| |
don't bother open-coding it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
| |
If ecryptfs_readlink_lower() fails, buf remains an uninitialized
pointer and passing it nd_set_link() won't do anything good.
Fixed by switching ecryptfs_readlink_lower() to saner API - make it
return buf or ERR_PTR(...) and update callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
"This is a set of 3 regression fixes.
This fixes /proc/mounts when using "ip netns add <netns>" to display
the actual mount point.
This fixes a regression in clone that broke lxc-attach.
This fixes a regression in the permission checks for mounting /proc
that made proc unmountable if binfmt_misc was in use. Oops.
My apologies for sending this pull request so late. Al Viro gave
interesting review comments about the d_path fix that I wanted to
address in detail before I sent this pull request. Unfortunately a
bad round of colds kept from addressing that in detail until today.
The executive summary of the review was:
Al: Is patching d_path really sufficient?
The prepend_path, d_path, d_absolute_path, and __d_path family of
functions is a really mess.
Me: Yes, patching d_path is really sufficient. Yes, the code is mess.
No it is not appropriate to rewrite all of d_path for a regression
that has existed for entirely too long already, when a two line
change will do"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Fix a regression in mounting proc
fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)
vfs: In d_path don't call d_dname on a mount point
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Gao feng <gaofeng@cn.fujitsu.com> reported that commit
e51db73532955dc5eaba4235e62b74b460709d5b
userns: Better restrictions on when proc and sysfs can be mounted
caused a regression on mounting a new instance of proc in a mount
namespace created with user namespace privileges, when binfmt_misc
is mounted on /proc/sys/fs/binfmt_misc.
This is an unintended regression caused by the absolutely bogus empty
directory check in fs_fully_visible. The check fs_fully_visible replaced
didn't even bother to attempt to verify proc was fully visible and
hiding proc files with any kind of mount is rare. So for now fix
the userspace regression by allowing directory with nlink == 1
as /proc/sys/fs/binfmt_misc has.
I will have a better patch but it is not stable material, or
last minute kernel material. So it will have to wait.
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Tested-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Aditya Kali (adityakali@google.com) wrote:
> Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> "proc: Fix the namespace inode permission checks." converted
> the namespace files into symlinks. The same commit changed
> the way namespace bind mounts appear in /proc/mounts:
> $ mount --bind /proc/self/ns/ipc /mnt/ipc
> Originally:
> $ cat /proc/mounts | grep ipc
> proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
>
> After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> $ cat /proc/mounts | grep ipc
> proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
>
> This breaks userspace which expects the 2nd field in
> /proc/mounts to be a valid path.
The symlink /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
dentries allocated with d_alloc_pseudo that we can mount, and
that have interesting names printed out with d_dname.
When these files are bind mounted /proc/mounts is not currently
displaying the mount point correctly because d_dname is called instead
of just displaying the path where the file is mounted.
Solve this by adding an explicit check to distinguish mounted pseudo
inodes and unmounted pseudo inodes. Unmounted pseudo inodes always
use mount of their filesstem as the mnt_root in their path making
these two cases easy to distinguish.
CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback fix from Wu Fengguang:
"Fix data corruption on NFS writeback.
It has been in linux-next for one month"
* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Fix data corruption on NFS
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
mode because there we also want to wait for outstanding writeback on
possibly clean inode. This was causing occasional data corruption issues
on NFS because it uses sync_inode() to make sure all outstanding writes
are flushed to the server before truncating the inode and with
sync_inode() returning prematurely file was sometimes extended back
by an outstanding write after it was truncated.
So modify the test to also check for pages under writeback in
WB_SYNC_ALL mode.
CC: stable@vger.kernel.org # >= 3.5
Fixes: 4f8ad655dbc82cf05d2edc11e66b78a42d38bf93
Reported-and-tested-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean. It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.
The problem shows itself with the following kernel log message:
nilfs_sufile_do_cancel_free: segment 6533 must be clean
Usually a few hours later the file system gets corrupted:
NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)
The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.
This is what happens:
1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
including the newly allocated segment, which will contain active
data and can be allocated at a later time
10. A few hours later another segment construction allocates the
segment and causes file system corruption
This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull xfs bugfixes from Ben Myers:
"Here we have a bugfix for an off-by-one in the remote attribute
verifier that results in a forced shutdown which you can hit with v5
superblock by creating a 64k xattr, and a fix for a missing
destroy_work_on_stack() in the allocation worker.
It's a bit late, but they are both fairly straightforward"
* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
xfs: fix off-by-one error in xfs_attr3_rmt_verify
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().
Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 6f96b3063cdd473c68664a190524ed966ac0cd92)
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:
<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f
Tests:
setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile
This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 85dd0707f0cad26d60f2dc574d17a5ab948d10f7)
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfix from Ted Ts'o:
"Fix a regression introduced in v3.13-rc6"
* tag 'ext4_for_linus_stable' of http://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix bigalloc regression
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit f5a44db5d2 introduced a regression on filesystems created with
the bigalloc feature (cluster size > blocksize). It causes xfstests
generic/006 and /013 to fail with an unexpected JBD2 failure and
transaction abort that leaves the test file system in a read only state.
Other xfstests run on bigalloc file systems are likely to fail as well.
The cause is the accidental use of a cluster mask where a cluster
offset was needed in ext4_ext_map_blocks().
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Merge patches from Andrew Morton:
"Ten fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test()
mm/memory-failure.c: transfer page count from head page to tail page after split thp
MAINTAINERS: set up proper record for Xilinx Zynq
mm: remove bogus warning in copy_huge_pmd()
memcg: fix memcg_size() calculation
mm: fix use-after-free in sys_remap_file_pages
mm: munlock: fix deadlock in __munlock_pagevec()
mm: munlock: fix a bug where THP tail page is encountered
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with
commmit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
topologies").
The acquistion of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links')
However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.
The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
its operating upon. The reason we don't need to do lock ordering in the
add path, is that we are already are holding the global 'epmutex'
whenever we do the double lock. Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \ \
| |/ / / / /
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
"Here is a set of small fixes for GFS2. There is a fix to drop
s_umount which is copied in from the core vfs, two patches relate to a
hard to hit "use after free" and memory leak. Two patches related to
using DIO and buffered I/O on the same file to ensure correct
operation in relation to glock state changes. The final patch adds an
RCU read lock to ensure correct locking on an error path"
* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Fix unsafe dereference in dump_holder()
GFS2: Wait for async DIO in glock state changes
GFS2: Fix incorrect invalidation for DIO/buffered I/O
GFS2: Fix slab memory leak in gfs2_bufdata
GFS2: Fix use-after-free race when calling gfs2_remove_from_ail
GFS2: don't hold s_umount over blkdev_put
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that
RCU read lock is held when using task_struct returned from pid_task().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We need to wait for any outstanding DIO to complete in a couple
of situations. Firstly, in case we are changing out of deferred
mode (in inode_go_sync) where GLF_DIRTY will not be set. That
call could be prefixed with a test for gl_state == LM_ST_DEFERRED
but it doesn't seem worth it bearing in mind that the test for
outstanding DIO is very quick anyway, in the usual case that there
is none.
The second case is in inode_go_lock which will catch the cases
where we have a cached EX lock, but where we grant deferred locks
against it so that there is no glock state transistion. We only
need to wait if the state is not deferred, since DIO is valid
anyway in that state.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed
local deferred locks to be granted against a cached exclusive
lock. That opened up a corner case which this patch now
fixes.
The solution to the problem is to check whether we have cached
pages each time we do direct I/O and if so to unmap, flush
and invalidate those pages. Since the glock state machine
normally does that for us, mostly the code will be a no-op.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch fixes a slab memory leak that sometimes can occur
for files with a very short lifespan. The problem occurs when
a dinode is deleted before it has gotten to the journal properly.
In the leak scenario, the bd object is pinned for journal
committment (queued to the metadata buffers queue: sd_log_le_buf)
but is subsequently unpinned and dequeued before it finds its way
to the ail or the revoke queue. In this rare circumstance, the bd
object needs to be freed from slab memory, or it is forgotten.
We have to be very careful how we do it, though, because
multiple processes can call gfs2_remove_from_journal. In order to
avoid double-frees, only the process that does the unpinning is
allowed to free the bd.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Function gfs2_remove_from_ail drops the reference on the bh via
brelse. This patch fixes a race condition whereby bh is deferenced
after the brelse when setting bd->bd_blkno = bh->b_blocknr;
Under certain rare circumstances, bh might be gone or reused,
and bd->bd_blkno is set to whatever that memory happens to be,
which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails
the test "bd->bd_blkno >= blkno" which causes it to never be freed.
The end result is that the bd is never freed from the bufdata cache,
which results in this error:
slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |_|_|/
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is a GFS2 version of Tejun's patch:
4f331f01b9c43bf001d3ffee578a97a1e0633eac
vfs: don't hold s_umount over close_bdev_exclusive() call
In this case its blkdev_put itself that is the issue and this
patch uses the same solution of dropping and retaking s_umount.
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Set FILE_CREATED on O_CREAT|O_EXCL.
cifs code didn't change during commit 116cc0225381415b96551f725455d067f63a76a0
Kernel bugzilla 66251
Signed-off-by: Shirish Pargaonkar <spargaonkar@suse.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain
tlink which also grabs a reference to it. We do not drop this reference
to tlink once we are done with the call.
The patch fixes this issue by instead passing tcon as a parameter and
avoids having to obtain a reference to the tlink. A lookup for the tcon
is already made in the calling functions and this way we avoid having to
re-run the lookup. This is also consistent with the argument list for
other similar calls for M-F symlinks.
We should also return an ENOSYS when we do not find a protocol specific
function to lookup the MF Symlink data.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Gregor Beck <gbeck@sernet.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
|
|\ \ \ \ \
| | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A collection of bug fixes destined for stable and some printk cleanups
and a patch so that instead of BUG'ing we use the ext4_error()
framework to mark the file system is corrupted"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add explicit casts when masking cluster sizes
ext4: fix deadlock when writing in ENOSPC conditions
jbd2: rename obsoleted msg JBD->JBD2
jbd2: revise KERN_EMERG error messages
jbd2: don't BUG but return ENOSPC if a handle runs out of space
ext4: Do not reserve clusters when fs doesn't support extents
ext4: fix del_timer() misuse for ->s_err_report
ext4: check for overlapping extents in ext4_valid_extent_entries()
ext4: fix use-after-free in ext4_mb_new_blocks
ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The missing casts can cause the high 64-bits of the physical blocks to
be lost. Set up new macros which allows us to make sure the right
thing happen, even if at some point we end up supporting larger
logical block numbers.
Thanks to the Emese Revfy and the PaX security team for reporting this
issue.
Reported-by: PaX Team <pageexec@freemail.hu>
Reported-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Akira-san has been reporting rare deadlocks of his machine when running
xfstests test 269 on ext4 filesystem. The problem turned out to be in
ext4_da_reserve_metadata() and ext4_da_reserve_space() which called
ext4_should_retry_alloc() while holding i_data_sem. Since
ext4_should_retry_alloc() can force a transaction commit, this is a
lock ordering violation and leads to deadlocks.
Fix the problem by just removing the retry loops. These functions should
just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that
function must take care of retrying after dropping all necessary locks.
Reported-and-tested-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Some of KERN_EMERG printk messages do not really deserve this log
level and the one in log_wait_commit() is even rather useless (the
journal has been previously aborted and *that* is where we should have
been complaining). So make some messages just KERN_ERR and remove the
useless message.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If a handle runs out of space, we currently stop the kernel with a BUG
in jbd2_journal_dirty_metadata(). This makes it hard to figure out
what might be going on. So return an error of ENOSPC, so we can let
the file system layer figure out what is going on, to make it more
likely we can get useful debugging information). This should make it
easier to debug problems such as the one which was reported by:
https://bugzilla.kernel.org/show_bug.cgi?id=44731
The only two callers of this function are ext4_handle_dirty_metadata()
and ocfs2_journal_dirty(). The ocfs2 function will trigger a
BUG_ON(), which means there will be no change in behavior. The ext4
function will call ext4_error_inode() which will print the useful
debugging information and then handle the situation using ext4's error
handling mechanisms (i.e., which might mean halting the kernel or
remounting the file system read-only).
Also, since both file systems already call WARN_ON(), drop the WARN_ON
from jbd2_journal_dirty_metadata() to avoid two stack traces from
being displayed.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <jlbec@evilplan.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When the filesystem doesn't support extents (like in ext2/3
compatibility modes), there is no need to reserve any clusters. Space
estimates for writing are exact, hole punching doesn't need new
metadata, and there are no unwritten extents to convert.
This fixes a problem when filesystem still having some free space when
accessed with a native ext2/3 driver suddently reports ENOSPC when
accessed with ext4 driver.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
That thing should be del_timer_sync(); consider what happens
if ext4_put_super() call of del_timer() happens to come just as it's
getting run on another CPU. Since that timer reschedules itself
to run next day, you are pretty much guaranteed that you'll end up
with kfree'd scheduled timer, with usual fun consequences. AFAICS,
that's -stable fodder all way back to 2010... [the second del_timer_sync()
is almost certainly not needed, but it doesn't hurt either]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A corrupted ext4 may have out of order leaf extents, i.e.
extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
^^^^ overlap with previous extent
Reading such extent could hit BUG_ON() in ext4_es_cache_extent().
BUG_ON(end < lblk);
The problem is that __read_extent_tree_block() tries to cache holes as
well but assumes 'lblk' is greater than 'prev' and passes underflowed
length to ext4_es_cache_extent(). Fix it by checking for overlapping
extents in ext4_valid_extent_entries().
I hit this when fuzz testing ext4, and am able to reproduce it by
modifying the on-disk extent by hand.
Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
make sure the value is not overflow.
Ran xfstests on patched ext4 and no regression.
Cc: Lukáš Czerner <lczerner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
While ext4_mb_use_preallocated checks pa->pa_deleted first and then
increments pa->count later, ext4_mb_put_pa decrements pa->pa_count
before holding pa->pa_lock and then sets pa->pa_deleted.
* Free sequence
ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count
ext4_mb_put_pa (2): lock pa->pa_lock
ext4_mb_put_pa (3): check pa->pa_deleted
ext4_mb_put_pa (4): set pa->pa_deleted=1
ext4_mb_put_pa (5): unlock pa->pa_lock
ext4_mb_put_pa (6): remove pa from a list
ext4_mb_pa_callback: free pa
* Use sequence
ext4_mb_use_preallocated (1): iterate over preallocation
ext4_mb_use_preallocated (2): lock pa->pa_lock
ext4_mb_use_preallocated (3): check pa->pa_deleted
ext4_mb_use_preallocated (4): increase pa->pa_count
ext4_mb_use_preallocated (5): unlock pa->pa_lock
ext4_mb_release_context: access pa
* Use-after-free sequence
[initial status] <pa->pa_deleted = 0, pa_count = 1>
ext4_mb_use_preallocated (1): iterate over preallocation
ext4_mb_use_preallocated (2): lock pa->pa_lock
ext4_mb_use_preallocated (3): check pa->pa_deleted
ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count
[pa_count decremented] <pa->pa_deleted = 0, pa_count = 0>
ext4_mb_use_preallocated (4): increase pa->pa_count
[pa_count incremented] <pa->pa_deleted = 0, pa_count = 1>
ext4_mb_use_preallocated (5): unlock pa->pa_lock
ext4_mb_put_pa (2): lock pa->pa_lock
ext4_mb_put_pa (3): check pa->pa_deleted
ext4_mb_put_pa (4): set pa->pa_deleted=1
[race condition!] <pa->pa_deleted = 1, pa_count = 1>
ext4_mb_put_pa (5): unlock pa->pa_lock
ext4_mb_put_pa (6): remove pa from a list
ext4_mb_pa_callback: free pa
ext4_mb_release_context: access pa
AddressSanitizer has detected use-after-free in ext4_mb_new_blocks
Bug report: http://goo.gl/rG1On3
Signed-off-by: Junho Ryu <jayr@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|