| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit ddf08f4b90a413892bbb9bb2e8a57aed991cd47d upstream.
For kmap_atomic() we call kunmap_atomic() on the returned pointer.
That's different from kmap() and kunmap() and so it's easy to get them
backwards.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 69b62d01ec44fe0d505d89917392347732135a4d upstream.
Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled
periodic dirty writeback from kupdate. This got broken and now causes
excessive sys CPU usage if set to zero, as we'll keep beating on
schedule().
Reported-by: Justin Maggard <jmaggard10@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 15ddb4aec54422ead137b03ea4e9b3f5db3f7cc2 upstream.
The /proc/fs/nfsd/versions file calls nfsd_vers() to check whether
the particular nfsd version is present/available. The problem is
that once I turn off e.g. NFSD-V4 this call returns -1 which is
true from the callers POV which is wrong.
The proposal is to report false in that case.
The bug has existed since 6658d3a7bbfd1768 "[PATCH] knfsd: remove
nfsd_versbits as intermediate storage for desired versions".
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 973bec34bfc1bc2465646181653d67f767d418c8 upstream.
As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
And nilfs does not set s_bdi anywhere. I noticed this problem by the
warning introduced by the recent commit 5129a469 ("Catch filesystem
lacking s_bdi").
WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
Hardware name: PowerEdge 2850
Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
Call Trace:
[<c1028422>] warn_slowpath_common+0x60/0x90
[<c102845f>] warn_slowpath_null+0xd/0x10
[<c1095936>] vfs_kern_mount+0xc5/0x14e
[<c1095a03>] do_kern_mount+0x32/0xbd
[<c10a811e>] do_mount+0x671/0x6d0
[<c1073794>] ? __get_free_pages+0x1f/0x21
[<c10a684f>] ? copy_mount_options+0x2b/0xe2
[<c107b634>] ? strndup_user+0x48/0x67
[<c10a81de>] sys_mount+0x61/0x8f
[<c100280c>] sysenter_do_call+0x12/0x32
This ensures to set s_bdi for nilfs and fixes the sync silent failure.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 7ac512aa8237c43331ffaf77a4fd8b8d684819ba upstream.
cachefiles_determine_cache_security() is expected to return with a
security override in place. However, if set_create_files_as() fails, we
fail to do this. In this case, we should just reinstate the security
override that was set by the caller.
Furthermore, if set_create_files_as() fails, we should dispose of the
new credentials we were in the process of creating.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream.
Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.
Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.
Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.
Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.
Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.
This patch nearly undoes the effect of all these patches.
The reason for reverting these is it provides an unusable value in
field 28. For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address. This unpredictability of the stack_start value makes
it worthless. That includes the intended use of showing how much stack
space a thread has.
Other architectures will get different values. As an example, ia64
gets 0. The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.
I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") . If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured. Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.
I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit") . I left the KSTK_ESP() change in
place as that seemed worthwhile.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 1306d603fcf1f6682f8575d1ff23631a24184b21 upstream.
Commit d899bf7b (procfs: provide stack information for threads) introduced
to show stack information in /proc/{pid}/status. But it cause large
performance regression. Unfortunately /proc/{pid}/status is used ps
command too and ps is one of most important component. Because both to
take mmap_sem and page table walk are heavily operation.
If many process run, the ps performance is,
[before d899bf7b]
% perf stat ps >/dev/null
Performance counter stats for 'ps':
4090.435806 task-clock-msecs # 0.032 CPUs
229 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
234 page-faults # 0.000 M/sec
8587565207 cycles # 2099.425 M/sec
9866662403 instructions # 1.149 IPC
3789415411 cache-references # 926.409 M/sec
30419509 cache-misses # 7.437 M/sec
128.859521955 seconds time elapsed
[after d899bf7b]
% perf stat ps > /dev/null
Performance counter stats for 'ps':
4305.081146 task-clock-msecs # 0.028 CPUs
480 context-switches # 0.000 M/sec
2 CPU-migrations # 0.000 M/sec
237 page-faults # 0.000 M/sec
9021211334 cycles # 2095.480 M/sec
10605887536 instructions # 1.176 IPC
3612650999 cache-references # 839.160 M/sec
23917502 cache-misses # 5.556 M/sec
152.277819582 seconds time elapsed
Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
provide almost same information. we can use it.
Commit d899bf7b introduced two features:
1) Add the annotattion of [thread stack: xxxx] mark to
/proc/{pid}/task/{tid}/maps.
2) Add StackUsage field to /proc/{pid}/status.
I only revert (2), because I haven't seen (1) cause regression.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
| |
commit 5dc6416414fb3ec6e2825fd4d20c8bf1d7fe0395 upstream.
The existing code would have allowed you to clone a file that was
only open for writing
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit b3b38d842fa367d862b83e7670af4e0fd6a80fc0 upstream.
inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user. The problem is that free_uid() is
never called on it.
Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit e08733446e72b983fed850fc5d8bd21b386feb29 upstream.
There is a race in the inotify add/rm watch code. A task can find and
remove a mark which doesn't have all of it's references. This can
result in a use after free/double free situation.
Task A Task B
------------ -----------
inotify_new_watch()
allocate a mark (refcnt == 1)
add it to the idr
inotify_rm_watch()
inotify_remove_from_idr()
fsnotify_put_mark()
refcnt hits 0, free
take reference because we are on idr
[at this point it is a use after free]
[time goes on]
refcnt may hit 0 again, double free
The fix is to take the reference BEFORE the object can be found in the
idr.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3d69438031b00c601c991ab447cafb7d5c3c59a6 upstream.
When we made serverino the default, we trusted that the field sent by the
server in the "uniqueid" field was actually unique. It turns out that it
isn't reliably so.
Samba, in particular, will just put the st_ino in the uniqueid field when
unix extensions are enabled. When a share spans multiple filesystems, it's
quite possible that there will be collisions. This is a server bug, but
when the inodes in question are a directory (as is often the case) and
there is a collision with the root inode of the mount, the result is a
kernel panic on umount.
Fix this by checking explicitly for directory inodes with the same
uniqueid. If that is the case, then we can assume that using server inode
numbers will be a problem and that they should be disabled.
Fixes Samba bugzilla 7407
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9bf729c0af67897ea8498ce17c29b0683f7f2028 upstream
On low memory boxes or those with highmem, kernel can OOM before the
background reclaims inodes via xfssyncd. Add a shrinker to run inode
reclaim so that it inode reclaim is expedited when memory is low.
This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 2b0b39517d1af5294128dbc2fd7ed39c8effa540 upstream.
Resizing the filesystem would result in an diAllocExt error in some
instances because changes in bmp->db_agsize would not get noticed if
goto extendBmap was called.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1 upstream.
ext4_fiemap() rounds the length of the requested range down to
blocksize, which is is not the true number of blocks that cover the
requested region. This problem is especially impressive if the user
requests only the first byte of a file: not a single extent will be
reported.
We fix this by calculating the last block of the region and then
subtract to find the number of blocks in the extents.
Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 356e76b855bdbfd8d1c5e75bcf0c6bf0dfe83496 upstream.
NFSv4 mounts ignore the rsize and wsize mount options, and always use
the default transfer size for both. This seems to be because all
NFSv4 mounts are now cloned, and the cloning logic doesn't copy the
rsize and wsize settings from the parent nfs_server.
I tested Fedora's 2.6.32.11-99 and it seems to have this problem as
well, so I'm guessing that .33, .32, and perhaps older kernels have
this issue as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f upstream.
If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.
Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit a36d515c7a2dfacebcf41729f6812dbc424ebcf0 upstream.
When asked for a partial read of the LVB in a dlmfs file, we can
accidentally calculate a negative count.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit a42ab8e1a37257da37e0f018e707bf365ac24531 upstream.
Online resize writes out the new superblock and its backups directly.
The metaecc data wasn't being recomputed. Let's do that directly.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 0350cb078f5035716ebdad4ad4709d02fe466a8a upstream.
If "handle" is non null at the end of the function then we assume it's a
valid pointer and pass it to ocfs2_commit_trans();
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit c21a534e2f24968cf74976a4e721ac194db30ded upstream.
In reflink we update the id info on the disk but forgot to update
the corresponding information in the VFS inode. Update them
accordingly when we want to preserve the attributes.
Reported-by: Jeff Liu <jeff.liu@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3835541dd481091c4dbf5ef83c08aed12e50fd61 upstream.
Correct the file_operations struct in fdinfo entry of tid_base_stuff[].
Presently /proc/*/task/*/fdinfo contains symlinks to opened files like
/proc/*/fd/.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 2bc3c1179c781b359d4f2f3439cb3df72afc17fc upstream.
When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to. So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.
We never encountered thsi in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic. Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit fb2162df74bb19552db3d988fd11c787cf5fad56 upstream.
Commit 48b32a3553a54740d236b79a90f20147a25875e3 ("reiserfs: use generic
xattr handlers") introduced a problem that causes corruption when extended
attributes are replaced with a smaller value.
The issue is that the reiserfs_setattr to shrink the xattr file was moved
from before the write to after the write.
The root issue has always been in the reiserfs xattr code, but was papered
over by the fact that in the shrink case, the file would just be expanded
again while the xattr was written.
The end result is that the last 8 bytes of xattr data are lost.
This patch fixes it to use new_size.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Cc: Jethro Beekman <kernel@jbeekman.nl>
Cc: Greg Surbey <gregsurbey@hotmail.com>
Cc: Marco Gatti <marco.gatti@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit cac36f707119b792b2396aed371d6b5cdc194890 upstream.
Commit 677c9b2e393a0cd203bd54e9c18b012b2c73305a ("reiserfs: remove
privroot hiding in lookup") removed the magic from the lookup code to hide
the .reiserfs_priv directory since it was getting loaded at mount-time
instead. The intent was that the entry would be hidden from the user via
a poisoned d_compare, but this was faulty.
This introduced a security issue where unprivileged users could access and
modify extended attributes or ACLs belonging to other users, including
root.
This patch resolves the issue by properly hiding .reiserfs_priv. This was
the intent of the xattr poisoning code, but it appears to have never
worked as expected. This is fixed by using d_revalidate instead of
d_compare.
This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way.
The effort involved in working out the corner cases wrt permissions and
caching outweigh the benefit of the feature.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Reported-by: Matt McCutchen <matt@mattmccutchen.net>
Tested-by: Matt McCutchen <matt@mattmccutchen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit a1de02dccf906faba2ee2d99cac56799bda3b96a upstream.
The "offset" member in ext4_io_end holds bytes, not blocks, so
ext4_lblk_t is wrong - and too small (u32).
This caused the async i/o writes to sparse files beyond 4GB to fail
when they wrapped around to 0.
Also fix up the type of arguments to ext4_convert_unwritten_extents(),
it gets ssize_t from ext4_end_aio_dio_nolock() and
ext4_ext_direct_IO().
Reported-by: Giel de Nijs <giel@vectorwise.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit c8afb44682fcef6273e8b8eb19fab13ddd05b386 upstream.
Creating many small files in rapid succession on a small
filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
for i in `seq 1 22500`; do
echo -n > $SCRATCH_MNT/$i
echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
done
leads to ENOSPC even though after a sync, 40% of the fs is free
again.
This is because we reserve worst-case metadata for delalloc writes,
and when data is allocated that worst-case reservation is not
usually needed.
When freespace is low, kicking off an async writeback will start
converting that worst-case space usage into something more realistic,
almost always freeing up space to continue.
This resolves the testcase for me, and survives all 4 generic
ENOSPC tests in xfstests.
We'll still need a hard synchronous sync to squeeze out the last bit,
but this fixes things up to a large degree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 17bd55d037a02b04d9119511cfd1a4b985d20f63 upstream.
ext4, at least, would like to start pushing on writeback if it starts
to get close to ENOSPC when reserving worst-case blocks for delalloc
writes. Writing out delalloc data will convert those worst-case
predictions into usually smaller actual usage, freeing up space
before we hit ENOSPC based on this speculation.
Thanks to Jens for the suggestion for the helper function,
& the naming help.
I've made the helper return status on whether writeback was
started even though I don't plan to use it in the ext4 patch;
it seems like it would be potentially useful to test this
in some cases.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit cfce08c6bdfb20ade979284e55001ca1f100ed51 upstream.
If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.
Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3a60a1686f0d51c99bd0df8ac93050fb6dfce647 upstream.
Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path. Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path. This
could lead to confusion in some userspace applications, since the two
values should be equal.
https://bugs.launchpad.net/bugs/524919
Reported-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 133b8f9d632cc23715c6d72d1c5ac449e054a12a upstream.
Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.
When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.
The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.
As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().
This patch removes the d_drop call and fixes both issues.
This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887
Reported-by: Árpád Bíró <biroa@demasz.hu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit f78233dd44a110c574fe760ad6f9c1e8741a0d00 upstream.
While investigating a bug, I came across a possible bug in v9fs. The
problem is similar to the one reported for NFS by ASANO Masahiro in
http://lkml.org/lkml/2005/12/21/334.
v9fs_file_lock() will skip locks on file which has mode set to 02666.
This is a problem in cases where the mode of the file is changed after
a process has obtained a lock on the file. Such a lock will be skipped
during unlock and the machine will end up with a BUG in
locks_remove_flock().
v9fs_file_lock() should skip the check for mandatory locks when
unlocking a file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 78c37eb0d5e6a9727b12ea0f1821795ffaa66cfe upstream.
In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually in resize, we have
the chance of bg_chain == cl_next_free_rec. So add some
additional condition check for it.
I also rename paramter "clean_error" to "resize", since the
old one is not clearly enough to indicate that we should only
meet with this case in resize.
btw, the correpsonding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit fcefd25ac89239cb57fa198f125a79ff85468c75 upstream.
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, acls were some times not
getting propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 08261673cb6dc638c39f44d69b76fffb57b92a8b upstream.
dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
| |
commit 462d60577a997aa87c935ae4521bd303733a9f2b upstream.
RFC says we need to follow the chain of mounts if there's more
than one stacked on that point.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 0df5dd4aae211edeeeb84f7f84f6d093406d7c22 upstream.
Arnaud Giersch reports that NFSv4 locking is broken when we hold a
delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4:
Don't allow posix locking against servers that don't support it).
According to Arnaud, the lock succeeds the first time he opens the file
(since we cannot do a delegated open) but then fails after we start using
delegated opens.
The following patch fixes it by ensuring that locking behaviour is
governed by a per-filesystem capability flag that is initially set, but
gets cleared if the server ever returns an OPEN without the
NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.
Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
| |
commit 80e60639f1b7c121a7fea53920c5a4b94009361a upstream.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit a24e2d7d8f512340991ef0a59cb5d08d491b8e98 upstream.
By doing this we always overwrite nbytes value that is being passed on to
CIFSSMBWrite() and need not rely on the callers to initialize. CIFSSMBWrite2 is
doing this already.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 6513a81e9325d712f1bfb9a1d7b750134e49ff18 upstream.
While chasing a bug report involving a OS/2 server, I noticed the server sets
pSMBr->CountHigh to a incorrect value even in case of normal writes. This
results in 'nbytes' being computed wrongly and triggers a kernel BUG at
mm/filemap.c.
void iov_iter_advance(struct iov_iter *i, size_t bytes)
{
BUG_ON(i->count < bytes); <--- BUG here
Why the server is setting 'CountHigh' is not clear but only does so after
writing 64k bytes. Though this looks like the server bug, the client side
crash may not be acceptable.
The workaround is to mask off high 16 bits if the number of bytes written as
returned by the server is greater than the bytes requested by the client as
suggested by Jeff Layton.
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d965736b8cb42ae51ba9c3f13488035a98d025c6 upstream.
ext3_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Although ext3 doesn't have the crc issue, modifications
out of journal control are a Bad Thing.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit b918397542388de75bd86c32fbfa820e5d629fa9 upstream.
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext3_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change. This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
The modifications were originally to keep the sb "more" in sync,
so that a readonly fsck of the device didn't flag this as an
error (as often), but apparently e2fsprogs deals with this differently
now, anyway.
Based on Ted's patch for ext4, which was in turn based on my
work on that bug and another preliminary patch...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 55ab3a1ff843e3f0e24d2da44e71bffa5d853010 upstream.
Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.
We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range. vfs_fsync_range has:
if (!fop || !fop->fsync) {
ret = -EINVAL;
goto out;
}
But drivers/char/raw.c doesn't set an fsync method.
We have two options: fix it or remove the raw driver completely. I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.
The patch below adds an fsync method to the raw driver. My knowledge of
the block layer is pretty sketchy so this could do with a once over.
If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit f1f724e4b523d444c5a598d74505aefa3d6844d2 upstream
The radix-tree code requires it's users to serialize tag updates
against other updates to the tree. While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption. Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 77d7a0c2eeb285c9069e15396703d0cb9690ac50 upstream
The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.
This can cause deadlocks when a flush daemon issues a log force with
an inode locked because the IO completion of IO on the inode is
blocked by the inode lock. This in turn prevents further data IO
completion from occuring on all XFS filesystems on that CPU (due to
the shared nature of the completion queues). This then prevents the
log IO from completing because the log is waiting for data IO
completion as well.
The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit e8b217e7530c6a073ac69f1c85b922d93fdf5647 upstream
Date: Tue, 2 Feb 2010 10:16:26 +1100
We always need to flush the disk write cache and can't skip it just because
the no inode attributes have changed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit cbe132a8bdcff0f9afd9060948fb50597c7400b8 upstream
If we hold onto reserved blocks when doing a remount,ro we end
up writing the blocks used count to disk that includes the reserved
blocks. Reserved blocks are not actually used, so this results in
the values in the superblock being incorrect.
Hence if we run xfs_check or xfs_repair -n while the filesystem is
mounted remount,ro we end up with an inconsistent filesystem being
reported. Also, running xfs_copy on the remount,ro filesystem will
result in an inconsistent image being generated.
To fix this, unreserve the blocks when doing the remount,ro, and
reserved them again on remount,rw. This way a remount,ro filesystem
will appear consistent on disk to all utilities.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9b00f30762fe9f914eb6e03057a616ed63a4e8ca upstream
A "df" run on an NFS client of an exported XFS file system reports
the wrong information for "available" blocks. When a block quota is
enforced, the amount reported as free is limited by the quota, but
the amount reported available is not (and should be).
Reported-by: Guk-Bong, Kwon <gbkwon@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit e09f98606dcc156de1146c209d45a0d6d5f51c3f upstream
When swapping extents, we can corrupt inodes by swapping data forks
that are in incompatible formats. This is caused by the two indoes
having different fork offsets due to the presence of an attribute
fork on an attr2 filesystem. xfs_fsr tries to be smart about
setting the fork offset, but the trick it plays only works on attr1
(old fixed format attribute fork) filesystems.
Changing the way xfs_fsr sets up the attribute fork will prevent
this situation from ever occurring, so in the kernel code we can get
by with a preventative fix - check that the data fork in the
defragmented inode is in a format valid for the inode it is being
swapped into. This will lead to files that will silently and
potentially repeatedly fail defragmentation, so issue a warning to
the log when this particular failure occurs to let us know that
xfs_fsr needs updating/fixing.
To help identify how to improve xfs_fsr to avoid this issue, add
trace points for the inodes being swapped so that we can determine
why the swap was rejected and to confirm that the code is making the
right decisions and modifications when swapping forks.
A further complication is even when the swap is allowed to proceed
when the fork offset is different between the two inodes then value
for the maximum number of extents the data fork can hold can be
wrong. Make sure these are also set correctly after the swap occurs.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 4b6a46882cca8349e8942e2650c33b11bc571c92 upstream
When reclaiming stale inodes, we need to guarantee that inodes are
unpinned before returning with a "clean" status. If we don't we can
reclaim inodes that are pinned, leading to use after free in the
transaction subsystem as transactions complete.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 57817c68229984818fea9e614d6f95249c3fb098 upstream
We cannot do direct inode reclaim without taking the flush lock to
ensure that we do not reclaim an inode under IO. We check the inode
is clean before doing direct reclaim, but this is not good enough
because the inode flush code marks the inode clean once it has
copied the in-core dirty state to the backing buffer.
It is the flush lock that determines whether the inode is still
under IO, even though it is marked clean, and the inode is still
required at IO completion so we can't reclaim it even though it is
clean in core. Hence the requirement that we need to take the flush
lock even on clean inodes because this guarantees that the inode
writeback IO has completed and it is safe to reclaim the inode.
With delayed write inode flushing, we could end up waiting a long
time on the flush lock even for a clean inode. The background
reclaim already handles this efficiently, so avoid all the problems
by killing the direct reclaim path altogether.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|