summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2009-04-0237-146/+275
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: Remove two unneeded exports and make two symbols static in fs/mpage.c Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225 Trim includes of fdtable.h Don't crap into descriptor table in binfmt_som Trim includes in binfmt_elf Don't mess with descriptor table in load_elf_binary() Get rid of indirect include of fs_struct.h New helper - current_umask() check_unsafe_exec() doesn't care about signal handlers sharing New locking/refcounting for fs_struct Take fs_struct handling to new file (fs/fs_struct.c) Get rid of bumping fs_struct refcount in pivot_root(2) Kill unsharing fs_struct in __set_personality()
| * Remove two unneeded exports and make two symbols static in fs/mpage.cDmitri Vorobiev2009-04-011-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 29a814d2ee0e43c2980f33f91c1311ec06c0aa35 (vfs: add hooks for ext4's delayed allocation support) exported the following functions mpage_bio_submit() __mpage_writepage() for the benefit of ext4's delayed allocation support. Since commit a1d6cc563bfdf1bf2829d3e6ce4d8b774251796b (ext4: Rework the ext4_da_writepages() function), these functions are not used by the ext4 driver anymore. However, the now unnecessary exports still remain, and this patch removes those. Moreover, these two functions can become static again. The issue was spotted by namespacecheck. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225Al Viro2009-04-012-1/+1
| | | | | | | | | | | | | | fsync_bdev() export and a bunch of stubs for !CONFIG_BLOCK case had been left behind Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Trim includes of fdtable.hAl Viro2009-03-311-1/+0
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Don't crap into descriptor table in binfmt_somAl Viro2009-03-311-7/+0
| | | | | | | | | | | | | | Same story as in binfmt_elf, except that in binfmt_som we actually forget to close the sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Trim includes in binfmt_elfAl Viro2009-03-311-7/+0
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Don't mess with descriptor table in load_elf_binary()Al Viro2009-03-311-13/+2
| | | | | | | | | | | | | | | | ... since we don't tell anyone which descriptor does the file get. We used to, but only in case of ELF binary with a.out loader and that stuff has been gone for a while. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Get rid of indirect include of fs_struct.hAl Viro2009-03-318-0/+8
| | | | | | | | | | | | | | | | Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * New helper - current_umask()Al Viro2009-03-3122-28/+34
| | | | | | | | | | | | | | current->fs->umask is what most of fs_struct users are doing. Put that into a helper function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * check_unsafe_exec() doesn't care about signal handlers sharingAl Viro2009-03-311-5/+2
| | | | | | | | | | | | ... since we'll unshare sighand anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * New locking/refcounting for fs_structAl Viro2009-03-315-29/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * all changes of current->fs are done under task_lock and write_lock of old fs->lock * refcount is not atomic anymore (same protection) * its decrements are done when removing reference from current; at the same time we decide whether to free it. * put_fs_struct() is gone * new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do execve() and only subthreads share fs_struct. Cleared when finishing exec (success and failure alike). Makes CLONE_FS fail with -EAGAIN if set. * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread is in progress. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Take fs_struct handling to new file (fs/fs_struct.c)Al Viro2009-03-316-81/+150
| | | | | | | | | | | | | | | | | | | | Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Get rid of bumping fs_struct refcount in pivot_root(2)Al Viro2009-03-311-9/+17
| | | | | | | | | | | | | | | | Not because execve races with _that_ are serious - we really need a situation when final drop of fs_struct refcount is done by something that used to have it as current->fs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs/ufs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make ufs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/sysv: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make sysv file system return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/squashfs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make squashfs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/reiserfs: return f_fsid for statfs(2)Coly Li2009-04-022-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make reiserfs3 return f_fsid info for statfs(2). By Andreas' suggestion, this patch populates a persistent f_fsid between boots/mounts with help of on-disk uuid record. Randy Dunlap reported a compiling error from v2 patch like: fs/built-in.o: In function `reiserfs_statfs': super.c:(.text+0x7332b): undefined reference to `crc32_le' super.c:(.text+0x7333f): undefined reference to `crc32_le' Also he provided helpful solution to fix this error. The modification of v3 patch is based on Randy's suggestion, add 'select CRC32' in fs/reiserfs/Kconfig. Signed-off-by: Coly Li <coly.li@suse.de> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/qnx4: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make qnx4 file system return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Acked-by: Anders Larsen <al@alarsen.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/omfs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+5
| | | | | | | | | | | | | | | | | | Make omfs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/minix: return f_fsid for statfs(2)Coly Li2009-04-021-3/+8
| | | | | | | | | | | | | | | | Make minix file system return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/isofs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make isofs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/hpfs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make hpfs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/hfsplus: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make hfsplus return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/hfs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make hfs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/fat: return f_fsid for statfs(2)Coly Li2009-04-021-1/+5
| | | | | | | | | | | | | | | | | | Make fat return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/efs: return f_fsid for statfs(2)Coly Li2009-04-021-8/+12
| | | | | | | | | | | | | | | | | | Make efs return f_fsid info for statfs(2), and do a little variable renaming in efs_statfs(). Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/cramfs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | Make cramfs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/befs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+3
| | | | | | | | | | | | | | | | | | Make befs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Sergey S. Kostyliov <rathamahata@php4.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/affs: return f_fsid for statfs(2)Coly Li2009-04-021-0/+4
| | | | | | | | | | | | | | | | | | Make affs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/adfs: return f_fsid for statfs(2)Coly Li2009-04-021-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently many file systems in Linux kernel do not return f_fsid in statfs info, the value is set as 0 in vfs layer. Anyway, in some conditions, f_fsid from statfs(2) is useful, especially being used as (f_fsid, ino) pair to uniquely identify a file. Basic idea of the patches is generating a unique fs ID by huge_encode_dev(sb->s_bdev->bd_dev) during file system mounting life time (no endian consistent issue). sb is a point of struct super_block of current mounted file system being accessed by statfs(2). This patch: Make adfs return f_fsid info for statfs(2), and do a little variable renaming in adfs_statfs(). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: "Sergey S. Kostyliov" <rathamahata@php4.ru> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Bob Copeland <me@bobcopeland.com> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: Andreas Dilger <adilger@sun.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | preadv/pwritev: switch compat readv/preadv/writev/pwritev from fget to ↵Gerd Hoffmann2009-04-021-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | fget_light Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | preadv/pwritev: Add preadv and pwritev system calls.Gerd Hoffmann2009-04-022-0/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds preadv and pwritev system calls. These syscalls are a pretty straightforward combination of pread and readv (same for write). They are quite useful for doing vectored I/O in threaded applications. Using lseek+readv instead opens race windows you'll have to plug with locking. Other systems have such system calls too, for example NetBSD, check here: http://www.daemon-systems.org/man/preadv.2.html The application-visible interface provided by glibc should look like this to be compatible to the existing implementations in the *BSD family: ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset); ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset); This prototype has one problem though: On 32bit archs is the (64bit) offset argument unaligned, which the syscall ABI of several archs doesn't allow to do. At least s390 needs a wrapper in glibc to handle this. As we'll need a wrappers in glibc anyway I've decided to push problem to glibc entriely and use a syscall prototype which works without arch-specific wrappers inside the kernel: The offset argument is explicitly splitted into two 32bit values. The patch sports the actual system call implementation and the windup in the x86 system call tables. Other archs follow as separate patches. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | preadv/pwritev: create compat_writev()Gerd Hoffmann2009-04-021-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out some code from compat_sys_writev() which can be shared with the upcoming compat_sys_pwritev(). Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | preadv/pwritev: create compat_readv()Gerd Hoffmann2009-04-021-8/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series: Implement the preadv() and pwritev() syscalls. *BSD has this syscall for quite some time. Test code: #if 0 set -x gcc -Wall -O2 -o preadv $0 exit 0 #endif /* * preadv demo / test * * (c) 2008 Gerd Hoffmann <kraxel@redhat.com> * * build with "sh $thisfile" */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <errno.h> #include <inttypes.h> #include <sys/uio.h> /* ----------------------------------------------------------------- */ /* syscall windup */ #include <sys/syscall.h> #if 0 /* WARNING: Be sure you know what you are doing if you enable this. * linux syscall code isn't upstream yet, syscall numbers are subject * to change */ # ifndef __NR_preadv # ifdef __i386__ # define __NR_preadv 333 # define __NR_pwritev 334 # endif # ifdef __x86_64__ # define __NR_preadv 295 # define __NR_pwritev 296 # endif # endif #endif #ifndef __NR_preadv # error preadv/pwritev syscall numbers are unknown #endif static ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset) { uint32_t pos_high = (offset >> 32) & 0xffffffff; uint32_t pos_low = offset & 0xffffffff; return syscall(__NR_preadv, fd, iov, iovcnt, pos_high, pos_low); } static ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset) { uint32_t pos_high = (offset >> 32) & 0xffffffff; uint32_t pos_low = offset & 0xffffffff; return syscall(__NR_pwritev, fd, iov, iovcnt, pos_high, pos_low); } /* ----------------------------------------------------------------- */ /* demo/test app */ static char filename[] = "/tmp/preadv-XXXXXX"; static char outbuf[11] = "0123456789"; static char inbuf[11] = "----------"; static struct iovec ovec[2] = {{ .iov_base = outbuf + 5, .iov_len = 5, },{ .iov_base = outbuf + 0, .iov_len = 5, }}; static struct iovec ivec[3] = {{ .iov_base = inbuf + 6, .iov_len = 2, },{ .iov_base = inbuf + 4, .iov_len = 2, },{ .iov_base = inbuf + 2, .iov_len = 2, }}; void cleanup(void) { unlink(filename); } int main(int argc, char **argv) { int fd, rc; fd = mkstemp(filename); if (-1 == fd) { perror("mkstemp"); exit(1); } atexit(cleanup); /* write to file: "56789-01234" */ rc = pwritev(fd, ovec, 2, 0); if (rc < 0) { perror("pwritev"); exit(1); } /* read from file: "78-90-12" */ rc = preadv(fd, ivec, 3, 2); if (rc < 0) { perror("preadv"); exit(1); } printf("result : %s\n", inbuf); printf("expected: %s\n", "--129078--"); exit(0); } This patch: Factor out some code from compat_sys_readv() which can be shared with the upcoming compat_sys_preadv(). Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | cramfs: propagate uncompression errorsDavid VomLehn2009-04-022-11/+27
| | | | | | | | | | | | | | | | | | | | Decompression errors can arise due to corruption of compressed blocks on flash or in memory. This patch propagates errors detected during decompression back to the block layer. Signed-off-by: David VomLehn <dvomlehn@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | bin_elf_fdpic: check the return value of clear_userMike Frysinger2009-04-021-8/+17
| | | | | | | | | | | | | | | | | | Signed-off-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | hppfs: hppfs_read_file() may return -ERRORRoel Kluin2009-04-021-1/+6
| | | | | | | | | | | | | | | | | | | | hppfs_read_file() may return (ssize_t) -ENOMEM, or -EFAULT. When stored in size_t 'count', these errors will not be noticed, a large value will be added to *ppos. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ext3: avoid false EIO errorsJan Kara2009-04-021-65/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes block_write_begin() can map buffers in a page but later we fail to copy data into those buffers (because the source page has been paged out in the mean time). We then end up with !uptodate mapped buffers. To add a bit more to the confusion, block_write_end() does not commit any data (and thus does not any mark buffers as uptodate) if we didn't succeed with copying all the data. Commit f4fc66a894546bdc88a775d0e83ad20a65210bcb (ext3: convert to new aops) missed these cases and thus we were inserting non-uptodate buffers to transaction's list which confuses JBD code and it reports IO errors, aborts a transaction and generally makes users afraid about their data ;-P. This patch fixes the problem by reorganizing ext3_..._write_end() code to first call block_write_end() to mark buffers with valid data uptodate and after that we file only uptodate buffers to transaction's lists. We also fix a problem where we could leave blocks allocated beyond i_size (i_disksize in fact) because of failed write. We now add inode to orphan list when write fails (to be safe in case we crash) and then truncate blocks beyond i_size in a separate transaction. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ext3: return -EIO not -ESTALE on directory traversal through deleted inodeBryan Donlan2009-04-021-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext3_iget() returns -ESTALE if invoked on a deleted inode, in order to report errors to NFS properly. However, in ext[234]_lookup(), this -ESTALE can be propagated to userspace if the filesystem is corrupted such that a directory entry references a deleted inode. This leads to a misleading error message - "Stale NFS file handle" - and confusion on the part of the admin. The bug can be easily reproduced by creating a new filesystem, making a link to an unused inode using debugfs, then mounting and attempting to ls -l said link. This patch thus changes ext3_lookup to return -EIO if it receives -ESTALE from ext3_iget(), as ext3 does for other filesystem metadata corruption; and also invokes the appropriate ext*_error functions when this case is detected. Signed-off-by: Bryan Donlan <bdonlan@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ext3: use unsigned instead of int for type of blocksize in fs/ext3/namei.cWei Yongjun2009-04-021-8/+9
| | | | | | | | | | | | | | | | | | | | Use unsigned instead of int for the parameter which carries a blocksize. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | jbd: fix oops in jbd_journal_init_inode() on corrupted fsJan Kara2009-04-021-10/+24
| | | | | | | | | | | | | | | | | | | | | | On 32-bit system with CONFIG_LBD getblk can fail because provided block number is too big. Make JBD gracefully handle that. Signed-off-by: Jan Kara <jack@suse.cz> Cc: <dmaciejak@fortinet.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ext3: remove the BKL in ext3/ioctl.cCyrus Massoumi2009-04-023-41/+22
| | | | | | | | | | | | | | | | | | | | | | Reformat ext3/ioctl.c to make it look more like ext4/ioctl.c and remove the BKL around ext3_ioctl(). Signed-off-by: Cyrus Massoumi <cyrusm@gmx.net> Cc: <linux-ext4@vger.kernel.org> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | vfs: check bh->b_blocknr only if BH_Mapped is setNikanth Karthikesan2009-04-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Check bh->b_blocknr only if BH_Mapped is set. akpm: I doubt if b_blocknr is ever uninitialised here, but it could conceivably cause a problem if we're doing a lookup for block zero. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | writeback: guard against jiffies wraparound on inode->dirtied_when checks ↵Jeff Layton2009-04-021-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (try #3) The dirtied_when value on an inode is supposed to represent the first time that an inode has one of its pages dirtied. This value is in units of jiffies. It's used in several places in the writeback code to determine when to write out an inode. The problem is that these checks assume that dirtied_when is updated periodically. If an inode is continuously being used for I/O it can be persistently marked as dirty and will continue to age. Once the time compared to is greater than or equal to half the maximum of the jiffies type, the logic of the time_*() macros inverts and the opposite of what is needed is returned. On 32-bit architectures that's just under 25 days (assuming HZ == 1000). As the least-recently dirtied inode, it'll end up being the first one that pdflush will try to write out. sync_sb_inodes does this check: /* Was this inode dirtied after sync_sb_inodes was called? */ if (time_after(inode->dirtied_when, start)) break; ...but now dirtied_when appears to be in the future. sync_sb_inodes bails out without attempting to write any dirty inodes. When this occurs, pdflush will stop writing out inodes for this superblock. Nothing can unwedge it until jiffies moves out of the problematic window. This patch fixes this problem by changing the checks against dirtied_when to also check whether it appears to be in the future. If it does, then we consider the value to be far in the past. This should shrink the problematic window of time to such a small period (30s) as not to matter. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | vfs: skip I_CLEAR state inodesWu Fengguang2009-04-023-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so _outside_ of inode_lock. So any I_FREEING testing is incomplete without a coupled testing of I_CLEAR. So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and add_dquot_ref(). Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara reminds fixing the other two cases. Masayoshi MIZUMA has a nice panic flow: ===================================================================== [process A] | [process B] | | | prune_icache() | drop_pagecache() | spin_lock(&inode_lock) | drop_pagecache_sb() | inode->i_state |= I_FREEING; | | | spin_unlock(&inode_lock) | V | | | spin_lock(&inode_lock) | V | | | dispose_list() | | | list_del() | | | clear_inode() | | | inode->i_state = I_CLEAR | | | | | V | | | if (inode->i_state & (I_FREEING|I_WILL_FREE)) | | | continue; <==== NOT MATCH | | | | | | (DANGER from here on! Accessing disposing inode!) | | | | | | __iget() | | | list_move() <===== PANIC on poisoned list !! V V | (time) ===================================================================== Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | nommu: fix a number of issues with the per-MM VMA patchDavid Howells2009-04-022-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a number of issues with the per-MM VMA patch: (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on a NOMMU system with more than 2G pages. Makes no difference on a 32-bit system. (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value, lest it overflow. (3) Move the allocation of the vm_area_struct slab back for fork.c. (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs. (5) Use BUG_ON() rather than if () BUG(). (6) Make the default validate_nommu_regions() a static inline rather than a #define. (7) Make free_page_series()'s objection to pages with a refcount != 1 more informative. (8) Adjust the __put_nommu_region() banner comment to indicate that the semaphore must be held for writing. (9) Limit the number of warnings about munmaps of non-mmapped regions. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2009-04-0123-420/+584
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (58 commits) SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port NSM: Fix unaligned accesses in nsm_init_private() NFS: Simplify logic to compare socket addresses in client.c NFS: Start PF_INET6 callback listener only if IPv6 support is available lockd: Start PF_INET6 listener only if IPv6 support is available SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4 SUNRPC: rpcb_register() should handle errors silently SUNRPC: Simplify kernel RPC service registration SUNRPC: Simplify svc_unregister() SUNRPC: Allow callers to pass rpcb_v4_register a NULL address SUNRPC: rpcbind actually interprets r_owner string SUNRPC: Clean up address type casts in rpcb_v4_register() SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() SUNRPC: Change svc_create_xprt() to take a @family argument SUNRPC: svc_setup_socket() gets protocol family from socket SUNRPC: Pass a family argument to svc_register() ...
| * \ Merge branch 'devel' into for-linusTrond Myklebust2009-04-0123-420/+584
| |\ \
| | * | NSM: Fix unaligned accesses in nsm_init_private()Mans Rullgard2009-04-011-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes unaligned accesses in nsm_init_private() when creating nlm_reboot keys. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * | NFS: Simplify logic to compare socket addresses in client.cChuck Lever2009-03-281-64/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Callback requests from IPv4 servers are now always guaranteed to be AF_INET, and never mapped IPv4 AF_INET6 addresses. Both nfs_match_client() and nfs_find_client() can now share the same address comparison logic, so fold them together. We can also dispense with of most of the conditional compilation in here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>