summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* exec: RT sub-thread can livelock and monopolize CPU on execOleg Nesterov2007-10-171-13/+15
| | | | | | | | | | | | | | de_thread() yields waiting for ->group_leader to be a zombie. This deadlocks if an rt-prio execer shares the same cpu with ->group_leader. Change the code to use ->group_exit_task/notify_count mechanics. This patch certainly uglifies the code, perhaps someone can suggest something better. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* exec: consolidate 2 fast-pathsOleg Nesterov2007-10-171-9/+0
| | | | | | | | | | | Now that we don't pre-allocate the new ->sighand, we can kill the first fast path, it doesn't make sense any longer. At best, it can save one "list_empty()" check but leads to the code duplication. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* exec: simplify the new ->sighand allocationOleg Nesterov2007-10-171-15/+9
| | | | | | | | | | | | | | | | | | | | | de_thread() pre-allocates newsighand to make sure that exec() can't fail after killing all sub-threads. Imho, this buys nothing, but complicates the code: - this is (mostly) needed to handle CLONE_SIGHAND without CLONE_THREAD tasks, this is very unlikely (if ever used) case - unless we already have some serious problems, GFP_KERNEL allocation should not fail - ENOMEM still can happen after de_thread(), ->sighand is not the last object we have to allocate Change the code to allocate the new ->sighand on demand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* exec: simplify ->sighand switchingOleg Nesterov2007-10-171-7/+1
| | | | | | | | | | | | | | | There is no any reason to do recalc_sigpending() after changing ->sighand. To begin with, recalc_sigpending() does not take ->sighand into account. This means we don't need to take newsighand->siglock while changing sighands. rcu_assign_pointer() provides a necessary barrier, and if another process reads the new ->sighand it should either take tasklist_lock or it should use lock_task_sighand() which has a corresponding smp_read_barrier_depends(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix f_version type: should be u64 instead of unsigned longMathieu Desnoyers2007-10-174-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix f_version type: should be u64 instead of long There is a type inconsistency between struct inode i_version and struct file f_version. fs.h: struct inode u64 i_version; and struct file unsigned long f_version; Users do: fs/ext3/dir.c: if (filp->f_version != inode->i_version) { So why isn't f_version a u64 ? It becomes a problem if versions gets higher than 2^32 and we are on an architecture where longs are 32 bits. This patch changes the f_version type to u64, and updates the users accordingly. It applies to 2.6.23-rc2-mm2. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Martin Bligh <mbligh@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: <linux-ext4@vger.kernel.org> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: account I/O wait time properlyJeff Moyer2007-10-171-3/+3
| | | | | | | | | | | | | | | | | | | | | Some months back I proposed changing the schedule() call in read_events to an io_schedule(): http://osdir.com/ml/linux.kernel.aio.general/2006-10/msg00024.html This was rejected as there are AIO operations that do not initiate disk I/O. I've had another look at the problem, and the only AIO operation that will not initiate disk I/O is IOCB_CMD_NOOP. However, this command isn't even wired up! Given that it doesn't work, and hasn't for *years*, I'm going to suggest again that we do proper I/O accounting when using AIO. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Suparna Bhattacharya <suparna@in.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use ERESTART_RESTARTBLOCK if poll() is interrupted by a signalChris Wright2007-10-171-1/+30
| | | | | | | | | | | | | | | | | | | | | | | Lomesh reported poll returning EINTR during suspend/resume cycle. This is caused by the STOP/CONT cycle that the freezer uses, generating a pending signal for what in effect is an ignored signal. In general poll is a little eager in returning EINTR, when it could try not bother userspace and simply restart the syscall. Both select and ppoll do use ERESTARTNOHAND to restart the syscall. Oleg points out that simply using ERESTARTNOHAND will cause poll to restart with original timeout value. which could ultimately lead to process never returning to userspace. Instead use ERESTART_RESTARTBLOCK, and restart poll with updated timeout value. Inspired by Manfred's use ERESTARTNOHAND in poll patch. [bunk@kernel.org: do_restart_poll() can become static] Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: "Agarwal, Lomesh" <lomesh.agarwal@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* allow disabling DNOTIFY without EMBEDDEDAdrian Bunk2007-10-171-2/+2
| | | | | | | | | | | | | | | | Allow disabling DNOTIFY with CONFIG_EMBEDDED=n. I'm currently running a kernel with dnotify disabled and I haven't run into any problem. Is there any popular application left that breaks without dnotify support in the kernel? Note that this patch does not remove dnotify support, it still defaults to "y", and the help text recommends enabling it. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* make fs/libfs.c:simple_commit_write() staticAdrian Bunk2007-10-171-3/+2
| | | | | | | | simple_commit_write() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* limit minixfs printks on corrupted dir i_sizeEric Sandeen2007-10-172-4/+14
| | | | | | | | | | | | | | | | | | | This attempts to address CVE-2006-6058 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058 first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html Essentially a corrupted minix dir inode reporting a very large i_size will loop for a very long time in minix_readdir, minix_find_entry, etc, because on EIO they just move on to try the next page. This is under the BKL, printk-storming as well. This can lock up the machine for a very long time. Simply ratelimiting the printks gets things back under control. Make the message a bit more informative while we're here. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Bodo Eggert <7eggert@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ext2/4: use is_power_of_2()vignesh babu2007-10-172-2/+3
| | | | | | | | | Replace n & (n - 1) with is_power_of_2(n) Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cramfs: error message about endianessAndi Drebes2007-10-171-1/+10
| | | | | | | | | | | | | | | | | The README file in the cramfs subdirectory says: "All data is currently in host-endian format; neither mkcramfs nor the kernel ever do swabbing." If somebody tries to mount a cramfs with the wrong endianess, cramfs only complains about a wrong magic but doesn't inform the user that only the endianess isn't right. The following patch adds an error message to the cramfs sources. If a user tries to mount a cramfs with the wrong endianess using the patched sources, cramfs will display the message "cramfs: wrong endianess". Signed-off-by: Andi Drebes <lists-receive@programmierforen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* clean out unused code in dentry pruningMiklos Szeredi2007-10-171-14/+10
| | | | | | | | | | It looks like in the end all pruners want parents removed. So remove unused code and function arguments. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* exec: remove unnecessary check for MNT_NOEXECMiklos Szeredi2007-10-171-5/+1
| | | | | | | | | | | vfs_permission(MAY_EXEC) checks if the filesystem is mounted with "noexec", so there's no need to repeat this check in sys_uselib() and open_exec(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix execute checking in permission()Miklos Szeredi2007-10-171-12/+24
| | | | | | | | | | | | | | | | | | | | | permission() checks that MAY_EXEC is only allowed on regular files if at least one execute bit is set in the file mode. generic_permission() already ensures this, so the extra check in permission() is superfluous. If the filesystem defines it's own ->permission() the check may still be needed. In this case move it after ->permission(). This is needed because filesystems such as FUSE may need to refresh the inode attributes before checking permissions. This check should be moved inside ->permission(), but that's another story. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: check nanoseconds in utimensatMiklos Szeredi2007-10-171-0/+13
| | | | | | | | | | utimensat() (and possibly other callers of do_utimes()) didn't check if the nanosecond value was within the allowed range. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ext2/ext3/ext4: add block bitmap validationAneesh Kumar K.V2007-10-173-38/+134
| | | | | | | | | | | | | | When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given by ext4_blk_bitmap, ext4_inode_bitmap and ext4_inode_table. Validate the block bitmap against these blocks. [akpm@linux-foundation.org: cleanups] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add MMF_DUMP_ELF_HEADERSRoland McGrath2007-10-171-25/+53
| | | | | | | | | | | | | | | | | | | | | | This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter. This dumps the first page (only) of a private file mapping if it appears to be a mapping of an ELF file. Including these pages in the core dump may give sufficient identifying information to associate the original DSO and executable file images and their debugging information with a core file in a generic way just from its contents (e.g. when those binaries were built with ld --build-id). I expect this to become the default behavior eventually. Existing versions of gdb can be confused by the core dumps it creates, so it won't enabled by default for some time to come. Soon many people will have systems with a gdb that handle these dumps, so they can arrange to set the bit at boot and have it inherited system-wide. This also cleans up the checking of the MMF_DUMP_* flag bits, which did not need to be using atomic macros. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ufs: Fix mount check in ufs_fill_super()Satyam Sharma2007-10-171-7/+8
| | | | | | | | | | | | | | The current code skips the check to verify whether the filesystem was previously cleanly unmounted, if (flags & UFS_ST_MASK) == UFS_ST_44BSD or UFS_ST_OLD. This looks like an inadvertent bug that slipped in due to parantheses in the compound conditional to me, especially given that ufs_get_fs_state() handles the UFS_ST_44BSD case perfectly well. So, let's fix the compound condition appropriately. Signed-off-by: Satyam Sharma <satyam@infradead.org> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ufs: move non-layout parts of ufs_fs.h to fs/ufs/Christoph Hellwig2007-10-1712-2/+171
| | | | | | | | | | | | | | | Move prototypes and in-core structures to fs/ufs/ similar to what most other filesystems already do. I made little modifications: move also ufs debug macros and mount options constants into fs/ufs/ufs.h, this stuff also private for ufs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Deprecate a.out ELF interpretersAndi Kleen2007-10-171-0/+8
| | | | | | | | | | | | | | | | | | | | | | | The Linux ELF loader is quite complicated and messy code (that could probably need a rewrite, but that's a different chapter). One particular messy part in it is the support for non ELF a.out ld.sos. This was originally added to make transition from a.out to ELF easier because an a.out ELF ld.so could be still build using an older a.out toolkit. But by now that should be fully obsolete and removing it would clean up binfmt_elf.c up a bit. I propose to deprecate this support and remove for 2.6.25. Drawback is that someone still runs their system with a.out ld.so they would need to update the ld.so when updating to a new kernel. This patch just adds an entry to the deprecation file and a printk warning users. [akpm@linux-foundation.org: better warning message] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/autofs4/inode.c: kmalloc + memset conversion to kzallocMariusz Kozlowski2007-10-171-3/+1
| | | | | | | | | | fs/autofs4/inode.c | 10467 -> 10435 (-32 bytes) fs/autofs4/inode.o | 98576 -> 98552 (-24 bytes) Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/afs/: possible cleanupsAdrian Bunk2007-10-1710-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | This patch contains the following possible cleanups: - make the following needlessly global functions static: - rxrpc.c: afs_send_pages() - vlocation.c: afs_vlocation_queue_for_updates() - write.c: afs_writepages_region() - make the following needlessly global variables static: - mntpt.c: afs_mntpt_expiry_timeout - proc.c: afs_vlocation_states[] - server.c: afs_server_timeout - vlocation.c: afs_vlocation_timeout - vlocation.c: afs_vlocation_update_timeout - #if 0 the following unused function: - cell.c: afs_get_cell_maybe() - #if 0 the following unused variables: - callback.c: afs_vnode_update_timeout - cmservice.c: struct afs_cm_workqueue Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* core_pattern: fix up a few miscellaneous bugsNeil Horman2007-10-171-2/+15
| | | | | | | | | | | | | | Fix do_coredump to detect a crash in the user mode helper process and abort the attempt to recursively dump core to another copy of the helper process, potentially ad-infinitum. [akpm@linux-foundation.org: cleanups] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: <martin.pitt@ubuntu.com> Cc: <wwoods@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* core_pattern: allow passing of arguments to user mode helper when ↵Neil Horman2007-10-171-3/+22
| | | | | | | | | | | | | | | | | core_pattern is a pipe A rewrite of my previous post for this enhancement. It uses jeremy's split_argv/free_argv library functions to translate core_pattern into an argv array to be passed to the user mode helper process. It also adds a translation to format_corename such that the origional value of RLIMIT_CORE can be passed to userspace as an argument. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: <martin.pitt@ubuntu.com> Cc: <wwoods@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* core_pattern: ignore RLIMIT_CORE if core_pattern is a pipeNeil Horman2007-10-176-23/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some time /proc/sys/kernel/core_pattern has been able to set its output destination as a pipe, allowing a user space helper to receive and intellegently process a core. This infrastructure however has some shortcommings which can be enhanced. Specifically: 1) The coredump code in the kernel should ignore RLIMIT_CORE limitation when core_pattern is a pipe, since file system resources are not being consumed in this case, unless the user application wishes to save the core, at which point the app is restricted by usual file system limits and restrictions. 2) The core_pattern code should be able to parse and pass options to the user space helper as an argv array. The real core limit of the uid of the crashing proces should also be passable to the user space helper (since it is overridden to zero when called). 3) Some miscellaneous bugs need to be cleaned up (specifically the recognition of a recursive core dump, should the user mode helper itself crash. Also, the core dump code in the kernel should not wait for the user mode helper to exit, since the same context is responsible for writing to the pipe, and a read of the pipe by the user mode helper will result in a deadlock. This patch: Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that core_pattern is a pipe, the entire core will be fed to the user mode helper. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: <martin.pitt@ubuntu.com> Cc: <wwoods@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ufs: implement show_optionsEvgeniy Dushistov2007-10-171-4/+39
| | | | | | | | | An implementation of show_options method for UFS. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add in SunOS 4.1.x compatible mode for UFSMark Fortescue2007-10-172-19/+72
| | | | | | | | | | | | | | | | | | | | | | | | | Add in support for SunOS 4.1.x flavor of BSD 4.2 UFS filing system Macros have been put in to alow suport for the old static table Cylinder Groups but this implementation does not use them yet. This also fixes Solaris UFS filing system access by disabling fast symbolic links as Sun's version of UFS does not support on-disk fast symbolic links. Tested by: Ppartitioning a new disk using SunOS 4.1.1, creating a UFS filing system on one of the partitions and writing some files to the filing system. Using Linux-2.6.22 (patched) to read the files and then write a shed load of files to the UFS partition. Using SunOS 4.1.1 to verify the filing system is OK and to check the files. The test host is a sun4c SS1 Clone. [akpm@linux-foundation.org: coding style fixes] [adobriyan@gmail.com: fix oops] Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* remove unused bh in calls to ext234_get_group_descEric Sandeen2007-10-174-37/+23
| | | | | | | | | | | | ext[234]_get_group_desc never tests the bh argument, and only sets it if it is passed in; it is perfectly happy with a NULL bh argument. But, many callers send one in and never use it. May as well call with NULL like other callers who don't use the bh. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: remove the unused mempages parameterDenis Cheng2007-10-173-6/+6
| | | | | | | | | | | | | | | | Since the mempages parameter is actually not used, they should be removed. Now there is only files_init use the mempages parameter, files_init(mempages); but I don't think the adaptation to mempages in files_init is really useful; and if files_init also changed to the prototype void (*func)(void), the wrapper vfs_caches_init would also not need the mempages parameter. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* report the per-irq statistics on all archesRavikiran G Thirumalai2007-10-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4004c69ad68dd03733179277280ea2946990ba36 avoids too many remote cpu references while reporting per-irq stats. Since we will not have the same performance penalty of bringing in remote cpu cachelines while reporting per-irq stats anymore, we can now afford to be consistent and report this statistic on all arches, all configs. akpm: affects ia64, alpha and ppc64, mainly. Kiran earlier said: Read to /proc/stat takes: Plain: 2.622832 With speedup patch: 0.013194 With the per-irq stats commented out: 0.008124 So the performance problems which originally caused those architectures to disable this statistic should now be fixed up. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ext4: show all mount optionsMiklos Szeredi2007-10-171-0/+72
| | | | | | | Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ext3: show all mount optionsMiklos Szeredi2007-10-171-0/+70
| | | | | | | Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ext2: show all mount optionsMiklos Szeredi2007-10-171-2/+59
| | | | | | | | | | | | | | | | | | | | | | | Using mtab is problematic for various reasons, one of them is that unprivileged mounts won't turn up in there. So we want to get rid of it, and use /proc/mounts instead. But most filesystems are lazy, and are not showing all mount options. Which means, that without mtab, the user won't be able to see some or all of the options. It would be nice if the generic code could remember the mount options, and show them without the need to add extra code to filesystems. But this is not easy, because different filesystems handle mount options given options, and not tough the rest. This is not taken into account by mount(8) either, so /etc/mtab will be broken in this case. This series fixes up ->show_options() in ext[234]. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* convert ill defined log2() to ilog2()Fengguang Wu2007-10-171-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | It's *wrong* to have #define log2(n) ffz(~(n)) It should be *reversed*: #define log2(n) flz(~(n)) or #define log2(n) fls(n) or just use ilog2(n) defined in linux/log2.h. This patch follows the last solution, recommended by Andrew Morton. Cc: <linux-ext4@vger.kernel.org> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Chris Ahna <christopher.j.ahna@intel.com> Cc: David Mosberger-Tang <davidm@hpl.hp.com> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Clean up duplicate includes in fs/ecryptfs/Jesper Juhl2007-10-171-1/+0
| | | | | | | | | | This patch cleans up duplicate includes in fs/ecryptfs/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: Michael A Halcrow <mahalcro@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Clean up duplicate includes in fs/Jesper Juhl2007-10-172-2/+0
| | | | | | | | | This patch cleans up duplicate includes in fs/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: use kmem_cache_zalloc insteadDenis Cheng2007-10-171-2/+1
| | | | | | Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/mmu.c: headers butcheryAlexey Dobriyan2007-10-171-19/+2
| | | | | | | | | | | | | | | | | | | | | | | fs/proc/mmu.c consists of only one function which uses only: 1) struct vmalloc_info * 2) struct vm_struct * 3) struct vmalloc_info 4) vmlist 5) VMALLOC_TOTAL, VMALLOC_START, VMALLOC_END 6) read_lock, read_unlock 7) vmlist_lock 8) struct vm_struct This gives us linux/spinlock.h, asm/pgtable.h, "internal.h", linux/vmalloc.h. asm/pgtable.h uses PKMAP_BASE on i386, for which asm/highmem.h is needed. But, linux/highmem.h is actually used to make it compile everywhere. I'll deal later with this particular i386 surprise. Cross-compile tested on many archs and configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_poll: return -EINTR when signalledOleg Nesterov2007-10-171-7/+7
| | | | | | | | | | | | | | do_poll() checks signal_pending() but returns 0 when interrupted. This means the caller has to check signal_pending() again. Change it to return -EINTR when signal_pending() and count == 0. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <ak@suse.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_sys_poll: simplify playing with on-stack dataOleg Nesterov2007-10-171-56/+36
| | | | | | | | | | | | | | | | | | | Cleanup. Lessens both the source and compiled code (100 bytes) and imho makes the code much more understandable. With this patch "struct poll_list *head" always points to on-stack stack_pps, so we can remove all "is it on-stack" and "was it initialized" checks. Also, move poll_initwait/poll_freewait and -EINTR detection closer to the do_poll()'s callsite. [akpm@linux-foundation.org: fix warning (size_t != uint)] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Looks-good-to: Andi Kleen <ak@suse.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: mark nibblemap constPhilippe De Muyter2007-10-175-5/+5
| | | | | | Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/romfs/inode.c: trivial improvementsWANG Cong2007-10-171-2/+2
| | | | | | | | | | | | - There are no lists in fs/romfs/inode.c, so using list_entry is a bit confusing. Replace it with container_of. - It is unnecessary to cast the return value of kmem_cache_alloc, since it returns a void* pointer. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zisofs use mutex instead of semaphoreDave Young2007-10-171-20/+5
| | | | | | | | | | Use mutex instead of semaphore in fs/isofs/compress.c, and remove an unnecessary variable. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* SLAB_PANIC more (proc, posix-timers, shmem)Alexey Dobriyan2007-10-171-3/+1
| | | | | | | | These aren't modular, so SLAB_PANIC is OK. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Make unregister_binfmt() return voidAlexey Dobriyan2007-10-171-2/+1
| | | | | | | | | | | list_del() hardly can fail, so checking for return value is pointless (and current code always return 0). Nobody really cared that return value anyway. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use list_head in binfmt handlingAlexey Dobriyan2007-10-171-28/+6
| | | | | | | | | | | | | | Switch single-linked binfmt formats list to usual list_head's. This leads to one-liners in register_binfmt() and unregister_binfmt(). The downside is one pointer more in struct linux_binfmt. This is not a problem, since the set of registered binfmts on typical box is very small -- (ELF + something distro enabled for you). Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/reiserfs/: cleanupsAdrian Bunk2007-10-173-60/+4
| | | | | | | | | | | | | | | | | - remove the following no longer used functions: - bitmap.c: reiserfs_claim_blocks_to_be_allocated() - bitmap.c: reiserfs_release_claimed_blocks() - bitmap.c: reiserfs_can_fit_pages() - make the following functions static: - inode.c: restart_transaction() - journal.c: reiserfs_async_progress_wait() Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Slab API: remove useless ctor parameter and reorder parametersChristoph Lameter2007-10-1744-57/+48
| | | | | | | | | | | | | | | | | | | | | Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: count reclaimable pages per BDIPeter Zijlstra2007-10-172-0/+9
| | | | | | | | Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>