path: root/kernel
Commit message  Author  Age  Files  Lines
* irq: provide debug_poll_all_shared_irqs() method under CONFIG_DEBUG_SHIRQIngo Molnar2009-01-161-1/+13
|   Provide a shared interrupt debug facility under CONFIG_DEBUG_SHIRQ: it uses
|   the existing irqpoll facilities to iterate through all registered interrupt
|   handlers and call those which can handle shared IRQ lines.
|
|   This can be handy for suspend/resume debugging: if we call this function
|   early during resume we can trigger crashes in those drivers which have
|   incorrect assumptions about when exactly their ISRs will be called during
|   suspend/resume.
|
|   Signed-off-by: Ingo Molnar <mingo@elte.hu>
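The iteration described in this message can be sketched in user-space C. This is only an illustration, not the kernel code: `demo_action`, `DEMO_IRQF_SHARED`, `demo_count_handler` and `demo_poll_all_shared` are hypothetical names standing in for the kernel's per-IRQ handler chains and its shared-IRQ flag.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_IRQF_SHARED 0x1u	/* hypothetical stand-in for the shared-IRQ flag */

/* One registered handler on an IRQ line (hypothetical layout). */
struct demo_action {
	unsigned int flags;
	int (*handler)(int irq, void *dev_id);
	void *dev_id;
	struct demo_action *next;	/* next handler sharing the same line */
};

/* Example handler: counts invocations through dev_id. */
static int demo_count_handler(int irq, void *dev_id)
{
	(void)irq;
	(*(int *)dev_id)++;
	return 1;
}

/*
 * Walk every IRQ line and call each handler that was registered as
 * shared. Handlers on exclusive lines are skipped, since they are not
 * required to cope with spurious invocations. Returns the number of
 * handlers called.
 */
static int demo_poll_all_shared(struct demo_action *const *table, int nr_irqs)
{
	int called = 0;

	for (int irq = 0; irq < nr_irqs; irq++) {
		for (const struct demo_action *a = table[irq]; a; a = a->next) {
			if (!(a->flags & DEMO_IRQF_SHARED))
				continue;
			assert(a->handler != NULL);
			a->handler(irq, a->dev_id);
			called++;
		}
	}
	return called;
}
```

A driver whose shared handler touches hardware that is still powered down would fault when polled this way, which is the kind of crash the message hopes to provoke early.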
* Merge branch 'linus' into irq/genirqIngo Molnar2009-01-16125-5145/+12690
|\
| * Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2009-01-152-13/+37
| |\
| | |   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
| | |
| | |   * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
| | |     sched: sched_slice() fixlet
| | |     sched: fix update_min_vruntime
| | |     sched: SCHED_OTHER vs SCHED_IDLE isolation
| | |     sched: SCHED_IDLE weight change
| | |     sched: fix bandwidth validation for UID grouping
| | |     Revert "sched: improve preempt debugging"
| | * sched: sched_slice() fixletLin Ming2009-01-151-1/+4
| | |   Mike's change: 0a582440f "sched: fix sched_slice())" broke group
| | |   scheduling by forgetting to reload cfs_rq on each loop.
| | |
| | |   This patch fixes aim7 regression and specjbb2005 regression becomes
| | |   less than 1.5% on 8-core stokley.
| | |
| | |   Signed-off-by: Lin Ming <ming.m.lin@intel.com>
| | |   Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | |   Tested-by: Jayson King <dev@jaysonking.com>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
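Why the queue has to be reloaded on each loop can be shown with a small user-space model. This is a sketch under assumptions, not the scheduler code: `demo_rq`, `demo_entity` and `demo_slice` are hypothetical, and the slice is modelled simply as the period scaled by weight / queue load at every level of the group hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run queue: only the total load weight matters here. */
struct demo_rq {
	uint64_t load;
};

/* Hypothetical scheduling entity in a group hierarchy. */
struct demo_entity {
	uint64_t weight;
	struct demo_rq *rq;		/* queue this entity is enqueued on */
	struct demo_entity *parent;	/* group entity one level up, or NULL */
};

/*
 * Scale the period by the entity's share at each level. The queue is
 * looked up again for every entity on the way up; reusing the queue of
 * the bottom-most entity at all levels would apply the wrong load to
 * the group entities.
 */
static uint64_t demo_slice(uint64_t period, const struct demo_entity *se)
{
	uint64_t slice = period;

	for (; se; se = se->parent) {
		const struct demo_rq *rq = se->rq;	/* reloaded per level */

		assert(rq->load != 0);
		slice = slice * se->weight / rq->load;
	}
	return slice;
}
```

With a task of weight 1024 on a group queue of load 3072, and the group entity of weight 1024 on a root queue of load 2048, a period of 6000 yields 1000. Using the group queue's load at both levels would give 666 instead.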
| | * sched: fix update_min_vruntimePeter Zijlstra2009-01-151-1/+1
| | |   Impact: fix SCHED_IDLE latency problems
| | |
| | |   OK, so we have 1 running task A (which is obviously curr and the tree
| | |   is equally obviously empty).
| | |
| | |   'A' nicely chugs along, doing its thing, carrying min_vruntime along
| | |   as it goes.
| | |
| | |   Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP
| | |   balancing, which is very likely far right, in that case
| | |
| | |     update_curr
| | |       update_min_vruntime
| | |         cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
| | |           vruntime = se->vruntime
| | |
| | |   and voila, min_vruntime is waaay right of where it ought to be.
| | |
| | |   OK, so why did I write it like that to begin with...
| | |
| | |   Aah, yes.
| | |
| | |   Say we've just dequeued current
| | |
| | |     schedule
| | |       deactivate_task(prev)
| | |         dequeue_entity
| | |           update_min_vruntime
| | |
| | |   Then we'll set
| | |
| | |     vruntime = cfs_rq->min_vruntime;
| | |
| | |   we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
| | |   do vruntime = se->vruntime, because
| | |
| | |     vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)
| | |
| | |   will not advance vruntime, and cause lags the other way around (which
| | |   we fixed with that initial patch:
| | |   1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 (sched: more accurate
| | |   min_vruntime accounting).
| | |
| | |   Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | |   Tested-by: Mike Galbraith <efault@gmx.de>
| | |   Acked-by: Mike Galbraith <efault@gmx.de>
| | |   Cc: <stable@kernel.org>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
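The two cases in this message can be modelled in user-space C. The sketch below is reconstructed from the prose above rather than copied from the patch. `demo_cfs_rq` and the `demo_*` helpers are hypothetical, with booleans standing in for "curr is set" and "the tree has a leftmost entity".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the run-queue state used here. */
struct demo_cfs_rq {
	uint64_t min_vruntime;
	bool has_curr;
	uint64_t curr_vruntime;
	bool has_leftmost;
	uint64_t leftmost_vruntime;
};

/* Wrap-safe comparisons via signed difference. */
static uint64_t demo_max_vruntime(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) > 0 ? b : a;
}

static uint64_t demo_min_vruntime(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0 ? b : a;
}

static void demo_update_min_vruntime(struct demo_cfs_rq *rq)
{
	assert(rq);

	uint64_t v = rq->min_vruntime;

	if (rq->has_curr)
		v = rq->curr_vruntime;

	if (rq->has_leftmost) {
		if (!rq->has_curr)
			/* curr was just dequeued: follow the tree. */
			v = rq->leftmost_vruntime;
		else
			/* curr still runs: a far-right task must not win. */
			v = demo_min_vruntime(v, rq->leftmost_vruntime);
	}

	/* min_vruntime only ever moves forward. */
	rq->min_vruntime = demo_max_vruntime(rq->min_vruntime, v);
}
```

In the first scenario (task A running at 100, a SCHED_IDLE task inserted at 1000000) the model leaves min_vruntime at 100. In the second (no curr, leftmost at 500) it advances to 500.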
| | * sched: SCHED_OTHER vs SCHED_IDLE isolationPeter Zijlstra2009-01-151-8/+22
| | |   Stronger SCHED_IDLE isolation:
| | |
| | |    - no SCHED_IDLE buddies
| | |    - never let SCHED_IDLE preempt on wakeup
| | |    - always preempt SCHED_IDLE on wakeup
| | |    - limit SLEEPER fairness for SCHED_IDLE.
| | |
| | |   Signed-off-by: Mike Galbraith <efault@gmx.de>
| | |   Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * sched: SCHED_IDLE weight changePeter Zijlstra2009-01-151-2/+2
| | |   Increase the SCHED_IDLE weight from 2 to 3, this gives much more
| | |   stable vruntime numbers.
| | |
| | |   time advanced in 100ms:
| | |
| | |    weight=2
| | |    64765.988352
| | |    67012.881408
| | |    88501.412352
| | |
| | |    weight=3
| | |    35496.181411
| | |    34130.971298
| | |    35497.411573
| | |
| | |   Signed-off-by: Mike Galbraith <efault@gmx.de>
| | |   Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * sched: fix bandwidth validation for UID groupingPeter Zijlstra2009-01-151-0/+7
| | |   Impact: make rt-limit tunables work again
| | |
| | |   Mark Glines reported:
| | |
| | |   > I've got an issue on x86-64 where I can't configure the system to allow
| | |   > RT tasks for a non-root user.
| | |   >
| | |   > In 2.6.26.5, I was able to do the following to set things up nicely:
| | |   > echo 450000 >/sys/kernel/uids/0/cpu_rt_runtime
| | |   > echo 450000 >/sys/kernel/uids/1000/cpu_rt_runtime
| | |   >
| | |   > Seems like every value I try to echo into the /sys files returns EINVAL.
| | |
| | |   For UID grouping we initialize the root group with infinite bandwidth
| | |   which by default is actually more than the global limit, therefore the
| | |   bandwidth check always fails.
| | |
| | |   Because the root group is a phantom group (for UID grouping) we cannot
| | |   runtime adjust it, therefore we let it reflect the global bandwidth
| | |   settings.
| | |
| | |   Reported-by: Mark Glines <mark@glines.org>
| | |   Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * Revert "sched: improve preempt debugging"Ingo Molnar2009-01-121-1/+1
| | |   This reverts commit 7317d7b87edb41a9135e30be1ec3f7ef817c53dd.
| | |
| | |   This has been reported (and bisected) by Alexey Zaytsev and
| | |   Kamalesh Babulal to produce annoying warnings during bootup
| | |   on both x86 and powerpc.
| | |
| | |   kernel_locked() is not a valid test in IRQ context (we update the
| | |   BKL's ->lock_depth and the preempt count separately and
| | |   non-atomically), so we cannot put it into the generic preempt
| | |   debugging checks which can run in IRQ contexts too.
| | |
| | |   Reported-and-bisected-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
| | |   Reported-and-bisected-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | resources: fix parameter name and kernel-docRandy Dunlap2009-01-151-0/+1
| | |   Fix __request_region() parameter kernel-doc notation and parameter name:
| | |
| | |   Warning(linux-2.6.28-git10//kernel/resource.c:627): No description found for parameter 'flags'
| | |
| | |   Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
| | |   Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | |   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | cgroups: consolidate cgroup documentsLi Zefan2009-01-151-1/+1
| | |   Move Documentation/cpusets.txt and Documentation/controllers/* to
| | |   Documentation/cgroups/
| | |
| | |   Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | |   Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
| | |   Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
| | |   Acked-by: Paul Menage <menage@google.com>
| | |   Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | |   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | [IA64] dump stack on kernel unaligned warningsDoug Chapman2009-01-151-0/+9
| | |   Often the cause of kernel unaligned access warnings is not
| | |   obvious from just the ip displayed in the warning. This adds
| | |   the option via proc to dump the stack in addition to the warning.
| | |   The default is off (just display the 1 line warning). To enable
| | |   the stack to be shown:
| | |
| | |     echo 1 > /proc/sys/kernel/unaligned-dump-stack
| | |
| | |   Signed-off-by: Doug Chapman <doug.chapman@hp.com>
| | |   Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | Merge branch 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds2009-01-1421-191/+169
| |\ \
| | | |   * 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
| | | |     [CVE-2009-0029] s390 specific system call wrappers
| | | |     [CVE-2009-0029] System call wrappers part 33
| | | |     [CVE-2009-0029] System call wrappers part 32
| | | |     [CVE-2009-0029] System call wrappers part 31
| | | |     [CVE-2009-0029] System call wrappers part 30
| | | |     [CVE-2009-0029] System call wrappers part 29
| | | |     [CVE-2009-0029] System call wrappers part 28
| | | |     [CVE-2009-0029] System call wrappers part 27
| | | |     [CVE-2009-0029] System call wrappers part 26
| | | |     [CVE-2009-0029] System call wrappers part 25
| | | |     [CVE-2009-0029] System call wrappers part 24
| | | |     [CVE-2009-0029] System call wrappers part 23
| | | |     [CVE-2009-0029] System call wrappers part 22
| | | |     [CVE-2009-0029] System call wrappers part 21
| | | |     [CVE-2009-0029] System call wrappers part 20
| | | |     [CVE-2009-0029] System call wrappers part 19
| | | |     [CVE-2009-0029] System call wrappers part 18
| | | |     [CVE-2009-0029] System call wrappers part 17
| | | |     [CVE-2009-0029] System call wrappers part 16
| | | |     [CVE-2009-0029] System call wrappers part 15
| | | |     ...
| | * | [CVE-2009-0029] System call wrappers part 32Heiko Carstens2009-01-141-6/+5
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 31Heiko Carstens2009-01-142-8/+7
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 30Heiko Carstens2009-01-141-1/+1
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 27Heiko Carstens2009-01-144-5/+5
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 26Heiko Carstens2009-01-141-2/+2
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 24Heiko Carstens2009-01-141-6/+7
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 23Heiko Carstens2009-01-141-3/+3
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 19Heiko Carstens2009-01-141-6/+6
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 18Heiko Carstens2009-01-141-10/+11
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 17Heiko Carstens2009-01-141-3/+3
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 09Heiko Carstens2009-01-141-13/+8
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 08Heiko Carstens2009-01-146-26/+19
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 07Heiko Carstens2009-01-145-13/+13
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 06Heiko Carstens2009-01-141-13/+13
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 05Heiko Carstens2009-01-142-27/+21
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 04Heiko Carstens2009-01-146-13/+11
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 03Heiko Carstens2009-01-141-9/+9
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 02Heiko Carstens2009-01-142-10/+10
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] System call wrappers part 01Heiko Carstens2009-01-144-13/+13
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] Make sys_syslog a conditional system callHeiko Carstens2009-01-142-5/+1
| | | |   Remove the -ENOSYS implementation for !CONFIG_PRINTK and use
| | | |   the cond_syscall infrastructure instead.
| | | |
| | | |   Acked-by: Kyle McMartin <kyle@redhat.com>
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | * | [CVE-2009-0029] Convert all system calls to return a longHeiko Carstens2009-01-143-3/+5
| | | |   Convert all system calls to return a long. This should be a NOP since all
| | | |   converted types should have the same size anyway.
| | | |   With the exception of sys_exit_group which returned void. But that doesn't
| | | |   matter since the system call doesn't return.
| | | |
| | | |   Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * | | kernel/up.c: omit it if SMP=y, USE_GENERIC_SMP_HELPERS=nAndrew Morton2009-01-141-3/+2
| |/ /
| | |   Fix the sparc build - we were including `up.o' on SMP builds, when
| | |   CONFIG_USE_GENERIC_SMP_HELPERS=n.
| | |
| | |   Tested-by: Robert Reif <reif@earthlink.net>
| | |   Fixed-by: Robert Reif <reif@earthlink.net>
| | |   Cc: David Miller <davem@davemloft.net>
| | |   Cc: Ingo Molnar <mingo@elte.hu>
| | |   Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | |   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2009-01-131-0/+1
| |\ \
| | | |   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
| | | |
| | | |   * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
| | | |     smp_call_function_single(): be slightly less stupid, fix #2
| | | |     lockdep, mm: fix might_fault() annotation
| | * | smp_call_function_single(): be slightly less stupid, fix #2Ingo Molnar2009-01-121-0/+1
| | | |   fix m68k build failure:
| | | |
| | | |     tip/kernel/up.c: In function 'smp_call_function_single':
| | | |     tip/kernel/up.c:16: error: dereferencing pointer to incomplete type
| | | |     make[2]: *** [kernel/up.o] Error 1
| | | |
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | async: fix __lowest_in_progress()Arjan van de Ven2009-01-121-5/+16
| | | |   At 37000 feet somewhere near Greenland I woke up from a half-sleep with
| | | |   the realisation that __lowest_in_progress() is buggy. After landing I
| | | |   checked and there were indeed 2 problems with it; this patch fixes both:
| | | |
| | | |    * The order of the list checks was wrong
| | | |    * The locking was not correct.
| | | |
| | | |   Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
| | | |   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2009-01-122-9/+25
| |\ \ \
| | |/ /
| |/| /
| | |/
| | |   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
| | |
| | |   * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
| | |     kernel/sched.c: add missing forward declaration for 'double_rq_lock'
| | |     sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()"
| | |     cpumask: fix CONFIG_NUMA=y sched.c
| | * kernel/sched.c: add missing forward declaration for 'double_rq_lock'Steven Noonan2009-01-111-0/+3
| | |   Impact: build fix on certain configs
| | |
| | |   Added 'double_rq_lock' forward declaration, allowing double_rq_lock to
| | |   be used in _double_lock_balance().
| | |
| | |   Signed-off-by: Steven Noonan <steven@uplinklabs.net>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()"Li Zefan2009-01-111-4/+17
| | |   Impact: avoid accessing NULL tg.css->cgroup
| | |
| | |   In commit 0a0db8f5c9d4bbb9bbfcc2b6cb6bce2d0ef4d73d, I removed checking
| | |   NULL tg.css->cgroup, but I realized I was wrong when I found reading
| | |   /proc/sched_debug can race with cgroup_create().
| | |
| | |   Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * cpumask: fix CONFIG_NUMA=y sched.cRusty Russell2009-01-111-5/+5
| | |   Impact: fix panic on ia64 with NR_CPUS=1024
| | |
| | |   struct sched_domain is now a dangling structure; where we really want
| | |   static ones, we need to use static_sched_domain.
| | |
| | |   (As the FIXME in this file says, cpumask_var_t would be better, but
| | |   this code is hairy enough without trying to add initialization code to
| | |   the right places).
| | |
| | |   Reported-by: Mike Travis <travis@sgi.com>
| | |   Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | smp_call_function_single(): be slightly less stupid, fixIngo Molnar2009-01-111-1/+3
| | |   Impact: build fix on Alpha
| | |
| | |     kernel/up.c: In function 'smp_call_function_single':
| | |     kernel/up.c:12: error: 'cpuid' undeclared (first use in this function)
| | |     kernel/up.c:12: error: (Each undeclared identifier is reported only once
| | |     kernel/up.c:12: error: for each function it appears in.)
| | |
| | |   The typo didn't show up on x86 because 'cpuid' happens to be a
| | |   function address as well ...
| | |
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | smp_call_function_single(): be slightly less stupidAndrew Morton2009-01-112-1/+23
| | |   If you do
| | |
| | |     smp_call_function_single(expression-with-side-effects, ...)
| | |
| | |   then expression-with-side-effects never gets evaluated on UP builds.
| | |
| | |   As always, implementing it in C is the correct thing to do.
| | |
| | |   While we're there, uninline it for size and possible header dependency
| | |   reasons.
| | |
| | |   And create a new kernel/up.c, as a place in which to put
| | |   uniprocessor-specific code and storage. It should mirror kernel/smp.c.
| | |
| | |   Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
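The difference between a macro stub and a real C function can be seen in a small user-space example. This is a hedged sketch, not the kernel's definitions: `demo_call_single_macro`, `demo_call_single`, `demo_next_cpu` and `demo_noop` are hypothetical names, and the macro is only an assumed shape of a stub that ignores its first argument.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro-style uniprocessor stub: the cpu argument never
 * appears in the expansion, so the expression passed for it is dropped
 * without being evaluated.
 */
#define demo_call_single_macro(cpu, func, info) ((func)(info), 0)

/*
 * Function-style stub: C evaluates every argument before the call, so
 * side effects in the cpu expression happen even though the value is
 * unused on a single CPU.
 */
static int demo_call_single(int cpu, void (*func)(void *), void *info)
{
	(void)cpu;
	assert(func);
	func(info);
	return 0;
}

/* An "expression with side effects": counts how often it is evaluated. */
static int demo_next_cpu_calls;

static int demo_next_cpu(void)
{
	demo_next_cpu_calls++;
	return 0;
}

static void demo_noop(void *info)
{
	(void)info;
}
```

Calling `demo_call_single_macro(demo_next_cpu(), demo_noop, NULL)` leaves `demo_next_cpu_calls` unchanged, while `demo_call_single(demo_next_cpu(), demo_noop, NULL)` increments it once.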
| * | Merge commit 'v2.6.29-rc1' into core/urgentIngo Molnar2009-01-1119-396/+1129
| |\|
| | * Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-2Linus Torvalds2009-01-091-2/+14
| | |\
| | | |   * git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-2:
| | | |     async: make async a command line option for now
| | | |     partial revert of asynchronous inode delete
| | | * async: make async a command line option for nowArjan van de Ven2009-01-091-2/+14
| | | |   ... and have it default off.
| | | |   This does allow people to work with it for testing.
| | | |
| | | |   Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
| | * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommuLinus Torvalds2009-01-092-3/+15
| | |\ \
| | | | |   * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
| | | | |     NOMMU: Support XIP on initramfs
| | | | |     NOMMU: Teach kobjsize() about VMA regions.
| | | | |     FLAT: Don't attempt to expand the userspace stack to fill the space allocated
| | | | |     FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
| | | | |     NOMMU: Improve procfs output using per-MM VMAs
| | | | |     NOMMU: Make mmap allocation page trimming behaviour configurable.
| | | | |     NOMMU: Make VMAs per MM as for MMU-mode linux
| | | | |     NOMMU: Delete askedalloc and realalloc variables
| | | | |     NOMMU: Rename ARM's struct vm_region
| | | | |     NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
| | | * | NOMMU: Make mmap allocation page trimming behaviour configurable.Paul Mundt2009-01-081-0/+14
| | | | |   NOMMU mmap allocates a piece of memory for an mmap that's rounded up
| | | | |   in size to the nearest power-of-2 number of pages. Currently it then
| | | | |   discards the excess pages back to the page allocator, making that
| | | | |   memory available for use by other things. This can, however, cause
| | | | |   greater amount of fragmentation.
| | | | |
| | | | |   To counter this, a sysctl is added in order to fine-tune the trimming
| | | | |   behaviour. The default behaviour remains to trim pages aggressively,
| | | | |   while this can either be disabled completely or set to a higher
| | | | |   page-granular watermark in order to have finer-grained control.
| | | | |
| | | | |   vm region vm_top bits taken from an earlier patch by David Howells.
| | | | |
| | | | |   Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| | | | |   Signed-off-by: David Howells <dhowells@redhat.com>
| | | | |   Tested-by: Mike Frysinger <vapier.adi@gmail.com>
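The trimming decision described here can be illustrated with a short user-space calculation. It is a sketch under stated assumptions, not the kernel code: `demo_round_up_pow2` and `demo_pages_kept` are hypothetical, and the `trim_watermark` parameter stands in for the sysctl. The sketch assumes that 0 disables trimming and that the excess is returned only when it is at least the watermark.

```c
#include <assert.h>

/* Smallest power of two that is >= n, for n >= 1. */
static unsigned long demo_round_up_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of pages the mapping keeps after allocation.
 *
 * trim_watermark (assumed semantics):
 *   0  -> never trim, keep the whole power-of-2 allocation
 *   N  -> hand the excess back only if it is at least N pages
 */
static unsigned long demo_pages_kept(unsigned long requested,
				     unsigned long trim_watermark)
{
	assert(requested >= 1);

	unsigned long allocated = demo_round_up_pow2(requested);
	unsigned long excess = allocated - requested;

	if (trim_watermark && excess >= trim_watermark)
		return requested;	/* excess pages go back to the allocator */
	return allocated;		/* keep the tail to limit fragmentation */
}
```

For a 5-page request the allocation is 8 pages with 3 in excess. A watermark of 1 trims down to 5 pages, a watermark of 4 keeps all 8, and 0 never trims.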
| | | * | NOMMU: Make VMAs per MM as for MMU-mode linuxDavid Howells2009-01-081-3/+1
| | | | |   Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:
| | | | |
| | | | |    (1) In SYSV SHM where nattch for a segment does not reflect the number
| | | | |        of shmat's (and forks) done.
| | | | |
| | | | |    (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by
| | | | |        an exec'ing process when VM_EXECUTABLE is specified, regardless of
| | | | |        the fact that a VMA might be shared and already have its vm_mm
| | | | |        assigned to another process or a dead process.
| | | | |
| | | | |   A new struct (vm_region) is introduced to track a mapped region and to
| | | | |   remember the circumstances under which it may be shared and the
| | | | |   vm_list_struct structure is discarded as it's no longer required.
| | | | |
| | | | |   This patch makes the following additional changes:
| | | | |
| | | | |    (1) Regions are now allocated with alloc_pages() rather than kmalloc()
| | | | |        and with no recourse to __GFP_COMP, so the pages are not composite.
| | | | |        Instead, each page has a reference on it held by the region.
| | | | |        Anything else that is interested in such a page will have to get a
| | | | |        reference on it to retain it. When the pages are released due to
| | | | |        unmapping, each page is passed to put_page() and will be freed when
| | | | |        the page usage count reaches zero.
| | | | |
| | | | |    (2) Excess pages are trimmed after an allocation as the allocation must
| | | | |        be made as a power-of-2 quantity of pages.
| | | | |
| | | | |    (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM
| | | | |        may end up with overlapping VMAs within the tree, the VMA struct
| | | | |        address is appended to the sort key.
| | | | |
| | | | |    (4) Non-anonymous VMAs are now added to the backing inode's prio list.
| | | | |
| | | | |    (5) Holes may be punched in anonymous VMAs with munmap(), releasing
| | | | |        parts of the backing region. The VMA and region structs will be
| | | | |        split if necessary.
| | | | |
| | | | |    (6) sys_shmdt() only releases one attachment to a SYSV IPC shared
| | | | |        memory segment instead of all the attachments at that address.
| | | | |        Multiple shmat()'s return the same address under NOMMU-mode instead
| | | | |        of different virtual addresses as under MMU-mode.
| | | | |
| | | | |    (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
| | | | |
| | | | |    (8) /proc/maps is now the global list of mapped regions, and may list
| | | | |        bits that aren't actually mapped anywhere.
| | | | |
| | | | |    (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the
| | | | |        amount of RAM currently allocated by mmap to hold mappable regions
| | | | |        that can't be mapped directly. These are copies of the backing
| | | | |        device or file if not anonymous.
| | | | |
| | | | |   These changes make NOMMU mode more similar to MMU mode. The downside is
| | | | |   that NOMMU mode requires some extra memory to track things over NOMMU
| | | | |   without this patch (VMAs are no longer shared, and there are now region
| | | | |   structs).
| | | | |
| | | | |   Signed-off-by: David Howells <dhowells@redhat.com>
| | | | |   Tested-by: Mike Frysinger <vapier.adi@gmail.com>
| | | | |   Acked-by: Paul Mundt <lethal@linux-sh.org>