summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* memcgroup: fix zone isolation OOMHugh Dickins2008-02-071-13/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mem_cgroup_charge_common shows a tendency to OOM without good reason, when a memhog goes well beyond its rss limit but with plenty of swap available. Seen on x86 but not on PowerPC; seen when the next patch omits swapcache from memcgroup, but we presume it can happen without. mem_cgroup_isolate_pages is not quite satisfying reclaim's criteria for OOM avoidance. Already it has to scan beyond the nr_to_scan limit when it finds a !LRU page or an active page when handling inactive or an inactive page when handling active. It needs to do exactly the same when it finds a page from the wrong zone (the x86 tests had two zones, the PowerPC tests had only one). Don't increment scan and then decrement it in these cases, just move the incrementation down. Fix recent off-by-one when checking against nr_to_scan. Cut out "Check if the meta page went away from under us", presumably left over from early debugging: no amount of such checks could save us if this list really were being updated without locking. This change does make the unlimited scan while holding two spinlocks even worse - bad for latency and bad for containment; but that's a separate issue which is better left to be fixed a little later. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Pavel Emelianov <xemul@openvz.org> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bugfix for memory cgroup controller: avoid !PageLRU page in ↵KAMEZAWA Hiroyuki2008-02-071-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mem_cgroup_isolate_pages This patch makes mem_cgroup_isolate_pages() to be - ignore !PageLRU pages. - fixes the bug that isolation makes no progress if page_zone(page) != zone page once find. (just increment scan in this case.) kswapd and memory migration removes a page from list when it handles a page for reclaiming/migration. Because __isolate_lru_page() doesn't moves page !PageLRU pages, it will be safe to avoid touching !PageLRU() page and its page_cgroup. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bugfix for memory cgroup controller: migration under memory controller fixKAMEZAWA Hiroyuki2008-02-073-3/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While using memory control cgroup, page-migration under it works as following. == 1. uncharge all refs at try to unmap. 2. charge regs again remove_migration_ptes() == This is simple but has following problems. == The page is uncharged and charged back again if *mapped*. - This means that cgroup before migration can be different from one after migration - If page is not mapped but charged as page cache, charge is just ignored (because not mapped, it will not be uncharged before migration) This is memory leak. == This patch tries to keep memory cgroup at page migration by increasing one refcnt during it. 3 functions are added. mem_cgroup_prepare_migration() --- increase refcnt of page->page_cgroup mem_cgroup_end_migration() --- decrease refcnt of page->page_cgroup mem_cgroup_page_migration() --- copy page->page_cgroup from old page to new page. During migration - old page is under PG_locked. - new page is under PG_locked, too. - both old page and new page is not on LRU. These 3 facts guarantee that page_cgroup() migration has no race. Tested and worked well in x86_64/fake-NUMA box. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bugfix for memory controller: add helper function for assigning cgroup to pageKAMEZAWA Hiroyuki2008-02-071-29/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds following functions. - clear_page_cgroup(page, pc) - page_cgroup_assign_new_page_group(page, pc) Mainly for cleanup. A manner "check page->cgroup again after lock_page_cgroup()" is implemented in straight way. A comment in mem_cgroup_uncharge() will be removed by force-empty patch Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kswapd should only wait on IO if there is IORik van Riel2008-02-071-5/+22
| | | | | | | | | | | | | | | | The current kswapd (and try_to_free_pages) code has an oddity where the code will wait on IO, even if there is no IO in flight. This problem is notable especially when the system scans through many unfreeable pages, causing unnecessary stalls in the VM. Additionally, tasks without __GFP_FS or __GFP_IO in the direct reclaim path will sleep if a significant number of pages are encountered that should be written out. This gives kswapd a chance to write out those pages, while the direct reclaim task sleeps. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: add sysctl to enable task memory dumpDavid Rientjes2008-02-073-5/+75
| | | | | | | | | | | | | | | | | | | | | | | | | Adds a new sysctl, 'oom_dump_tasks', that enables the kernel to produce a dump of all system tasks (excluding kernel threads) when performing an OOM-killing. Information includes pid, uid, tgid, vm size, rss, cpu, oom_adj score, and name. This is helpful for determining why there was an OOM condition and which rogue task caused it. It is configurable so that large systems, such as those with several thousand tasks, do not incur a performance penalty associated with dumping data they may not desire. If an OOM was triggered as a result of a memory controller, the tasklist shall be filtered to exclude tasks that are not a member of the same cgroup. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcontrol: move oom task exclusion to tasklist scanDavid Rientjes2008-02-073-7/+19
| | | | | | | | | | | | | | | | | | | | Creates a helper function to return non-zero if a task is a member of a memory controller: int task_in_mem_cgroup(const struct task_struct *task, const struct mem_cgroup *mem); When the OOM killer is constrained by the memory controller, the exclusion of tasks that are not a member of that controller was previously misplaced and appeared in the badness scoring function. It should be excluded during the tasklist scan in select_bad_process() instead. [akpm@linux-foundation.org: build fix] Cc: Christoph Lameter <clameter@sgi.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mem-controller gfp-mask fixBadari Pulavarty2008-02-071-1/+2
| | | | | | | | Need to strip __GFP_HIGHMEM flag while passing to mem_container_cache_charge(). Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory controller BUG_ON()Balbir Singh2008-02-072-13/+13
| | | | | | | | | | | | | | Move mem_controller_cache_charge() above radix_tree_preload(). radix_tree_preload() disables preemption, even though the gfp_mask passed contains __GFP_WAIT, we cannot really do __GFP_WAIT allocations, thus we hit a BUG_ON() in kmem_cache_alloc(). This patch moves mem_controller_cache_charge() to above radix_tree_preload() for cache charging. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcgroup: reinstate swapoff modHugh Dickins2008-02-071-8/+34
| | | | | | | | | | | | | | | | | | | | | This patch reinstates the "swapoff: scan ptes preemptibly" mod we started with: in due course it should be rendered down into the earlier patches, leaving us with a more straightforward mem_cgroup_charge mod to unuse_pte, allocating with GFP_KERNEL while holding no spinlock and no atomic kmap. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Pavel Emelianov <xemul@openvz.org> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcontrol: move mm_cgroup to header fileDavid Rientjes2008-02-072-7/+9
| | | | | | | | | | | | | | | | | Inline functions must preceed their use, so mm_cgroup() should be defined in linux/memcontrol.h. include/linux/memcontrol.h:48: warning: 'mm_cgroup' declared inline after being called include/linux/memcontrol.h:48: warning: previous declaration of 'mm_cgroup' was here [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: make charging gfp mask awareBalbir Singh2008-02-079-30/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | Nick Piggin pointed out that swap cache and page cache addition routines could be called from non GFP_KERNEL contexts. This patch makes the charging routine aware of the gfp context. Charging might fail if the cgroup is over it's limit, in which case a suitable error is returned. This patch was tested on a Powerpc box. I am still looking at being able to test the path, through which allocations happen in non GFP_KERNEL contexts. [kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: make page_referenced() cgroup awareBalbir Singh2008-02-075-10/+40
| | | | | | | | | | | | | | | | | | | | | Make page_referenced() cgroup aware. Without this patch, page_referenced() can cause a page to be skipped while reclaiming pages. This patch ensures that other cgroups do not hold pages in a particular cgroup hostage. It is required to ensure that shared pages are freed from a cgroup when they are not actively referenced from the cgroup that brought them in Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: add switch to control what type of pages to limitBalbir Singh2008-02-074-5/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Choose if we want cached pages to be accounted or not. By default both are accounted for. A new set of tunables are added. echo -n 1 > mem_control_type switches the accounting to account for only mapped pages echo -n 3 > mem_control_type switches the behaviour back [bunk@kernel.org: mm/memcontrol.c: clenups] [akpm@linux-foundation.org: fix sparc32 build] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: OOM handlingPavel Emelianov2008-02-073-4/+41
| | | | | | | | | | | | | | | | | | | | | | | Out of memory handling for cgroups over their limit. A task from the cgroup over limit is chosen using the existing OOM logic and killed. TODO: 1. As discussed in the OLS BOF session, consider implementing a user space policy for OOM handling. [akpm@linux-foundation.org: fix build due to oom-killer changes] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller improve user interfaceBalbir Singh2008-02-074-31/+81
| | | | | | | | | | | | | | | | | | | | | | | | | Change the interface to use bytes instead of pages. Page sizes can vary across platforms and configurations. A new strategy routine has been added to the resource counters infrastructure to format the data as desired. Suggested by David Rientjes, Andrew Morton and Herbert Poetzl Tested on a UML setup with the config for memory control enabled. [kamezawa.hiroyu@jp.fujitsu.com: possible race fix in res_counter] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: add per cgroup LRU and reclaimBalbir Singh2008-02-076-30/+286
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the page_cgroup to the per cgroup LRU. The reclaim algorithm has been modified to make the isolate_lru_pages() as a pluggable component. The scan_control data structure now accepts the cgroup on behalf of which reclaims are carried out. try_to_free_pages() has been extended to become cgroup aware. [akpm@linux-foundation.org: fix warning] [Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member] [bunk@kernel.org: make do_try_to_free_pages() static] [hugh@veritas.com: memcgroup: fix try_to_free order] [kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: task migrationBalbir Singh2008-02-071-0/+35
| | | | | | | | | | | | | | | | | | Allow tasks to migrate from one cgroup to the other. We migrate mm_struct's mem_cgroup only when the thread group id migrates. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: memory accountingBalbir Singh2008-02-079-27/+295
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the accounting hooks. The accounting is carried out for RSS and Page Cache (unmapped) pages. There is now a common limit and accounting for both. The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap() time. Page cache is accounted at add_to_page_cache(), __delete_from_page_cache(). Swap cache is also accounted for. Each page's page_cgroup is protected with the last bit of the page_cgroup pointer, this makes handling of race conditions involving simultaneous mappings of a page easier. A reference count is kept in the page_cgroup to deal with cases where a page might be unmapped from the RSS of all tasks, but still lives in the page cache. Credits go to Vaidyanathan Srinivasan for helping with reference counting work of the page cgroup. Almost all of the page cache accounting code has help from Vaidyanathan Srinivasan. [hugh@veritas.com: fix swapoff breakage] [akpm@linux-foundation.org: fix locking] Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: <Valdis.Kletnieks@vt.edu> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: accounting setupPavel Emelianov2008-02-075-7/+104
| | | | | | | | | | | | | | | | | Basic setup routines, the mm_struct has a pointer to the cgroup that it belongs to and the the page has a page_cgroup associated with it. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: cgroups setupBalbir Singh2008-02-075-0/+161
| | | | | | | | | | | | | | | | | | Setup the memory cgroup and add basic hooks and controls to integrate and work with the cgroup. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: resource countersPavel Emelianov2008-02-074-0/+230
| | | | | | | | | | | | | | | | | | | | | | | With fixes from David Rientjes <rientjes@google.com> Introduce generic structures and routines for resource accounting. Each resource accounting cgroup is supposed to aggregate it, cgroup_subsystem_state and its resource-specific members within. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcgroup: temporarily revert swapoff modHugh Dickins2008-02-071-31/+7
| | | | | | | | | | | | | This patch precisely reverts the "swapoff: scan ptes preemptibly" patch just presented. It's a temporary measure to allow existing memory controller patches to apply without rejects: in due course they should be rendered down into one sensible patch, and this reversion disappear. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory controller: add documentBalbir Singh2008-02-071-0/+259
| | | | | | | | | | | | | | | Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/cgroup.c: make 2 functions staticAdrian Bunk2008-02-071-2/+2
| | | | | | | | | | cgroup_is_releasable() and notify_on_release() should be static, not global inline. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroups: move cgroups destroy() callbacks to cgroup_diput()Paul Menage2008-02-072-23/+35
| | | | | | | | | | | | | | | | | Move the calls to the cgroup subsystem destroy() methods from cgroup_rmdir() to cgroup_diput(). This allows control file reads and writes to access their subsystem state without having to be concerned with locking against cgroup destruction - the control file dentry will keep the cgroup and its subsystem state objects alive until the file is closed. The documentation is updated to reflect the changed semantics of destroy(); additionally the locking comments for destroy() and some other methods were clarified and decrustified. Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup simplify space strippingPaul Jackson2008-02-071-14/+3
| | | | | | | | | | Simplify the space stripping code in cgroup file write. [akpm@linux-foundation.org: s/BUG_ON/BUILD_BUG_ON/] Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup brace coding style fixPaul Jackson2008-02-071-10/+5
| | | | | | | | | Coding style fix - one line conditionals don't get braces. Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/cgroup.c: remove dead codeAdrian Bunk2008-02-071-10/+8
| | | | | | | | | | | This patch removes dead code spotted by the Coverity checker (look at the "(nbytes >= PATH_MAX)" check). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Paul Jackson <pj@sgi.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* email-clients.txt: sylpheed is OK at IMAPRandy Dunlap2008-02-071-1/+0
| | | | | | | | | | | | This comment is not helpful (no reason given) and is incorrect. Just stick to facts that are useful regarding working on Linux. (akpm: I've used sylpheed+imap for years) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel-doc: warn on badly formatted short descriptionRandy Dunlap2008-02-071-0/+8
| | | | | | | | | | | | | | | | Make kernel-doc warn when a function/struct/union/typedef does not contain a properly formatted short description, such as: * scsi_devinfo: set up the dynamic device list or * scsi_devinfo - This warning is only generated when verbose (-v) mode is used. Also explain the -v command line option in the -h output. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel-doc: prevent duplicate description: outputRandy Dunlap2008-02-071-1/+1
| | | | | | | | | | | | | | Prevent duplicate output of a Description: section when there is a "blank" ("*") line between the initial function name/description line and the "Description:" header. Test case: drivers/scsi/scsi_devinfo.c::scsi_init_devinfo(). Rob Landley hit this while he was producing SCSI kernel-doc. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* REPORTING-BUGS: cc the mailing list tooJ. Bruce Fields2008-02-071-5/+6
| | | | | | | | | People should also cc relevant mailing lists when reporting bugs. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation: create new scheduler/ subdirectoryJ. Bruce Fields2008-02-0710-15/+19
| | | | | | | | | | | | The top-level Documentation/ directory is unmanageably large, so we should take any obvious opportunities to move stuff into subdirectories. These sched-*.txt files seem an obvious easy case. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation: move sharedsubtrees.txt to filesystems/J. Bruce Fields2008-02-073-2/+2
| | | | | | | | | This documentation is also vfs-related. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation: move dnotify.txt to filesystems/J. Bruce Fields2008-02-073-7/+7
| | | | | | | | | I'm inclined to think dnotify belongs in filesystems/. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* move edac.txt two levels upAdrian Bunk2008-02-072-2/+2
| | | | | | | | | | There's no reason for edac.txt for being at this unusual place. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Alan Cox <alan@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add chapter IDs to z8530book.tmplRob Landley2008-02-071-6/+6
| | | | | | | | | Add chapter IDs to z8530book.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add table IDs to videobook.tmplRob Landley2008-02-071-17/+17
| | | | | | | | | Add table IDs to videobook.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add section IDs to rapidio.tmplRob Landley2008-02-071-9/+9
| | | | | | | | | Add section IDs to rapidio.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add missing IDs to procfs-guide.tmplRob Landley2008-02-071-8/+8
| | | | | | | | | Add missing IDs to procfs-guide.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add section IDs to mtdnand.tmplRob Landley2008-02-071-29/+29
| | | | | | | | | Add section IDs to mtdnand.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add missing section ID to lsm.tmplRob Landley2008-02-071-1/+1
| | | | | | | | | Add missing section ID to lsm.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add missing section IDs to genericirq.tmplRob Landley2008-02-071-13/+13
| | | | | | | | | Add missing section IDs to genericirq.tmpl Signed-off-by: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tty: Kill TTY_FLIPBUF_SIZEAlan Cox2008-02-071-7/+0
| | | | | | | | | This legacy define from the old buffer code is now only used in a single power pc driver than doesn't compile anyway. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tty: fix tty network driver interactions with TCGET/TCSET calls (x86 fix)Dave Miller2008-02-071-29/+45
| | | | | | | | | | | And to go with it Dave's type checking x86 termios headers. I've updated these as the original sent by Dave had some wrong types in it. Signed-off-by: Alan Cox <alan@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel-doc: fix for vunmap function prototypeRandy Dunlap2008-02-071-1/+1
| | | | | | | | | | | Fix kernel-doc function prototype parsing which was exposed by vunmap() by allowing more than one '*' before the function name. Error(linux-2.6.24-mm1//mm/vmalloc.c:438): cannot understand prototype: 'struct page **vunmap(const void *addr) ' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: factor out three helpers for FIBMAP/FIONBIO/FIOASYNC file ioctlsErez Zadok2008-02-071-54/+75
| | | | | | | | | | | Factor out file-specific ioctl code into smaller helper functions, away from file_ioctl(). This helps code readability and also reduces indentation inside case statements. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: swap do_ioctl and vfs_ioctl namesErez Zadok2008-02-073-10/+24
| | | | | | | | | | | | | | | | | | | Rename old vfs_ioctl to do_ioctl, because the comment above it clearly indicates that it is an internal function not to be exported to modules; therefore it should have a more traditional do_XXX name. The new do_ioctl is exported in fs.h but not to modules. Rename the old do_ioctl to vfs_ioctl because the names vfs_XXX should preferably be reserved to callable VFS functions which modules may call, as many other vfs_XXX functions already do. Export the new vfs_ioctl to GPL modules so others can use it (including Unionfs and eCryptfs). Add DocBook for new vfs_ioctl. [akpm@linux-foundation.org: fix build] Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: apply coding standards to fs/ioctl.cErez Zadok2008-02-071-80/+84
| | | | | | | Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>