diff options
author | Ying Han <yinghan@google.com> | 2011-05-26 16:25:33 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2011-05-26 17:12:35 -0700 |
commit | 889976dbcb1218119fdd950fb7819084e37d7d37 (patch) | |
tree | 7508706ddb6bcbe0f673aca3744f30f281b17734 /mm/vmscan.c | |
parent | 4e4c941c108eff10844d2b441d96dab44f32f424 (diff) | |
download | linux-889976dbcb1218119fdd950fb7819084e37d7d37.tar.gz linux-889976dbcb1218119fdd950fb7819084e37d7d37.tar.bz2 linux-889976dbcb1218119fdd950fb7819084e37d7d37.zip |
memcg: reclaim memory from nodes in round-robin order
Presently, memory cgroup's direct reclaim frees memory from the current
node. But this has some troubles. Usually when a set of threads works in
a cooperative way, they tend to operate on the same node. So if they hit
limits under memcg they will reclaim memory from themselves, damaging the
active working set.
For example, assume 2 node system which has Node 0 and Node 1 and a memcg
which has 1G limit. After some work, file cache remains and the usages
are
Node 0: 1M
Node 1: 998M.
and run an application on Node 0, it will eat its foot before freeing
unnecessary file caches.
This patch adds round-robin for NUMA and adds equal pressure to each node.
When using cpuset's spread memory feature, this will work very well.
But yes, a better algorithm is needed.
[akpm@linux-foundation.org: comment editing]
[kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/vmscan.c')
-rw-r--r-- | mm/vmscan.c | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/mm/vmscan.c b/mm/vmscan.c index 884ae08c16cc..b0875871820d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2226,6 +2226,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, { struct zonelist *zonelist; unsigned long nr_reclaimed; + int nid; struct scan_control sc = { .may_writepage = !laptop_mode, .may_unmap = 1, @@ -2242,7 +2243,14 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, .gfp_mask = sc.gfp_mask, }; - zonelist = NODE_DATA(numa_node_id())->node_zonelists; + /* + * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't + * take care of from where we get pages. So the node where we start the + * scan does not need to be the current node. + */ + nid = mem_cgroup_select_victim_node(mem_cont); + + zonelist = NODE_DATA(nid)->node_zonelists; trace_mm_vmscan_memcg_reclaim_begin(0, sc.may_writepage, |