memcg: punt high overage reclaim to return-to-userland path

Currently, try_charge() tries to reclaim memory synchronously when the high limit is breached; however, if the allocation doesn't have __GFP_WAIT, synchronous reclaim is skipped. If a process performs only speculative allocations, it can blow way past the high limit. This is actually easily reproducible by simply doing "find /". slab/slub allocator tries speculative allocations first, so as long as there's memory which can be consumed without blocking, it can keep allocating memory regardless of the high limit. This patch makes try_charge() always punt the over-high reclaim to the return-to-userland path. If try_charge() detects that high limit is breached, it adds the overage to current->memcg_nr_pages_over_high and schedules execution of mem_cgroup_handle_over_high() which performs synchronous reclaim from the return-to-userland path. As long as kernel doesn't have a run-away allocation spree, this should provide enough protection while making kmemcg behave more consistently. It also has the following benefits. - All over-high reclaims can use GFP_KERNEL regardless of the specific gfp mask in use, e.g. GFP_NOFS, when the limit was breached. - It copes with prio inversion. Previously, a low-prio task with small memory.high might perform over-high reclaim with a bunch of locks held. If a higher prio task needed any of these locks, it would have to wait until the low prio task finished reclaim and released the locks. By handing over-high reclaim to the task exit path this issue can be avoided. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@kernel.org> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Tejun Heo <tj@kernel.org> 2015-11-05 18:46:11 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2015-11-05 19:34:48 -0800
commit: b23afb93d317c65cef553b804f08dec8a7a0f7e1 (patch)
tree: 41c1e46049d02c6649f2e773a7b9fbb26094e115 /include/linux/tracehook.h
parent: 626ebc4100285be56fe3546f29b6afeb36b6871a (diff)
download: linux-b23afb93d317c65cef553b804f08dec8a7a0f7e1.tar.gz
linux-b23afb93d317c65cef553b804f08dec8a7a0f7e1.tar.bz2
linux-b23afb93d317c65cef553b804f08dec8a7a0f7e1.zip
1 files changed, 3 insertions, 0 deletions
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 84d497297c5f..26c152122a42 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -50,6 +50,7 @@
 #include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/task_work.h>
+#include <linux/memcontrol.h>
 struct linux_binprm;
 
 /*
@@ -188,6 +189,8 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 	smp_mb__after_atomic();
 	if (unlikely(current->task_works))
 		task_work_run();
+
+	mem_cgroup_handle_over_high();
 }
 
 #endif	/* <linux/tracehook.h> */
author	Tejun Heo <tj@kernel.org>	2015-11-05 18:46:11 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2015-11-05 19:34:48 -0800
commit	b23afb93d317c65cef553b804f08dec8a7a0f7e1 (patch)
tree	41c1e46049d02c6649f2e773a7b9fbb26094e115 /include/linux/tracehook.h
parent	626ebc4100285be56fe3546f29b6afeb36b6871a (diff)
download	linux-b23afb93d317c65cef553b804f08dec8a7a0f7e1.tar.gz linux-b23afb93d317c65cef553b804f08dec8a7a0f7e1.tar.bz2 linux-b23afb93d317c65cef553b804f08dec8a7a0f7e1.zip